Retrievers Intro Notebook


Introduction to Retrievers Supporting Notebook

This notebook lets you run the examples from the Search Labs blog post "Introducing Retrievers - Search All the Things!"

In this notebook you will:

  • Download IMDB dataset from Kaggle
  • Create a new Elasticsearch Serverless Search Project
  • Create two inference services
  • Deploy ELSER
  • Deploy e5-small
  • Create ingest pipeline
  • Create mapping
  • Ingest the IMDB data, creating embeddings as part of ingest
  • Scale down models for query load
  • Run example retrievers

Setup

[1]
[2]

Data Set Download

[3]
Dataset URL: https://www.kaggle.com/datasets/ashpalsingh1525/imdb-movies-dataset
License(s): Community Data License Agreement - Permissive - Version 1.0
Downloading imdb-movies-dataset.zip to /content
[4]

Create Elasticsearch Serverless project

Create Trial Account (if you don't already have an Elastic Cloud account)

Create an Elastic Cloud API Key

Follow the steps in the guide here

When you create the key, ensure you select "Admin" access level

Copy the key somewhere safe; you will use it in the next cell

Elasticsearch Setup

When you run the cell below you will be prompted to enter your Cloud API Key

[5]
Enter your Elastic Cloud API key: ··········
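The project-creation step that follows the prompt could be sketched roughly as below. This only builds the request and sends nothing. The endpoint URL and the `ApiKey` authorization scheme are assumptions based on the Elastic Cloud serverless API, `build_create_project_request` is a hypothetical helper, and the default name, region, and `optimized_for` values are taken from the response shown in the next output.

```python
import json

# Assumed Elastic Cloud endpoint for creating serverless Elasticsearch projects.
PROJECTS_URL = "https://api.elastic-cloud.com/api/v1/serverless/projects/elasticsearch"


def build_create_project_request(api_key, name="Retrievers Demo",
                                 region_id="aws-us-east-1", optimized_for="vector"):
    """Return (url, headers, body) for the project-creation POST. Nothing is sent."""
    headers = {
        "Authorization": f"ApiKey {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps(
        {"name": name, "region_id": region_id, "optimized_for": optimized_for}
    )
    return PROJECTS_URL, headers, body


# In the notebook the key would come from a prompt, for example:
#   from getpass import getpass
#   api_key = getpass("Enter your Elastic Cloud API key: ")
url, headers, body = build_create_project_request("placeholder-key")
print(url)
print(body)
```

A successful call is expected to return HTTP 201 with a JSON document describing the new project, as in the output below.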
[6]
201 - {"alias":"retrievers-demo-d1ff7a","cloud_id":"Retrievers_Demo:dXMtZWFzdC0xLmF3cy5lbGFzdGljLmNsb3VkJGQxZmY3YWZmMWE5YTQ1ODZhNDBkMzk1ZjZlMGJhMDk3LmVzJGQxZmY3YWZmMWE5YTQ1ODZhNDBkMzk1ZjZlMGJhMDk3Lmti","id":"d1ff7aff1a9a4586a40d395f6e0ba097","metadata":{"created_at":"2024-05-22T00:57:10.550790086Z","created_by":"3953873479","organization_id":"3953873479"},"name":"Retrievers Demo","region_id":"aws-us-east-1","endpoints":{"elasticsearch":"https://retrievers-demo-d1ff7a.es.us-east-1.aws.elastic.cloud","kibana":"https://retrievers-demo-d1ff7a.kb.us-east-1.aws.elastic.cloud"},"optimized_for":"vector","search_lake":{"boost_window":0,"search_power":5},"type":"elasticsearch","credentials":{"password":"p4Y1hd8j4qF5gl5E5d0S5Ya5","username":"admin"}}

Set project connection credentials
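A minimal sketch of pulling the connection details out of the project-creation response. The key names (`endpoints.elasticsearch`, `credentials.username`, `credentials.password`) match the response printed above; `extract_connection_details` is a hypothetical helper and the sample values are placeholders.

```python
import json


def extract_connection_details(response_text):
    """Pick the Elasticsearch endpoint and admin credentials out of the response JSON."""
    project = json.loads(response_text)
    return {
        "endpoint": project["endpoints"]["elasticsearch"],
        "username": project["credentials"]["username"],
        "password": project["credentials"]["password"],
    }


# Placeholder response with the same shape as the real one; the values are made up.
sample_response = json.dumps({
    "endpoints": {"elasticsearch": "https://example-project.es.us-east-1.aws.elastic.cloud"},
    "credentials": {"username": "admin", "password": "not-a-real-password"},
})
details = extract_connection_details(sample_response)
print(details["endpoint"])
```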

[7]

Create Elasticsearch connection
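One way the connection might be created with the official Python client, assuming basic authentication with the project's admin credentials. `build_client_kwargs` and `connect` are hypothetical helpers, and the timeout value is an arbitrary choice. The client import is kept inside `connect` so the sketch can be read and run without the `elasticsearch` package installed.

```python
def build_client_kwargs(endpoint, username, password, request_timeout=120):
    """Keyword arguments for the Elasticsearch client constructor."""
    return {
        "hosts": [endpoint],
        "basic_auth": (username, password),
        "request_timeout": request_timeout,
    }


def connect(endpoint, username, password):
    """Create a client and confirm the project answers. Requires `pip install elasticsearch`."""
    from elasticsearch import Elasticsearch  # third-party package

    es = Elasticsearch(**build_client_kwargs(endpoint, username, password))
    es.info()  # raises if the project is not reachable yet
    return es


kwargs = build_client_kwargs("https://example-project.es.example", "admin", "secret")
print(kwargs["hosts"])
```

A freshly created project can take a short while to accept connections, so a real cell would probably retry `connect` until it succeeds.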

[8]
project created

Deploy ELSER and e5

The two blocks below will deploy the embedding models and auto-scale ML capacity

Deploy and start ELSER

[9]

Deploy and start e5-small

[10]
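The two deployment cells above could plausibly be written against the inference API. The sketch below only builds the request paths and bodies. The inference IDs and allocation counts are placeholders, the helper names are hypothetical, and the original cells may instead have started the trained-model deployments directly.

```python
def elser_endpoint_request(inference_id="my-elser-endpoint", num_allocations=1, num_threads=1):
    """(path, body) for creating a sparse_embedding endpoint backed by ELSER."""
    path = f"/_inference/sparse_embedding/{inference_id}"
    body = {
        "service": "elser",
        "service_settings": {
            "num_allocations": num_allocations,  # placeholder value
            "num_threads": num_threads,
        },
    }
    return path, body


def e5_endpoint_request(inference_id="my-e5-endpoint", num_allocations=1, num_threads=1):
    """(path, body) for creating a text_embedding endpoint backed by e5-small."""
    path = f"/_inference/text_embedding/{inference_id}"
    body = {
        "service": "elasticsearch",
        "service_settings": {
            "model_id": ".multilingual-e5-small_linux-x86_64",
            "num_allocations": num_allocations,  # placeholder value
            "num_threads": num_threads,
        },
    }
    return path, body


# Each pair would be sent as a PUT request, for example (assumption):
#   es.perform_request("PUT", path, body=body,
#       headers={"accept": "application/json", "content-type": "application/json"})
for path, body in (elser_endpoint_request(), e5_endpoint_request()):
    print(path, body["service"])
```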

Check model deployment state

This will loop checking until both ELSER and e5 have been fully deployed

This can take a couple of minutes if additional capacity needs to be allocated to run the models

[11]
.multilingual-e5-small_linux-x86_64 model deployed and started
.elser_model_2_linux-x86_64 model deployed and started

List Inference Endpoints

Create index template and link to ingest pipeline

[12]
ObjectApiResponse({'acknowledged': True})
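The index template from the step above could be shaped roughly like this. The index pattern, pipeline name, and embedding field names are placeholders, not the notebook's actual values. The `overview` and `names` fields come from the ingest description further down, and 384 is the output dimension of multilingual-e5-small.

```python
def build_index_template(index_pattern="imdb_movies*", pipeline="imdb-embeddings"):
    """Template body that maps the embedding fields and sets a default ingest pipeline."""
    return {
        "index_patterns": [index_pattern],
        "template": {
            # Every document indexed into a matching index runs through this pipeline.
            "settings": {"index": {"default_pipeline": pipeline}},
            "mappings": {
                "properties": {
                    "names": {"type": "text"},
                    "overview": {"type": "text"},
                    # Hypothetical field names for the ELSER (sparse) embeddings.
                    "names_sparse": {"type": "sparse_vector"},
                    "overview_sparse": {"type": "sparse_vector"},
                    # Hypothetical field names for the e5-small (dense) embeddings.
                    "names_dense": {"type": "dense_vector", "dims": 384},
                    "overview_dense": {"type": "dense_vector", "dims": 384},
                }
            },
        },
    }


template = build_index_template()
# With a client this might be applied as (assumption):
#   es.indices.put_index_template(name="imdb_movies", **template)
print(sorted(template["template"]["mappings"]["properties"]))
```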

Create ingest pipeline

[13]
ObjectApiResponse({'acknowledged': True})
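A sketch of what the ingest pipeline could contain: one inference processor per model, each writing embeddings for `overview` and `names` into separate output fields. The output field names are placeholders, `build_ingest_pipeline` is a hypothetical helper, and referencing the models by their deployed IDs is an assumption.

```python
def build_ingest_pipeline(elser_id=".elser_model_2_linux-x86_64",
                          e5_id=".multilingual-e5-small_linux-x86_64",
                          fields=("overview", "names")):
    """Pipeline body with one inference processor per embedding model."""
    return {
        "description": "Generate sparse and dense embeddings at ingest time",
        "processors": [
            {
                "inference": {
                    "model_id": elser_id,
                    "input_output": [
                        {"input_field": f, "output_field": f"{f}_sparse"} for f in fields
                    ],
                }
            },
            {
                "inference": {
                    "model_id": e5_id,
                    "input_output": [
                        {"input_field": f, "output_field": f"{f}_dense"} for f in fields
                    ],
                }
            },
        ],
    }


pipeline = build_ingest_pipeline()
# With a client this might be applied as (assumption):
#   es.ingest.put_pipeline(id="imdb-embeddings", **pipeline)
print(len(pipeline["processors"]))
```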

Ingest Docs

This will

  • Do a bit of pre-processing
  • Bulk ingest the 10,178 IMDB records
  • Generate sparse vector embeddings using the ELSER model for the overview and names fields
  • Generate dense vector embeddings using the e5-small model for the overview and names fields

It generally takes around 2 minutes to complete with the above allocation settings

[14]
The function took 180.07549405097961 seconds to run
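The pre-processing and bulk steps might be sketched as below. The cleaning rule (strip whitespace, drop empty values) and the index name are assumptions, since the original pre-processing is not shown. The embeddings themselves are produced by the ingest pipeline on the Elasticsearch side, so the client only sends the raw fields.

```python
import csv
import io


def preprocess(row):
    """Strip whitespace and drop empty values so blank cells are not indexed."""
    cleaned = {}
    for key, value in row.items():
        if key is None:  # DictReader stores surplus cells under a None key
            continue
        value = (value or "").strip()
        if value:
            cleaned[key.strip()] = value
    return cleaned


def generate_actions(csv_file, index_name="imdb_movies"):
    """Yield one bulk action per non-empty CSV row."""
    for row in csv.DictReader(csv_file):
        doc = preprocess(row)
        if doc:
            yield {"_index": index_name, "_source": doc}


# Tiny stand-in for the Kaggle CSV; the real file has 10,178 rows.
sample_csv = io.StringIO("names,overview,score\nExample Movie, A placeholder overview. ,\n")
actions = list(generate_actions(sample_csv))
print(actions)

# With a client, the actions could be sent with the bulk helper (assumption):
#   from elasticsearch import helpers
#   helpers.bulk(es, generate_actions(open("imdb_movies.csv", newline="")))
```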

Scale down ELSER and e5 models

We don't need a large number of model allocations for test querying, so we will scale each model down to 1 allocation

[15]
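Scaling down could be done with the trained model deployment update API. The sketch below only builds the requests. It assumes each deployment ID equals its model ID, which may not hold if the models were deployed through inference endpoints, and `build_scale_down_requests` is a hypothetical helper.

```python
def build_scale_down_requests(deployment_ids, allocations=1):
    """Return (path, body) pairs for POST requests that set number_of_allocations."""
    return [
        (
            f"/_ml/trained_models/{deployment_id}/deployment/_update",
            {"number_of_allocations": allocations},
        )
        for deployment_id in deployment_ids
    ]


requests_to_send = build_scale_down_requests(
    [".elser_model_2_linux-x86_64", ".multilingual-e5-small_linux-x86_64"]
)
# With a client, the equivalent call might be (assumption):
#   es.ml.update_trained_model_deployment(model_id=deployment_id, number_of_allocations=1)
for path, body in requests_to_send:
    print(path, body)
```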

Retriever tests

We are going to search the overview field (either the text or its embedding) in the dataset for movies, using the search input "clueless slackers"

Feel free to change the movie_search variable below to something else

[16]

Standard - Search All the Text! - bm25
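A minimal sketch of a `standard` retriever body for a BM25 match on the overview text. `standard_retriever` is a hypothetical helper, and `size=3` is inferred from the three results shown in the output rather than taken from the original cell.

```python
movie_search = "clueless slackers"


def standard_retriever(query_text, field="overview", size=3):
    """Search body that wraps an ordinary match query in a standard retriever."""
    return {
        "retriever": {
            "standard": {
                "query": {"match": {field: query_text}},
            }
        },
        "size": size,
    }


body = standard_retriever(movie_search)
print(body)  # send as the body of a _search request against the movie index
```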

[17]
Beavis and Butt-Head Do America
- Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.

Mr. Popper's Penguins
- Jim Carrey stars as Tom Popper, a successful businessman who’s clueless when it comes to the really important things in life...until he inherits six “adorable” penguins, each with its own unique personality. Soon Tom’s rambunctious roommates turn his swank New York apartment into a snowy winter wonderland — and the rest of his world upside-down.

Spaceballs
- When the nefarious Dark Helmet hatches a plan to snatch Princess Vespa and steal her planet's air, space-bum-for-hire Lone Starr and his clueless sidekick fly to the rescue. Along the way, they meet Yogurt, who puts Lone Starr wise to the power of "The Schwartz." Can he master it in time to save the day?

kNN - Search All the Dense Vectors!
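A sketch of a `knn` retriever body that lets Elasticsearch embed the query text with the e5 model at search time. The dense field name, `k`, and `num_candidates` values are placeholders, and `knn_retriever` is a hypothetical helper.

```python
movie_search = "clueless slackers"


def knn_retriever(query_text, field="overview_dense",
                  model_id=".multilingual-e5-small_linux-x86_64",
                  k=5, num_candidates=50, size=3):
    """Search body for a kNN retriever using a text_embedding query vector builder."""
    return {
        "retriever": {
            "knn": {
                "field": field,  # hypothetical dense_vector field name
                "query_vector_builder": {
                    "text_embedding": {"model_id": model_id, "model_text": query_text}
                },
                "k": k,
                "num_candidates": num_candidates,
            }
        },
        "size": size,
    }


body = knn_retriever(movie_search)
print(body["retriever"]["knn"]["query_vector_builder"])
```

Because the match is on meaning rather than on terms, results such as "Uncharted" can appear even though the overview contains neither query word.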

[18]
Beavis and Butt-Head Do America
- Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.

Uncharted
- A young street-smart, Nathan Drake and his wisecracking partner Victor “Sully” Sullivan embark on a dangerous pursuit of “the greatest treasure never found” while also tracking clues that may lead to Nathan’s long-lost brother.

Crystal Skulls
- A millionaire philanthropist collects the famous Crystal Skulls trying to tap into their ancient powers. It is up to a team lead by a college professor whose father disappeared searching for the 13th skull to save the world when the first 12 skulls are united and reek havoc on the earth without the control of the 13th skull.

text_expansion - Search All the Sparse Vectors! - elser
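A sketch of a `standard` retriever that wraps a `text_expansion` query against an ELSER sparse vector field. The sparse field name is a placeholder and `text_expansion_retriever` is a hypothetical helper.

```python
movie_search = "clueless slackers"


def text_expansion_retriever(query_text, field="overview_sparse",
                             model_id=".elser_model_2_linux-x86_64", size=3):
    """Search body that runs a text_expansion query through a standard retriever."""
    return {
        "retriever": {
            "standard": {
                "query": {
                    "text_expansion": {
                        # hypothetical sparse_vector field name
                        field: {"model_id": model_id, "model_text": query_text}
                    }
                }
            }
        },
        "size": size,
    }


body = text_expansion_retriever(movie_search)
print(body)
```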

[19]
Bill & Ted's Bogus Journey
- Amiable slackers Bill and Ted are once again roped into a fantastical adventure when De Nomolos, a villain from the future, sends evil robot duplicates of the two lads to terminate and replace them. The robot doubles actually succeed in killing Bill and Ted, but the two are determined to escape the afterlife, challenging the Grim Reaper to a series of games in order to return to the land of the living.

Beavis and Butt-Head Do America
- Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.

Knocked Up
- A slacker and a career-driven woman accidentally conceive a child after a one-night stand. As they try to make the relationship work, they must navigate the challenges of parenthood and their differences in lifestyle and maturity.

rrf - Combine All the Things!
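An `rrf` retriever takes a list of child retrievers and fuses their rankings. The sketch below combines BM25, kNN, and ELSER children. Which children the original cell combined is not shown, so this choice, the field names, and the `rrf_retriever` helper are assumptions.

```python
movie_search = "clueless slackers"


def rrf_retriever(children, size=3):
    """Search body that fuses the rankings of several child retrievers with RRF."""
    return {"retriever": {"rrf": {"retrievers": list(children)}}, "size": size}


children = [
    # BM25 on the overview text
    {"standard": {"query": {"match": {"overview": movie_search}}}},
    # Dense vectors from e5-small (hypothetical field name)
    {"knn": {
        "field": "overview_dense",
        "query_vector_builder": {"text_embedding": {
            "model_id": ".multilingual-e5-small_linux-x86_64",
            "model_text": movie_search,
        }},
        "k": 5,
        "num_candidates": 50,
    }},
    # Sparse vectors from ELSER (hypothetical field name)
    {"standard": {"query": {"text_expansion": {"overview_sparse": {
        "model_id": ".elser_model_2_linux-x86_64",
        "model_text": movie_search,
    }}}}},
]

body = rrf_retriever(children)
print([next(iter(child)) for child in body["retriever"]["rrf"]["retrievers"]])
```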

[20]
Beavis and Butt-Head Do America
- Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.

Bill & Ted's Bogus Journey
- Amiable slackers Bill and Ted are once again roped into a fantastical adventure when De Nomolos, a villain from the future, sends evil robot duplicates of the two lads to terminate and replace them. The robot doubles actually succeed in killing Bill and Ted, but the two are determined to escape the afterlife, challenging the Grim Reaper to a series of games in order to return to the land of the living.

Uncharted
- A young street-smart, Nathan Drake and his wisecracking partner Victor “Sully” Sullivan embark on a dangerous pursuit of “the greatest treasure never found” while also tracking clues that may lead to Nathan’s long-lost brother.
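To see why "Beavis and Butt-Head Do America" comes out on top, reciprocal rank fusion can be worked through by hand on the three top-3 lists shown earlier. Each document scores the sum of 1 / (rank_constant + rank) over the lists it appears in, with 60 as the default rank constant. This is an illustration only: the real retriever fuses larger result windows than the three hits printed here, so positions below the top two are not reproduced.

```python
from collections import defaultdict


def rrf_scores(rankings, rank_constant=60):
    """Sum 1 / (rank_constant + rank) for each document across all rankings."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (rank_constant + rank)
    return dict(scores)


bm25 = ["Beavis and Butt-Head Do America", "Mr. Popper's Penguins", "Spaceballs"]
knn = ["Beavis and Butt-Head Do America", "Uncharted", "Crystal Skulls"]
elser = ["Bill & Ted's Bogus Journey", "Beavis and Butt-Head Do America", "Knocked Up"]

scores = rrf_scores([bm25, knn, elser])
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(top_two)
```

Beavis and Butt-Head appears in all three lists (ranks 1, 1, and 2), so its contributions add up, while Bill & Ted's single first-place finish narrowly beats any single second-place finish.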