Retrievers Intro Notebook
Introduction to Retrievers Supporting Notebook
This notebook lets you run the examples from the Search Labs blog post "Introducing Retrievers - Search All the Things!"
In this notebook you will:
- Download IMDB dataset from Kaggle
- Create a new Elasticsearch Serverless Search Project
- Create two inference services
  - Deploy ELSER
  - Deploy e5-small
- Create ingest pipeline
- Create mapping
- Ingest the IMDB data, creating embeddings as part of ingest
- Scale down models for query load
- Run example retrievers
Setup
Data Set Download
Dataset URL: https://www.kaggle.com/datasets/ashpalsingh1525/imdb-movies-dataset
License(s): Community Data License Agreement - Permissive - Version 1.0
Downloading imdb-movies-dataset.zip to /content
Create Elasticsearch Serverless project
Create Trial Account (if you don't already have an Elastic Cloud account)
Create an Elastic Cloud API Key
Follow the steps in the guide here
When you create the key, ensure you select "Admin" access level
Copy the key somewhere safe; you will use it in the next cell
Elasticsearch Setup
When you run the cell below you will be prompted to enter your Cloud API Key
Enter your Elastic Cloud API key: ··········
201 - {"alias":"retrievers-demo-d1ff7a","cloud_id":"Retrievers_Demo:dXMtZWFzdC0xLmF3cy5lbGFzdGljLmNsb3VkJGQxZmY3YWZmMWE5YTQ1ODZhNDBkMzk1ZjZlMGJhMDk3LmVzJGQxZmY3YWZmMWE5YTQ1ODZhNDBkMzk1ZjZlMGJhMDk3Lmti","id":"d1ff7aff1a9a4586a40d395f6e0ba097","metadata":{"created_at":"2024-05-22T00:57:10.550790086Z","created_by":"3953873479","organization_id":"3953873479"},"name":"Retrievers Demo","region_id":"aws-us-east-1","endpoints":{"elasticsearch":"https://retrievers-demo-d1ff7a.es.us-east-1.aws.elastic.cloud","kibana":"https://retrievers-demo-d1ff7a.kb.us-east-1.aws.elastic.cloud"},"optimized_for":"vector","search_lake":{"boost_window":0,"search_power":5},"type":"elasticsearch","credentials":{"password":"p4Y1hd8j4qF5gl5E5d0S5Ya5","username":"admin"}}
Set project connection credentials
Create Elasticsearch connection
project created
Deploy ELSER and e5
The two blocks below will deploy the embedding models and auto-scale ML capacity
Deploy and start ELSER
Deploy and start e5-small
Check model deployment state
This loops, checking until both ELSER and e5 are fully deployed
This can take a couple of minutes if additional capacity needs to be allocated to run the models
.multilingual-e5-small_linux-x86_64 model deployed and started
.elser_model_2_linux-x86_64 model deployed and started
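The check loop described above can be sketched roughly as follows. This is a minimal sketch, not the notebook's own cell: the function name wait_until_started and the get_state callable are hypothetical. With the official Python client, get_state might wrap client.ml.get_trained_models_stats(model_id=...) and read the deployment state from the response, but that wiring is an assumption and is left out so the loop stays runnable on its own.

```python
import time
from typing import Callable, Iterable


def wait_until_started(
    model_ids: Iterable[str],
    get_state: Callable[[str], str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until every model reports the state "started".

    get_state is a hypothetical callable that returns the current
    deployment state for one model ID (for example "starting" or "started").
    """
    pending = set(model_ids)
    waited = 0.0
    while pending:
        # Iterate over a sorted copy so the set can shrink while we loop.
        for model_id in sorted(pending):
            if get_state(model_id) == "started":
                print(f"{model_id} model deployed and started")
                pending.discard(model_id)
        if not pending:
            break
        if waited >= timeout_seconds:
            raise TimeoutError(f"still waiting on: {sorted(pending)}")
        sleep(poll_seconds)
        waited += poll_seconds
```

Injecting get_state and sleep keeps the loop independent of any particular client, which also makes it easy to try out without a live project.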
List Inference Endpoints
Create index template and link to ingest pipeline
ObjectApiResponse({'acknowledged': True})
Create ingest pipeline
ObjectApiResponse({'acknowledged': True})
Ingest Docs
This will:
- Do a bit of pre-processing
- Bulk ingest the 10,178 IMDB records
- Generate sparse vector embeddings using the ELSER model for the overview and names fields
- Generate dense vector embeddings using the e5-small model for the overview and names fields
It generally takes 2 to 3 minutes to complete with the above allocation settings
The function took 180.07549405097961 seconds to run
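The embedding steps in the list above could be expressed as an ingest pipeline with one inference processor per model. The sketch below only builds the pipeline definition as a Python dict. The model IDs are the ones reported in the deployment check earlier; the output field suffixes (_sparse, _dense) and the helper names are assumptions for illustration, not the notebook's actual names.

```python
ELSER_MODEL_ID = ".elser_model_2_linux-x86_64"
E5_MODEL_ID = ".multilingual-e5-small_linux-x86_64"
SOURCE_FIELDS = ("overview", "names")


def inference_processor(model_id: str, suffix: str, fields=SOURCE_FIELDS) -> dict:
    """Build one inference processor that embeds each source field.

    The output field naming (<field>_<suffix>) is a hypothetical convention.
    """
    return {
        "inference": {
            "model_id": model_id,
            "input_output": [
                {"input_field": field, "output_field": f"{field}_{suffix}"}
                for field in fields
            ],
        }
    }


def build_pipeline_body() -> dict:
    """Pipeline definition: sparse embeddings via ELSER, dense via e5-small."""
    return {
        "description": "Embed overview and names with ELSER and e5-small",
        "processors": [
            inference_processor(ELSER_MODEL_ID, "sparse"),
            inference_processor(E5_MODEL_ID, "dense"),
        ],
    }
```

Assuming an existing client object and a pipeline ID of your choosing, the body could then be passed to the put pipeline API, for example client.ingest.put_pipeline(id="my-pipeline", **build_pipeline_body()). The index mapping would also need matching sparse_vector and dense_vector fields.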
Scale down ELSER and e5 models
We don't need a large number of model allocations for test querying, so we will scale each model down to 1 allocation
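A rough sketch of the scale-down step, assuming a connected client object from the official Python client is passed in and that each deployment is addressed by its model ID. The function name scale_down is hypothetical; it calls the update trained model deployment API once per model.

```python
def scale_down(client, model_ids, allocations: int = 1) -> None:
    """Set every listed model deployment to the given number of allocations.

    client is assumed to expose ml.update_trained_model_deployment, as the
    official Elasticsearch Python client does.
    """
    for model_id in model_ids:
        client.ml.update_trained_model_deployment(
            model_id=model_id,
            number_of_allocations=allocations,
        )
```

For this notebook the call might look like scale_down(es, [".elser_model_2_linux-x86_64", ".multilingual-e5-small_linux-x86_64"]), where es is whatever name your connection object has.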
Retriever tests
We are going to search the overview field (either the text or its embedding) in the dataset for movies matching the search input "clueless slackers"
Feel free to change the movie_search variable below to something else
Standard - Search All the Text! - bm25
Beavis and Butt-Head Do America - Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.
Mr. Popper's Penguins - Jim Carrey stars as Tom Popper, a successful businessman who’s clueless when it comes to the really important things in life...until he inherits six “adorable” penguins, each with its own unique personality. Soon Tom’s rambunctious roommates turn his swank New York apartment into a snowy winter wonderland — and the rest of his world upside-down.
Spaceballs - When the nefarious Dark Helmet hatches a plan to snatch Princess Vespa and steal her planet's air, space-bum-for-hire Lone Starr and his clueless sidekick fly to the rescue. Along the way, they meet Yogurt, who puts Lone Starr wise to the power of "The Schwartz." Can he master it in time to save the day?
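A request of the kind that produces BM25 results like those above might look like the following. This is a sketch only: it wraps a plain match query on the overview field in a standard retriever. The function name and the size of 3 are assumptions chosen to mirror the three results shown.

```python
def standard_retriever_body(search_text: str, field: str = "overview", size: int = 3) -> dict:
    """Search request body: a standard retriever wrapping a BM25 match query."""
    return {
        "retriever": {
            "standard": {
                "query": {"match": {field: search_text}},
            }
        },
        "size": size,
    }


movie_search = "clueless slackers"
body = standard_retriever_body(movie_search)
```

The resulting dict is the JSON body for a _search request against the index that holds the IMDB documents.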
kNN - Search all the Dense Vectors!
Beavis and Butt-Head Do America - Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.
Uncharted - A young street-smart, Nathan Drake and his wisecracking partner Victor “Sully” Sullivan embark on a dangerous pursuit of “the greatest treasure never found” while also tracking clues that may lead to Nathan’s long-lost brother.
Crystal Skulls - A millionaire philanthropist collects the famous Crystal Skulls trying to tap into their ancient powers. It is up to a team lead by a college professor whose father disappeared searching for the 13th skull to save the world when the first 12 skulls are united and reek havoc on the earth without the control of the 13th skull.
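For the dense vector search, a kNN retriever can embed the query text at search time through a query_vector_builder. The sketch below assumes the dense embeddings live in a field named overview_dense (a hypothetical name) and uses the e5-small model ID reported in the deployment check; the k and num_candidates values are illustrative.

```python
E5_MODEL_ID = ".multilingual-e5-small_linux-x86_64"


def knn_retriever_body(
    search_text: str,
    field: str = "overview_dense",  # hypothetical dense_vector field name
    k: int = 5,
    num_candidates: int = 50,
    size: int = 3,
) -> dict:
    """Search request body: kNN retriever that embeds the query with e5-small."""
    if num_candidates < k:
        raise ValueError("num_candidates must be at least k")
    return {
        "retriever": {
            "knn": {
                "field": field,
                "query_vector_builder": {
                    "text_embedding": {
                        "model_id": E5_MODEL_ID,
                        "model_text": search_text,
                    }
                },
                "k": k,
                "num_candidates": num_candidates,
            }
        },
        "size": size,
    }
```

A larger num_candidates generally trades some latency for better recall, so the value here is a starting point rather than a recommendation.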
text_expansion - Search all the Sparse Vectors! - elser
Bill & Ted's Bogus Journey - Amiable slackers Bill and Ted are once again roped into a fantastical adventure when De Nomolos, a villain from the future, sends evil robot duplicates of the two lads to terminate and replace them. The robot doubles actually succeed in killing Bill and Ted, but the two are determined to escape the afterlife, challenging the Grim Reaper to a series of games in order to return to the land of the living.
Beavis and Butt-Head Do America - Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.
Knocked Up - A slacker and a career-driven woman accidentally conceive a child after a one-night stand. As they try to make the relationship work, they must navigate the challenges of parenthood and their differences in lifestyle and maturity.
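The sparse vector search can be sketched as a standard retriever wrapping a text_expansion query, which asks ELSER to expand the query text into weighted tokens. The field name overview_sparse is an assumption for wherever the ELSER embeddings were stored at ingest; the model ID is the one reported in the deployment check.

```python
ELSER_MODEL_ID = ".elser_model_2_linux-x86_64"


def text_expansion_retriever_body(
    search_text: str,
    field: str = "overview_sparse",  # hypothetical sparse_vector field name
    size: int = 3,
) -> dict:
    """Search request body: standard retriever wrapping a text_expansion query."""
    return {
        "retriever": {
            "standard": {
                "query": {
                    "text_expansion": {
                        field: {
                            "model_id": ELSER_MODEL_ID,
                            "model_text": search_text,
                        }
                    }
                }
            }
        },
        "size": size,
    }
```

Because text_expansion is an ordinary query, it sits inside a standard retriever just like the BM25 match query does.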
rrf - Combine All the Things!
Beavis and Butt-Head Do America - Slacker duo Beavis and Butt-Head wake to discover their TV has been stolen. Their search for a new one takes them on a clueless adventure across America, during which they manage to accidentally become America's most wanted.
Bill & Ted's Bogus Journey - Amiable slackers Bill and Ted are once again roped into a fantastical adventure when De Nomolos, a villain from the future, sends evil robot duplicates of the two lads to terminate and replace them. The robot doubles actually succeed in killing Bill and Ted, but the two are determined to escape the afterlife, challenging the Grim Reaper to a series of games in order to return to the land of the living.
Uncharted - A young street-smart, Nathan Drake and his wisecracking partner Victor “Sully” Sullivan embark on a dangerous pursuit of “the greatest treasure never found” while also tracking clues that may lead to Nathan’s long-lost brother.
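To make the combination step more concrete, here is a small sketch with two parts. rrf_retriever_body nests any list of child retrievers under an rrf retriever and leaves the optional tuning parameters at their defaults. rrf_fuse is a local, pure-Python illustration of reciprocal rank fusion scoring (each document gets the sum of 1 / (rank_constant + rank) over the lists it appears in); it is not what Elasticsearch runs internally, and both function names are hypothetical. Feeding it only the three titles printed per section will not necessarily reproduce the full ranking above, since Elasticsearch fuses over a larger window of results.

```python
def rrf_retriever_body(retrievers, size: int = 3) -> dict:
    """Search request body: rrf retriever combining the given child retrievers."""
    return {"retriever": {"rrf": {"retrievers": list(retrievers)}}, "size": size}


def rrf_fuse(ranked_lists, rank_constant: int = 60):
    """Reciprocal rank fusion over several ranked lists of document keys.

    Returns (doc, score) pairs sorted by descending score. Ties keep the
    order in which documents were first seen.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (rank_constant + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Top-3 titles from each section above, in the order they were printed.
bm25 = ["Beavis and Butt-Head Do America", "Mr. Popper's Penguins", "Spaceballs"]
knn = ["Beavis and Butt-Head Do America", "Uncharted", "Crystal Skulls"]
elser = ["Bill & Ted's Bogus Journey", "Beavis and Butt-Head Do America", "Knocked Up"]

fused = rrf_fuse([bm25, knn, elser])
```

In this toy run the title that appears in all three lists accumulates the highest score, which is the intuition behind RRF: agreement across retrievers counts for more than a single high rank.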