Upgrading an Index to Use ELSER
Upgrade an index to use the ELSER model
In this notebook we will see examples of how to upgrade your index to the ELSER model .elser_model_2 using the Reindex API.
Note: Alternatively, you could use Update by query to update the index in place to use ELSER. This notebook covers only the Reindex API approach.
Scenarios that we will see in this notebook:
- Migrating an index that has no generated text_expansion field to the ELSER model .elser_model_2
- Upgrading an existing index that uses .elser_model_1 to use the .elser_model_2 model
- Upgrading an index that uses a different model to use ELSER
Install and Connect
To get started, we'll need to connect to our Elastic deployment using the Python client.
Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.
First we need to pip install the following packages:
elasticsearch
Next, we will import all the modules that we need.
Now we will instantiate the Python Elasticsearch client. First we prompt for the Cloud ID and API key.
Then we create a client object, which is an instance of the Elasticsearch class.
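The connection step can be sketched as follows. This is a minimal sketch, assuming the elasticsearch 8.x Python client; the helper names build_client_kwargs and connect are hypothetical and are not part of the original notebook.

```python
from getpass import getpass


def build_client_kwargs(cloud_id, api_key):
    """Return the keyword arguments for an Elastic Cloud connection."""
    return {"cloud_id": cloud_id, "api_key": api_key}


def connect():
    """Prompt for credentials and return an Elasticsearch client (hypothetical helper)."""
    # Imported lazily so build_client_kwargs works without the package installed.
    from elasticsearch import Elasticsearch

    cloud_id = getpass("Elastic Cloud ID: ")
    api_key = getpass("Elastic Api Key: ")
    client = Elasticsearch(**build_client_kwargs(cloud_id, api_key))
    # Printing the cluster info is a quick way to confirm the connection works.
    print(client.info())
    return client
```

Calling connect() once at the top of the notebook gives us the client object that the later steps pass around.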
Elastic Cloud ID: ········
Elastic Api Key: ········
{'name': 'instance-0000000001', 'cluster_name': 'ad402eb9a59041458b8edfc021e91caf', 'cluster_uuid': 'ks_HfcCdSf2qrcKZQsk9Lg', 'version': {'number': '8.11.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'd9ec3fa628c7b0ba3d25692e277ba26814820b20', 'build_date': '2023-11-04T10:04:57.184859352Z', 'build_snapshot': False, 'lucene_version': '9.8.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}
Download and Deploy ELSER v2 Model
Before we begin, we have to download and deploy the ELSER model .elser_model_2.
Follow the instructions in the section Download and Deploy ELSER Model of the ELSER notebook.
Case 1: Migrate an index with no text_expansion field
In this case we will see how to upgrade an index that has an ingestion pipeline configured so that it uses the ELSER model .elser_model_2.
Create Ingestion pipeline with lowercase
We will create a simple pipeline to convert title field values to lowercase and use this ingestion pipeline on our index.
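A minimal sketch of this pipeline, assuming the elasticsearch 8.x Python client and a client object created earlier. The pipeline name ingest-pipeline-lowercase comes from the text; the helper names and the description string are hypothetical.

```python
def lowercase_processors(field="title"):
    """Processor list with a single lowercase processor for the given field."""
    return [{"lowercase": {"field": field}}]


def create_lowercase_pipeline(client, pipeline_id="ingest-pipeline-lowercase"):
    """Create (or overwrite) the ingest pipeline on the cluster."""
    return client.ingest.put_pipeline(
        id=pipeline_id,
        description="Convert title values to lowercase",  # free-text, illustrative
        processors=lowercase_processors(),
    )
```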
ObjectApiResponse({'acknowledged': True})
Create index - movies with mappings
Next, we will create an index that uses the pipeline ingest-pipeline-lowercase that we created in the previous step.
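One way to attach the pipeline is the index.default_pipeline setting, sketched here. The field mappings (title and plot as text) are an assumption based on the fields that appear in the query results, not the notebook's actual mappings, and the helper names are hypothetical.

```python
def movies_index_body(pipeline_id="ingest-pipeline-lowercase"):
    """Settings and mappings for the movies index (mappings are assumed)."""
    return {
        # Every document indexed without an explicit pipeline goes through this one.
        "settings": {"index": {"default_pipeline": pipeline_id}},
        "mappings": {
            "properties": {
                "title": {"type": "text"},
                "plot": {"type": "text"},
            }
        },
    }


def create_movies_index(client, index="movies"):
    """Create the index with the settings and mappings above."""
    body = movies_index_body()
    return client.indices.create(
        index=index, settings=body["settings"], mappings=body["mappings"]
    )
```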
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'movies'})
Insert Documents
We are now ready to insert a sample dataset of 12 movies into our index movies.
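The insert step can be sketched with the bulk API. This is a sketch under assumptions: docs stands for the list of movie dictionaries (the dataset itself is not reproduced here), and the helper names are hypothetical.

```python
def build_bulk_operations(docs, index="movies"):
    """Interleave an action line and a document line for each document."""
    operations = []
    for doc in docs:
        operations.append({"index": {"_index": index}})
        operations.append(doc)
    return operations


def index_documents(client, docs, index="movies"):
    """Send all documents in one bulk request and report the outcome."""
    response = client.bulk(
        operations=build_bulk_operations(docs, index), refresh=True
    )
    if response["errors"]:
        raise RuntimeError("Some documents failed to index")
    print(f"Done indexing documents into `{index}` index!")
```

Because the movies index has a default pipeline, each document passes through the lowercase processor during this request.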
Done indexing documents into `movies` index!
Upgrade index movies to use ELSER model
We are ready to reindex movies into a new index that uses the ELSER model .elser_model_2. As a first step, we have to create a new ingestion pipeline and a new index that use the ELSER model.
Create a new pipeline with ELSER
Let's create a new ingestion pipeline with ELSER model .elser_model_2.
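A minimal sketch of the ELSER pipeline, assuming the inference processor's input_output option available from Elasticsearch 8.11 and the field names plot and plot_embedding used in this notebook. The helper names and the description string are hypothetical.

```python
def elser_processors(input_field="plot", output_field="plot_embedding",
                     model_id=".elser_model_2"):
    """Inference processor that writes ELSER tokens for input_field to output_field."""
    return [
        {
            "inference": {
                "model_id": model_id,
                "input_output": [
                    {"input_field": input_field, "output_field": output_field}
                ],
            }
        }
    ]


def create_elser_pipeline(client, pipeline_id="elser-ingest-pipeline"):
    """Create the ingest pipeline that runs ELSER inference."""
    return client.ingest.put_pipeline(
        id=pipeline_id,
        description="Ingest pipeline for ELSER",  # free-text, illustrative
        processors=elser_processors(),
    )
```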
ObjectApiResponse({'acknowledged': True})
Create an index with mappings
Next, create an index with required mappings for ELSER.
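The mappings could look like the sketch below. Only the plot and plot_embedding fields are named in this notebook; any other fields in your data would need their own entries. The helper names are hypothetical.

```python
def elser_mappings(text_field="plot", embedding_field="plot_embedding"):
    """Mappings with a text source field and a sparse_vector token field."""
    return {
        "properties": {
            text_field: {"type": "text"},
            # ELSER v2 tokens are stored in a sparse_vector field.
            embedding_field: {"type": "sparse_vector"},
        }
    }


def create_elser_index(client, index="elser-movies"):
    """Create the destination index for the reindex step."""
    return client.indices.create(index=index, mappings=elser_mappings())
```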
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-movies'})
Note:
- plot_embedding is the name of the field that contains the generated tokens, with the type sparse_vector
- plot is the name of the field from which the sparse_vector tokens are created
Reindex with updated pipeline
Using the Reindex API, we can copy data from the old index movies to the new index elser-movies, with the ingestion pipeline set to elser-ingest-pipeline. On success, the documents in elser-movies contain the tokens that ELSER inference generated for the targeted field, ready to be used in text_expansion queries.
Once the reindex is complete, inspect any document in the index elser-movies and notice that it has an additional field plot_embedding, which holds the terms that we will use in the text_expansion query.
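A sketch of the reindex call, assuming the elasticsearch 8.x Python client. Running it with wait_for_completion=False is a choice made for this sketch, since ELSER inference can make the reindex slow; the helper names are hypothetical.

```python
def reindex_request(source_index="movies", dest_index="elser-movies",
                    pipeline="elser-ingest-pipeline"):
    """Source and dest sections for a reindex through an ingest pipeline."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index, "pipeline": pipeline},
    }


def start_reindex(client, **kwargs):
    """Start the reindex as a background task and return its task id."""
    request = reindex_request(**kwargs)
    response = client.reindex(
        source=request["source"],
        dest=request["dest"],
        wait_for_completion=False,
    )
    return response["task"]
```

The returned task id can be passed to client.tasks.get(task_id=...) to check whether the reindex has finished.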
Querying documents with ELSER
Let's try a semantic search on our index with the ELSER model .elser_model_2.
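A sketch of such a search with the text_expansion query. The search text is supplied by the caller (the query string used for the results in this notebook is not shown), and the helper names are hypothetical.

```python
def text_expansion_query(model_text, field="plot_embedding",
                         model_id=".elser_model_2"):
    """Build a text_expansion query against a sparse_vector field."""
    return {
        "text_expansion": {
            field: {"model_id": model_id, "model_text": model_text}
        }
    }


def search_movies(client, model_text, index="elser-movies", size=3):
    """Run the query and print score, title and plot for each hit."""
    response = client.search(
        index=index, size=size, query=text_expansion_query(model_text)
    )
    for hit in response["hits"]["hits"]:
        print(f"Score: {hit['_score']}")
        print(f"Title: {hit['_source']['title']}")
        print(f"Plot: {hit['_source']['plot']}\n")
```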
Score: 6.403748
Title: se7en
Plot: Two detectives, a rookie and a veteran, hunt a serial killer who uses the seven deadly sins as his motives.

Score: 3.6703482
Title: the departed
Plot: An undercover cop and a mole in the police attempt to identify each other while infiltrating an Irish gang in South Boston.

Score: 2.9359207
Title: the usual suspects
Plot: A sole survivor tells of the twisty events leading up to a horrific gun battle on a boat, which began when five criminals met at a seemingly random police lineup.
Case 2: Upgrade index with ELSER model to .elser_model_2
If you already have an index that uses the ELSER model .elser_model_1 and would like to upgrade to .elser_model_2, you can use the Reindex API with an ingestion pipeline that uses the ELSER .elser_model_2 model.
Note: Before we begin, ensure that you are on Elasticsearch version 8.11 or later and that the ELSER model .elser_model_2 is deployed.
Create a new ingestion pipeline
We will create a pipeline with .elser_model_2 that we can use for reindexing.
ObjectApiResponse({'acknowledged': True})
Create a new index with mappings
We will create a new index with the required mappings supporting ELSER.
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-upgrade-index-demo'})
Use Reindex API
We will use the Reindex API to move data from the old index to the new index elser-upgrade-index-demo. We exclude the old tokens field from the source index and instead generate new tokens in the field plot_embedding with .elser_model_2 while reindexing.
Note: Make sure to replace my-index with the name of the index that you intend to upgrade, and my-tokens-field with the name of the field in which you previously generated tokens.
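A sketch of this reindex request, using source filtering to drop the old tokens field. The placeholders my-index and my-tokens-field are the ones named in the note; the pipeline name elser-pipeline-upgrade-demo and the helper names are hypothetical.

```python
def upgrade_reindex_request(source_index="my-index",
                            old_tokens_field="my-tokens-field",
                            dest_index="elser-upgrade-index-demo",
                            pipeline="elser-pipeline-upgrade-demo"):
    """Reindex request that skips the old tokens and regenerates them."""
    return {
        "source": {
            "index": source_index,
            # Leave the .elser_model_1 tokens behind; the pipeline creates new ones.
            "_source": {"excludes": [old_tokens_field]},
        },
        "dest": {"index": dest_index, "pipeline": pipeline},
    }


def start_upgrade_reindex(client, **kwargs):
    """Start the reindex as a background task and return its task id."""
    request = upgrade_reindex_request(**kwargs)
    response = client.reindex(
        source=request["source"],
        dest=request["dest"],
        wait_for_completion=False,
    )
    return response["task"]
```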
Querying your data
Once reindexing is complete, you are ready to query your data and perform semantic search.
Score: 14.755971
Title: Python Crash Course
Plot: Python Crash Course

Score: 14.168372
Title: The Pragmatic Programmer: Your Journey to Mastery
Plot: The Pragmatic Programmer: Your Journey to Mastery

Score: 11.704832
Title: The Clean Coder: A Code of Conduct for Professional Programmers
Plot: The Clean Coder: A Code of Conduct for Professional Programmers
Case 3: Upgrade an index with a different model to ELSER
Now we will see how to move an index that already has embeddings generated by a different model.
Let's consider the index books, which has a title_vector field generated with the NLP model sentence-transformers__all-minilm-l6-v2. If you would like to know more about how to load an NLP model and use it with an index, follow the steps in our notebook loading-model-from-hugging-face.ipynb.
Follow a procedure similar to the one we used previously:
- Create an ingestion pipeline with the ELSER model .elser_model_2
- Create an index with mappings, using the pipeline we created in the previous step
- Reindex, excluding the field that holds the embedding from the books index
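The three steps above can be sketched as a single plan for the books index. This is a sketch under assumptions: the index names books and elser-books and the excluded field title_vector come from this notebook, while the pipeline name elser-pipeline-books, the output field title_embedding, and the helper names are hypothetical.

```python
def books_upgrade_plan(pipeline_id="elser-pipeline-books",
                       embedding_field="title_embedding"):
    """Request arguments for the pipeline, index and reindex steps."""
    return {
        "pipeline": {
            "id": pipeline_id,
            "processors": [
                {
                    "inference": {
                        "model_id": ".elser_model_2",
                        "input_output": [
                            {"input_field": "title",
                             "output_field": embedding_field}
                        ],
                    }
                }
            ],
        },
        "index": {
            "index": "elser-books",
            "settings": {"index": {"default_pipeline": pipeline_id}},
            "mappings": {
                "properties": {
                    "title": {"type": "text"},
                    embedding_field: {"type": "sparse_vector"},
                }
            },
        },
        "reindex": {
            # Drop the dense vector produced by the previous model.
            "source": {"index": "books",
                       "_source": {"excludes": ["title_vector"]}},
            "dest": {"index": "elser-books", "pipeline": pipeline_id},
        },
    }


def run_books_upgrade(client):
    """Execute the three steps in order and return the reindex task id."""
    plan = books_upgrade_plan()
    client.ingest.put_pipeline(**plan["pipeline"])
    client.indices.create(**plan["index"])
    response = client.reindex(**plan["reindex"], wait_for_completion=False)
    return response["task"]
```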
Before we begin, let's take a look at our index books and inspect its mappings.
ObjectApiResponse({'books': {'aliases': {}, 'mappings': {'properties': {'authors': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'num_reviews': {'type': 'long'}, 'publish_date': {'type': 'date'}, 'publisher': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'summary': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title_vector': {'type': 'dense_vector', 'dims': 384, 'index': True, 'similarity': 'cosine'}}}, 'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}}, 'number_of_shards': '1', 'provided_name': 'books', 'creation_date': '1706118077023', 'number_of_replicas': '1', 'uuid': 'GxGfG_LtSBOIXsB-5bF2_A', 'version': {'created': '8500003'}}}}})
Notice the field title_vector. We will exclude this field from our new index and generate new tokens from the field title of the books index.
Create ingestion pipeline
Next, we will create a pipeline using ELSER model .elser_model_2
ObjectApiResponse({'acknowledged': True})
Create index with mappings
Let's create an index elser-books with mappings.
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-books'})
Reindex API
We will use the Reindex API to copy the data into our new index elser-books and generate the tokens used by text_expansion queries.
Querying your data
Success! Now we can query data on the index elser-books.
Score: 22.333044
Title: Python Crash Course

Score: 9.364547
Title: The Pragmatic Programmer: Your Journey to Mastery

Score: 8.410445
Title: Clean Code: A Handbook of Agile Software Craftsmanship