Elastic Upgrading Index To Use Elser



Upgrade an index to use ELSER model

This notebook shows how to upgrade an index to the ELSER model .elser_model_2 using the Reindex API.

Note: Alternatively, you can use Update by query to update the index in place to use ELSER. This notebook covers only the Reindex API.

Scenarios covered in this notebook:

Migrating an index that has no text_expansion field to the ELSER model .elser_model_2
Upgrading an existing index that uses .elser_model_1 to .elser_model_2
Upgrading an index that uses a different model to ELSER

Install and Connect

To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment. First we need to pip install the following packages:

elasticsearch

[ ]

Next, we will import all the modules that we need.

[3]

Now we will instantiate the Python Elasticsearch client. First we prompt for the Cloud ID and the API key.

Then we create a client object, which is an instance of the Elasticsearch class.
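A minimal sketch of the connection step, assuming the `elasticsearch` 8.x Python package is installed. The helper name `connect` is hypothetical; `cloud_id` and `api_key` are the values prompted for above.

```python
from getpass import getpass


def connect(cloud_id: str, api_key: str):
    """Return an Elasticsearch client for an Elastic Cloud deployment."""
    # Imported lazily so this sketch can be loaded even where the
    # elasticsearch package is not installed.
    from elasticsearch import Elasticsearch

    return Elasticsearch(cloud_id=cloud_id, api_key=api_key)


def prompt_and_connect():
    """Prompt for credentials without echoing them, then connect."""
    cloud_id = getpass("Elastic Cloud ID: ")
    api_key = getpass("Elastic Api Key: ")
    return connect(cloud_id, api_key)
```

Calling `client = prompt_and_connect()` followed by `print(client.info())` should print cluster details like the ones shown below.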

[4]

Elastic Cloud ID:  ········
Elastic Api Key:  ········

{'name': 'instance-0000000001', 'cluster_name': 'ad402eb9a59041458b8edfc021e91caf', 'cluster_uuid': 'ks_HfcCdSf2qrcKZQsk9Lg', 'version': {'number': '8.11.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': 'd9ec3fa628c7b0ba3d25692e277ba26814820b20', 'build_date': '2023-11-04T10:04:57.184859352Z', 'build_snapshot': False, 'lucene_version': '9.8.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}

Download and Deploy ELSER v2 Model

Before we begin, we have to download and deploy the ELSER model .elser_model_2.

Follow the instructions in the section Download and Deploy ELSER Model of the ELSER notebook.

Case 1: Migrate an index with no `text_expansion` field

In this case we will see how to upgrade an index that has an ingestion pipeline configured so that it uses the ELSER model .elser_model_2.

Create Ingestion pipeline with lowercase

We will create a simple pipeline to convert title field values to lowercase and use this ingestion pipeline on our index.
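A minimal sketch of this step, assuming `client` is the Elasticsearch client created earlier. The pipeline id `ingest-pipeline-lowercase` and the `title` field come from the text; the description string and the helper name are illustrative.

```python
LOWERCASE_PIPELINE_ID = "ingest-pipeline-lowercase"

# A single lowercase processor applied to the title field.
LOWERCASE_PROCESSORS = [
    {"lowercase": {"field": "title"}},
]


def create_lowercase_pipeline(client):
    """Create (or overwrite) the lowercase ingestion pipeline."""
    return client.ingest.put_pipeline(
        id=LOWERCASE_PIPELINE_ID,
        description="Convert the title field to lowercase",
        processors=LOWERCASE_PROCESSORS,
    )
```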

[5]

ObjectApiResponse({'acknowledged': True})

Create index - `movies` with mappings

Next, we will create an index that uses the pipeline ingest-pipeline-lowercase created in the previous step.
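A sketch of the index creation, assuming the pipeline is attached through the `index.default_pipeline` setting and that `title` and `plot` are plain `text` fields. The exact mappings used in the notebook are not shown here, so treat these as placeholders.

```python
MOVIES_INDEX = "movies"

# default_pipeline makes every indexing request run through the
# lowercase pipeline unless the request names another pipeline.
MOVIES_SETTINGS = {
    "index": {"default_pipeline": "ingest-pipeline-lowercase"},
}

# Assumed mappings: two analyzed text fields.
MOVIES_MAPPINGS = {
    "properties": {
        "title": {"type": "text"},
        "plot": {"type": "text"},
    }
}


def create_movies_index(client):
    """Create the movies index with its settings and mappings."""
    return client.indices.create(
        index=MOVIES_INDEX,
        settings=MOVIES_SETTINGS,
        mappings=MOVIES_MAPPINGS,
    )
```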

[6]

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'movies'})

Insert Documents

We are now ready to insert a sample dataset of 12 movies into our index movies.
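A sketch of bulk indexing with the client's `helpers.bulk` function. Only two of the 12 movies are listed, with plots copied from the query results later in this notebook; the capitalized titles are an assumption about the data before the lowercase pipeline runs, and the helper names are hypothetical.

```python
SAMPLE_MOVIES = [
    {
        "title": "Se7en",
        "plot": "Two detectives, a rookie and a veteran, hunt a serial killer "
        "who uses the seven deadly sins as his motives.",
    },
    {
        "title": "The Departed",
        "plot": "An undercover cop and a mole in the police attempt to identify "
        "each other while infiltrating an Irish gang in South Boston.",
    },
]


def bulk_actions(index, docs):
    """Yield one bulk index action per document."""
    for doc in docs:
        yield {"_index": index, "_source": doc}


def index_movies(client, docs=SAMPLE_MOVIES):
    """Send the documents to the movies index in a single bulk call."""
    from elasticsearch import helpers  # lazy import, see assumptions above

    return helpers.bulk(client, bulk_actions("movies", docs))
```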

[8]

Done indexing documents into `movies` index!

Upgrade index `movies` to use ELSER model

We are ready to reindex movies into a new index that uses the ELSER model .elser_model_2. As a first step, we have to create a new ingestion pipeline and a new index for ELSER.

Create a new pipeline with ELSER

Let's create a new ingestion pipeline with ELSER model .elser_model_2.
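A sketch of an ELSER ingestion pipeline, assuming Elasticsearch 8.11 where the inference processor accepts `input_output`. The pipeline id `elser-ingest-pipeline` and the fields `plot` and `plot_embedding` are the names used later in this notebook; the description is illustrative.

```python
ELSER_PIPELINE_ID = "elser-ingest-pipeline"

# The inference processor reads the text in `plot`, runs it through
# .elser_model_2, and stores the generated tokens in `plot_embedding`.
ELSER_PROCESSORS = [
    {
        "inference": {
            "model_id": ".elser_model_2",
            "input_output": [
                {"input_field": "plot", "output_field": "plot_embedding"}
            ],
        }
    }
]


def create_elser_pipeline(client):
    """Create (or overwrite) the ELSER ingestion pipeline."""
    return client.ingest.put_pipeline(
        id=ELSER_PIPELINE_ID,
        description="Generate ELSER tokens for the plot field",
        processors=ELSER_PROCESSORS,
    )
```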

[9]

ObjectApiResponse({'acknowledged': True})

Create an index with mappings

Next, create an index with the mappings required for ELSER.
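A sketch of the target index, assuming the ELSER pipeline is attached as the index default pipeline. The `sparse_vector` type for `plot_embedding` matches the note below; the `text` mapping for `plot` is an assumption.

```python
ELSER_MOVIES_INDEX = "elser-movies"

ELSER_MOVIES_SETTINGS = {
    "index": {"default_pipeline": "elser-ingest-pipeline"},
}

ELSER_MOVIES_MAPPINGS = {
    "properties": {
        # Source text that ELSER reads.
        "plot": {"type": "text"},
        # Tokens and weights that ELSER writes.
        "plot_embedding": {"type": "sparse_vector"},
    }
}


def create_elser_movies_index(client):
    """Create the elser-movies index with ELSER-ready mappings."""
    return client.indices.create(
        index=ELSER_MOVIES_INDEX,
        settings=ELSER_MOVIES_SETTINGS,
        mappings=ELSER_MOVIES_MAPPINGS,
    )
```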

[13]

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-movies'})

Note:

plot_embedding is the name of the field that contains the generated tokens, with the type sparse_vector
plot is the name of the field from which the sparse_vector tokens are created.

Reindex with updated pipeline

With the Reindex API, we can copy data from the old index movies to the new index elser-movies with the ingestion pipeline set to elser-ingest-pipeline. On success, the documents in elser-movies contain the tokens that ELSER inference generated for the targeted field, ready for text_expansion queries.
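A sketch of the reindex call. The helper names are hypothetical. Passing `pipeline` in `dest` is optional here: if the destination index already has a default pipeline, omitting it should have the same effect.

```python
def reindex_request(source_index, dest_index, pipeline=None):
    """Build the source and dest sections of a reindex request."""
    dest = {"index": dest_index}
    if pipeline is not None:
        # A pipeline named here is used for the reindexed documents.
        dest["pipeline"] = pipeline
    return {"source": {"index": source_index}, "dest": dest}


def start_reindex(client, request):
    """Start reindexing in the background and return the API response."""
    # wait_for_completion=False returns a task id instead of blocking,
    # which is useful because inference makes reindexing slow.
    return client.reindex(wait_for_completion=False, **request)


MOVIES_REINDEX = reindex_request("movies", "elser-movies", "elser-ingest-pipeline")
```

With `wait_for_completion=False`, the response contains a task id that can be polled through the Tasks API until the copy finishes.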

[15]

Once reindexing is complete, inspect any document in the index elser-movies and notice that it has an additional field plot_embedding containing the terms that we will use in the text_expansion query.

Querying documents with ELSER

Let's try a semantic search on our index with the ELSER model .elser_model_2.
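A sketch of a `text_expansion` query against the `plot_embedding` field. The helper names are hypothetical, and the query text is supplied by the caller; the text used to produce the results below is not shown in this notebook export.

```python
def elser_query(field, text, model_id=".elser_model_2"):
    """Build a text_expansion query for an ELSER tokens field."""
    return {
        "text_expansion": {
            field: {"model_id": model_id, "model_text": text},
        }
    }


def search_movies(client, text, size=3):
    """Run the semantic search and print score, title, and plot per hit."""
    response = client.search(
        index="elser-movies",
        size=size,
        query=elser_query("plot_embedding", text),
    )
    for hit in response["hits"]["hits"]:
        print(f"Score: {hit['_score']}")
        print(f"Title: {hit['_source']['title']}")
        print(f"Plot: {hit['_source']['plot']}\n")
```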

[48]

Score: 6.403748
Title: se7en
Plot: Two detectives, a rookie and a veteran, hunt a serial killer who uses the seven deadly sins as his motives.

Score: 3.6703482
Title: the departed
Plot: An undercover cop and a mole in the police attempt to identify each other while infiltrating an Irish gang in South Boston.

Score: 2.9359207
Title: the usual suspects
Plot: A sole survivor tells of the twisty events leading up to a horrific gun battle on a boat, which began when five criminals met at a seemingly random police lineup.

Case 2: Upgrade index with ELSER model to `.elser_model_2`

If you already have an index that uses the ELSER model .elser_model_1 and would like to upgrade to .elser_model_2, you can use the Reindex API with an ingestion pipeline that uses the ELSER model .elser_model_2.

Note: Before we begin, ensure that you are running Elasticsearch 8.11 and that the ELSER model .elser_model_2 is deployed.

Create a new ingestion pipeline

We will create a pipeline that uses .elser_model_2 so that we can reindex with it.

[37]

ObjectApiResponse({'acknowledged': True})

Create a new index with mappings

We will create a new index with the mappings required for ELSER.

[38]

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-upgrade-index-demo'})

Use Reindex API

We will use the Reindex API to copy data from the old index to the new index elser-upgrade-index-demo. We exclude the old tokens field from the source documents and instead generate new tokens in the field plot_embedding with .elser_model_2 while reindexing.

Note: Make sure to replace my-index with the name of the index that you intend to upgrade, and my-tokens-field with the name of the field that holds the previously generated tokens.
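A sketch of a reindex request that drops the old tokens field. `my-index` and `my-tokens-field` are the placeholders from the note above; the helper name is hypothetical, and the destination index is assumed to apply the `.elser_model_2` pipeline (for example through its default pipeline).

```python
def upgrade_reindex_request(
    source_index, old_tokens_field, dest_index="elser-upgrade-index-demo"
):
    """Build a reindex request that excludes the old tokens field."""
    return {
        "source": {
            "index": source_index,
            # Source filtering: copy every field except the old tokens,
            # which would otherwise be carried over unchanged.
            "_source": {"excludes": [old_tokens_field]},
        },
        "dest": {"index": dest_index},
    }


def start_upgrade(client, request):
    """Start the reindex in the background and return the API response."""
    return client.reindex(wait_for_completion=False, **request)


UPGRADE_REQUEST = upgrade_reindex_request("my-index", "my-tokens-field")
```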

[39]

Querying your data

Once reindexing is complete, you are ready to query your data and perform semantic search.

[40]

Score: 14.755971
Title: Python Crash Course
Plot: Python Crash Course

Score: 14.168372
Title: The Pragmatic Programmer: Your Journey to Mastery
Plot: The Pragmatic Programmer: Your Journey to Mastery

Score: 11.704832
Title: The Clean Coder: A Code of Conduct for Professional Programmers
Plot: The Clean Coder: A Code of Conduct for Professional Programmers

Case 3: Upgrade an index that uses a different model to ELSER

Now we will see how to migrate an index that already has embeddings generated by a different model.

Let's consider the index books, which has a title_vector field generated with the NLP model sentence-transformers__all-minilm-l6-v2. To learn more about how to load an NLP model and use it with an index, follow the steps in our notebook loading-model-from-hugging-face.ipynb.

Follow a procedure similar to the previous cases:

Create an ingestion pipeline with the ELSER model .elser_model_2
Create an index with mappings that uses the pipeline created in the previous step
Reindex, excluding the field that holds the embedding from the books index
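The three steps above can be sketched as one plan of request bodies. The pipeline id `elser-books-pipeline` and the output field `title_embedding` are hypothetical names chosen for illustration; `books`, `elser-books`, `title`, and `title_vector` come from this notebook.

```python
def elser_upgrade_plan(source_index, dest_index, text_field,
                       old_vector_field, tokens_field, pipeline_id):
    """Return (pipeline, index, reindex) request bodies for the upgrade."""
    pipeline = {
        "id": pipeline_id,
        "processors": [{
            "inference": {
                "model_id": ".elser_model_2",
                "input_output": [
                    {"input_field": text_field, "output_field": tokens_field}
                ],
            }
        }],
    }
    index = {
        "index": dest_index,
        "settings": {"index": {"default_pipeline": pipeline_id}},
        "mappings": {"properties": {
            text_field: {"type": "text"},
            tokens_field: {"type": "sparse_vector"},
        }},
    }
    reindex = {
        # Leave the old dense vector behind; ELSER tokens replace it.
        "source": {"index": source_index,
                   "_source": {"excludes": [old_vector_field]}},
        "dest": {"index": dest_index},
    }
    return pipeline, index, reindex


def run_plan(client, plan):
    """Execute the three steps in order against a live cluster."""
    pipeline, index, reindex = plan
    client.ingest.put_pipeline(**pipeline)
    client.indices.create(**index)
    return client.reindex(wait_for_completion=False, **reindex)


BOOKS_PLAN = elser_upgrade_plan(
    "books", "elser-books", "title", "title_vector",
    "title_embedding", "elser-books-pipeline",
)
```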

Before we begin, let's take a look at our index books and its mappings.

[41]

ObjectApiResponse({'books': {'aliases': {}, 'mappings': {'properties': {'authors': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'num_reviews': {'type': 'long'}, 'publish_date': {'type': 'date'}, 'publisher': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'summary': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title': {'type': 'text', 'fields': {'keyword': {'type': 'keyword', 'ignore_above': 256}}}, 'title_vector': {'type': 'dense_vector', 'dims': 384, 'index': True, 'similarity': 'cosine'}}}, 'settings': {'index': {'routing': {'allocation': {'include': {'_tier_preference': 'data_content'}}}, 'number_of_shards': '1', 'provided_name': 'books', 'creation_date': '1706118077023', 'number_of_replicas': '1', 'uuid': 'GxGfG_LtSBOIXsB-5bF2_A', 'version': {'created': '8500003'}}}}})

Notice the field title_vector. We will exclude this field from our new index and generate new ELSER tokens from the field title of the books index.
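To find the fields to exclude without reading the mapping by eye, a small helper can scan the response of `client.indices.get(index="books")`. This is an illustrative sketch: the function name is hypothetical, it only looks at top-level fields, and the sample below is a trimmed copy of the mapping shown above.

```python
def dense_vector_fields(mapping_response, index):
    """List top-level fields of type dense_vector in an index mapping."""
    properties = mapping_response[index]["mappings"]["properties"]
    return [
        name
        for name, spec in properties.items()
        if spec.get("type") == "dense_vector"
    ]


# Trimmed copy of the books mapping from the output above.
SAMPLE_RESPONSE = {
    "books": {
        "mappings": {
            "properties": {
                "num_reviews": {"type": "long"},
                "publish_date": {"type": "date"},
                "title": {"type": "text"},
                "title_vector": {
                    "type": "dense_vector",
                    "dims": 384,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }
}

FIELDS_TO_EXCLUDE = dense_vector_fields(SAMPLE_RESPONSE, "books")
```

For this sample, `FIELDS_TO_EXCLUDE` contains only `title_vector`.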

Create ingestion pipeline

Next, we will create a pipeline that uses the ELSER model .elser_model_2.

[ ]

[42]

ObjectApiResponse({'acknowledged': True})

Create index with mappings

Let's create an index elser-books with mappings.

[43]

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'elser-books'})

Reindex API

We will use the Reindex API to copy the data into our new index elser-books and generate the text_expansion tokens along the way.

[44]

Querying your data

Success! Now we can query data on the index elser-books.

[ ]

[47]

Score: 22.333044
Title: Python Crash Course
Score: 9.364547
Title: The Pragmatic Programmer: Your Journey to Mastery
Score: 8.410445
Title: Clean Code: A Handbook of Agile Software Craftsmanship
