Configuring Chunking Settings For Inference Endpoints
Learn how to configure chunking settings for Inference API endpoints.
🧰 Requirements
For this example, you will need:

- An Elastic deployment:
  - We'll be using Elastic Cloud for this example (available with a free trial)
- Elasticsearch 8.16 or above.
- Python 3.7 or above.
Create Elastic Cloud deployment or serverless project
If you don't have an Elastic Cloud deployment, sign up here for a free trial.
Install packages and connect with Elasticsearch Client
To get started, we'll need to connect to our Elastic deployment using the Python client (version 8.12.0 or above). Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.
First we need to pip install the following packages:
elasticsearch
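In a notebook or terminal, that install would look something like this (no version pin is required beyond a client recent enough for your cluster):

```shell
python -m pip install elasticsearch
```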
Next, we import the required modules. 🔐 NOTE: getpass enables us to securely prompt the user for credentials without echoing them to the terminal.
Now we can instantiate the Python Elasticsearch client.
First we prompt the user for their password and Cloud ID.
Then we create a client object, an instance of the Elasticsearch class.
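Putting those two steps together, a minimal sketch looks like the following (the "elastic" username is the Elastic Cloud default superuser):

```python
import getpass

from elasticsearch import Elasticsearch

# Prompt for credentials; nothing is echoed to the terminal.
ELASTIC_CLOUD_ID = getpass.getpass("Elastic Cloud ID: ")
ELASTIC_PASSWORD = getpass.getpass("Elastic password: ")

# Instantiate the client, identifying the deployment by its Cloud ID.
client = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,
    basic_auth=("elastic", ELASTIC_PASSWORD),
)
```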
Test the Client
Before you continue, confirm that the client has connected with this test.
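Assuming the client object created above, a quick round-trip to the cluster confirms the connection:

```python
# Raises an error if the connection or credentials are bad; otherwise
# prints deployment details such as cluster name and Elasticsearch version.
print(client.info())
```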
Refer to the documentation to learn how to connect to a self-managed deployment.
Read this page to learn how to connect using API keys.
Create the inference endpoint object
Let's create the inference endpoint by using the Create Inference API.
In this example, you'll create an inference endpoint for the ELSER integration, which deploys Elastic's ELSER model within your cluster. Chunking settings are configurable for any inference endpoint with an embedding task type. A full list of available integrations can be found in the Create Inference API documentation.
To configure chunking settings, the request body must contain a chunking_settings map with a strategy value along with any required values for the selected chunking strategy. For this example, you'll be configuring chunking settings for a sentence strategy with a maximum chunk size of 25 words and 1 sentence overlap between chunks. For more information on available chunking strategies and their configurable values, see the chunking strategies documentation.
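As a sketch, the request body could look like the following. The service_settings values are illustrative and should match your deployment; chunking_settings carries the strategy described above:

```python
# Illustrative request body for the Create Inference API.
inference_config = {
    "service": "elser",
    "service_settings": {
        "num_allocations": 1,
        "num_threads": 1,
    },
    "chunking_settings": {
        "strategy": "sentence",   # split on sentence boundaries
        "max_chunk_size": 25,     # at most 25 words per chunk
        "sentence_overlap": 1,    # adjacent chunks share 1 sentence
    },
}
```

With the Python client, this body can then be passed to something like client.inference.put(task_type="sparse_embedding", inference_id="my-elser-endpoint", inference_config=inference_config); the endpoint id here is made up for the example, and you should check the exact parameter names against your client version's documentation.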
Ingest a document
Now let's ingest a document into the index created in the previous step.
Note: It may take some time for Elasticsearch to allocate nodes to the ELSER model deployment that is started when creating the inference endpoint. You will need to wait until the deployment is allocated to a node before the request below can succeed.
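A sketch of the ingest step, assuming the client from earlier and an index with a semantic_text field bound to the inference endpoint — the index name (chunking-example), field name (infer_field), and document text here are all illustrative:

```python
# Index a document; the semantic_text field is chunked and run through
# the inference endpoint automatically at ingest time.
client.index(
    index="chunking-example",
    document={
        "infer_field": (
            "This is some sample document data. "
            "The data is also very short."
        )
    },
)
```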
View the chunks
The generated chunks and their corresponding inference results are stored in the document under the chunks key within the _inference_fields metafield, as a list of character offset values. Let's look at the chunks generated when ingesting the document in the previous step.
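One way to inspect them, assuming the index name from the ingest sketch above; requesting the _inference_fields metafield explicitly via the fields option is an assumption about the retrieval API, so adjust per your version's documentation if needed:

```python
# Search the index and request the _inference_fields metafield,
# which holds the chunk offsets and their inference results.
resp = client.search(
    index="chunking-example",
    query={"match_all": {}},
    fields=["_inference_fields"],
)

for hit in resp["hits"]["hits"]:
    print(hit)
```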
Conclusion
You've now learned how to configure chunking settings for an inference endpoint! For more information about configurable chunking, see the configuring chunking documentation.