How To Use Jina V2 Embeddings
Introduction
In this notebook, we will extend the Jina Late Chunking implementation example to index the documents and embeddings to Elasticsearch, and run queries against those documents.
The Jina part of the implementation will be kept unchanged.
This is supporting material for the following blog post: https://www.elastic.co/search-labs/blog/how-to-use-jina-v2-embeddings
Late Chunking
This notebook explains how "Late Chunking" can be implemented. First, you need to install the requirements:
Then we load the model we want to use for the embeddings. We choose jinaai/jina-embeddings-v2-base-en, but any other model that supports mean pooling works. However, models with a large maximum context length are preferred.
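As a rough sketch, loading the model with Hugging Face transformers looks like the following. The package names in the comment (`transformers`, `torch`) are assumptions about the requirements; the load is wrapped in a function so nothing is downloaded at import time.

```python
def load_model(name: str = "jinaai/jina-embeddings-v2-base-en"):
    """Load tokenizer and model. trust_remote_code=True is needed because
    the Jina v2 models ship custom modeling code on the Hub."""
    # assumes: pip install transformers torch
    from transformers import AutoModel, AutoTokenizer  # imported lazily

    tokenizer = AutoTokenizer.from_pretrained(name, trust_remote_code=True)
    model = AutoModel.from_pretrained(name, trust_remote_code=True)
    return tokenizer, model
```

Calling `load_model()` downloads the weights on first use, so a machine with network access (and ideally a GPU) is assumed.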
Now we define the text we want to encode and split it into chunks. The chunk_by_sentences function also returns span annotations, which record the token boundaries of each chunk and are needed later for the chunked pooling.
Now let's try to segment a toy example.
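A simplified, self-contained sketch of chunk_by_sentences is shown below. The real notebook version splits on the tokenizer's sentence-ending token ids; here we split on "." and count whitespace tokens so the example runs standalone. The toy text is an illustration, not the blog post's exact example.

```python
def chunk_by_sentences(text: str):
    """Return sentence chunks plus a (start, end) token span per chunk."""
    chunks, spans, start = [], [], 0
    for part in text.split("."):
        sentence = part.strip()
        if not sentence:
            continue
        n_tokens = len(sentence.split())          # stand-in for real tokenization
        chunks.append(sentence + ".")
        spans.append((start, start + n_tokens))   # token range of this chunk
        start += n_tokens
    return chunks, spans

text = "Berlin is the capital and largest city of Germany. The city has a rich history."
chunks, spans = chunk_by_sentences(text)
```

Each span covers the tokens of one sentence, so the spans tile the document without gaps.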
Now we encode the chunks with both the traditional and the context-sensitive late_chunking method:
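The pooling step that distinguishes late chunking can be sketched as follows. The model call is simulated with random token embeddings; in the notebook these come from the model's last hidden state after encoding the whole document at once.

```python
import numpy as np

def late_chunking_pool(token_embeddings, span_annotations):
    """Mean-pool the full document's token embeddings over each chunk span.

    Because every token was contextualized against the whole document before
    pooling, each chunk vector carries document-level context."""
    return [token_embeddings[start:end].mean(axis=0)
            for start, end in span_annotations]

rng = np.random.default_rng(0)
doc_tokens = rng.normal(size=(15, 8))   # 15 tokens, dim-8 stand-in for model output
toy_spans = [(0, 9), (9, 15)]
chunk_embeddings = late_chunking_pool(doc_tokens, toy_spans)
```

In contrast, the traditional method encodes each chunk in isolation, so no cross-chunk context reaches the pooled vectors.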
Finally, we compare the similarity of the word "Berlin" with each chunk. The similarity should be higher for the context-sensitive chunked pooling method:
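The comparison uses plain cosine similarity; a minimal version is below (the vectors here are placeholders, not real model outputs):

```python
import numpy as np

def cos_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

One would call this as, e.g., `cos_sim(berlin_embedding, chunk_embedding)` for each chunk produced by either method.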
Indexing to Elasticsearch
Now, let's index the new embeddings into Elasticsearch and run queries against them.
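A sketch of the Elasticsearch side is below: a dense_vector mapping, bulk actions, and a kNN query. The index name is hypothetical, the URL in the comment assumes a local cluster, and 768 dims matches jina-embeddings-v2-base-en's output; nothing here actually connects to a cluster.

```python
INDEX = "late-chunking-demo"  # hypothetical index name
DIMS = 768                    # jina-embeddings-v2-base-en output dimension

def index_mapping():
    """Mapping with a text field for the chunk and a dense_vector for its embedding."""
    return {
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": DIMS,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }

def bulk_actions(chunks, embeddings):
    """Yield helpers.bulk-compatible actions pairing each chunk with its vector."""
    for chunk, vector in zip(chunks, embeddings):
        yield {"_index": INDEX, "_source": {"content": chunk, "embedding": list(vector)}}

def knn_query(query_vector, k=3):
    """Approximate kNN search body against the embedding field."""
    return {
        "field": "embedding",
        "query_vector": list(query_vector),
        "k": k,
        "num_candidates": 10 * k,
    }

# With a running cluster one would do, e.g.:
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch("http://localhost:9200")          # assumed local URL
#   es.indices.create(index=INDEX, **index_mapping())
#   helpers.bulk(es, bulk_actions(chunks, chunk_embeddings))
#   es.search(index=INDEX, knn=knn_query(query_embedding))
```

Indexing one document per chunk (rather than per source document) keeps the kNN query simple: the top-k hits are directly the most similar chunks.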