01 ColPali


This notebook shows how to ingest and search images using ColPali with Elasticsearch. Read our accompanying blog post on ColPali in Elasticsearch for more context on this notebook.

We will be using images from the ViDoRe benchmark as example data.

The URL and API key for your Elasticsearch cluster are expected in a file elastic.env in this format:

	ELASTIC_HOST=<cluster-url>
	ELASTIC_API_KEY=<api-key>

[1]
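A minimal sketch of reading these values with the standard library (the notebook may instead use a package such as python-dotenv; load_env is a hypothetical helper):

```python
def load_env(path="elastic.env"):
    # Hypothetical helper: parse KEY=VALUE lines from the env file
    env = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#") and "=" in line:
                key, value = line.split("=", 1)
                env[key.strip()] = value.strip()
    return env
```

The Elasticsearch client is then created from these values, e.g. Elasticsearch(hosts=[env["ELASTIC_HOST"]], api_key=env["ELASTIC_API_KEY"]).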

First, we load the sample data from Hugging Face and save it to disk.

[2]
Saving images to disk:   0%|          | 0/500 [00:00<?, ?it/s]
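Fetching the benchmark pages would use the Hugging Face datasets library (load_dataset with a ViDoRe dataset id); the save-to-disk step can then be sketched as below. The helper name and output directory are assumptions:

```python
import os

def save_images(images, out_dir="image_data"):
    # images: iterable of (filename, raw_bytes) pairs, one per benchmark page
    os.makedirs(out_dir, exist_ok=True)
    for name, data in images:
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(data)
    return out_dir
```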

Here we load the ColPali model and define functions to generate vectors from images and text.

[3]
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
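A sketch of this cell, assuming the colpali-engine package; the exact checkpoint name is an assumption (the two-shard download in the output above matches a ColPali checkpoint of this family):

```python
import torch
from colpali_engine.models import ColPali, ColPaliProcessor

MODEL_NAME = "vidore/colpali-v1.3"  # assumed checkpoint

model = ColPali.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16).eval()
processor = ColPaliProcessor.from_pretrained(MODEL_NAME)

def create_colpali_vectors(images):
    # One multi-vector per image: a list of per-patch embeddings
    batch = processor.process_images(images).to(model.device)
    with torch.no_grad():
        embeddings = model(**batch)
    return [e.tolist() for e in embeddings]

def create_query_vectors(query):
    # A list of per-token embeddings for the text query
    batch = processor.process_queries([query]).to(model.device)
    with torch.no_grad():
        embeddings = model(**batch)
    return embeddings[0].tolist()
```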

Now we iterate over all of our images and create multi-vectors for them with the ColPali model.

[4]
Create ColPali Vectors:   0%|          | 0/500 [00:00<?, ?it/s]
Saved 500 vector entries to disk
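The loop can be sketched independently of the model. Here vector_fn stands in for the ColPali encoding function from the previous step; the helper and file names are assumptions:

```python
import json
import os

def vectorize_images(image_dir, vector_fn, out_file="image_vectors.json"):
    # vector_fn maps an image path to its multi-vector (list of patch embeddings)
    entries = []
    for name in sorted(os.listdir(image_dir)):
        vectors = vector_fn(os.path.join(image_dir, name))
        entries.append({"image_name": name, "col_pali_vectors": vectors})
    with open(out_file, "w") as f:
        json.dump(entries, f)
    print(f"Saved {len(entries)} vector entries to disk")
    return entries
```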

Next we create the index with the new rank_vectors field type, which will store our ColPali vectors.

[5]
[INFO] Index 'searchlabs-colpali' already exists.
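A sketch of the mapping, using the index name from the output above; the field name is an assumption, and rank_vectors is a tech-preview field type in recent Elasticsearch versions:

```python
INDEX_NAME = "searchlabs-colpali"

mappings = {
    "properties": {
        # rank_vectors stores a list of vectors per document,
        # one embedding per ColPali patch
        "col_pali_vectors": {"type": "rank_vectors"}
    }
}
```

The index itself would then be created with the client from earlier, e.g. es.indices.create(index=INDEX_NAME, mappings=mappings).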

Load all images back from disk, create the vectors for them, and index them into Elasticsearch.

[6]
Index documents:   0%|          | 0/500 [00:00<?, ?it/s]
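Indexing is typically done with the bulk helper. A sketch of building the bulk actions, assuming vector entries shaped like the ones saved earlier (field and helper names are assumptions):

```python
def build_actions(entries, index_name="searchlabs-colpali"):
    # One bulk action per page image; the multi-vector goes into the
    # rank_vectors field defined in the mapping
    for entry in entries:
        yield {
            "_index": index_name,
            "_id": entry["image_name"],
            "_source": {
                "image_name": entry["image_name"],
                "col_pali_vectors": entry["col_pali_vectors"],
            },
        }
```

These actions would then be sent with elasticsearch.helpers.bulk(es, build_actions(entries)).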

Finally, we use the new maxSimDotProduct function to calculate the similarity between our query and the image vectors in Elasticsearch.

[7]
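The query is a script_score over the rank_vectors field, and the score is the ColBERT-style late-interaction MaxSim. Below is a sketch of the request body plus a plain-Python reference of what maxSimDotProduct computes; the field name is an assumption, and query_vectors would come from the ColPali query-encoding function:

```python
def max_sim_dot_product(query_vecs, doc_vecs):
    # For each query token vector, take the maximum dot product over all
    # document patch vectors, then sum across query tokens
    return sum(
        max(sum(q * d for q, d in zip(qv, dv)) for dv in doc_vecs)
        for qv in query_vecs
    )

def build_query(query_vectors, field="col_pali_vectors"):
    # Request body for es.search(index=..., query=build_query(...))
    return {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": f"maxSimDotProduct(params.query_vector, '{field}')",
                "params": {"query_vector": query_vectors},
            },
        }
    }
```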