Image Similarity
How to implement Image search using Elasticsearch
This workbook shows how to implement image search using Elasticsearch. You will index documents with image embeddings (generated or pre-generated) and then use an NLP model to search for images with natural-language descriptions.
Prerequisites
Before we begin, create an Elastic Cloud deployment and scale it to have at least one machine learning (ML) node with at least 4GB of memory. Also ensure that the Elasticsearch cluster is running.
If you don't already have an Elastic deployment, you can sign up for a free Elastic Cloud trial.
Install Python requirements
Before you start you need to install all required Python dependencies.
Upload NLP model for querying
Using the eland_import_hub_model script, download and install the clip-ViT-B-32-multilingual-v1 model. This model transforms your search query into a vector, which is then used to search over the set of images stored in Elasticsearch.
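For example, the import can be run from the command line like this (the cloud ID and API key are placeholders, and the exact flags may vary by eland version):

```shell
# Download the model from Hugging Face and deploy it to your cluster.
# <your-cloud-id> and <your-api-key> are placeholders for your own values.
eland_import_hub_model \
  --cloud-id <your-cloud-id> \
  --es-api-key <your-api-key> \
  --hub-model-id sentence-transformers/clip-ViT-B-32-multilingual-v1 \
  --task-type text_embedding \
  --start
```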
To get your Cloud ID, go to Elastic Cloud and, on the deployment overview page, copy the Cloud ID.
To authenticate your request, you can use an API key. Alternatively, you can use your cloud deployment username and password.
Connect to Elasticsearch cluster
Use your own cluster details ELASTIC_CLOUD_ID, API_KEY.
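A minimal connection sketch using the official Python client (the cloud ID and API key below are placeholders, not real credentials):

```python
from elasticsearch import Elasticsearch

# Placeholder credentials -- replace with your own deployment details.
ELASTIC_CLOUD_ID = "<your-cloud-id>"
ELASTIC_API_KEY = "<your-api-key>"

es = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=ELASTIC_API_KEY,
)

# Prints cluster information, similar to the ObjectApiResponse shown below.
print(es.info())
```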
ObjectApiResponse({'name': 'instance-0000000001', 'cluster_name': 'a72482be54904952ba46d53c3def7740', 'cluster_uuid': 'g8BE52TtT32pGBbRzP_oKA', 'version': {'number': '8.12.2', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '48a287ab9497e852de30327444b0809e55d46466', 'build_date': '2024-02-19T10:04:32.774273190Z', 'build_snapshot': False, 'lucene_version': '9.9.2', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})

Create Index and mappings for Images
Before you can index documents into Elasticsearch, you need to create an index with the correct mappings.
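A sketch of such a mapping, assuming 512-dimensional CLIP embeddings stored in a dense_vector field. The field names follow the query used later in this workbook; treat the exact dimensions and metadata fields as assumptions:

```python
# Mapping for the "images" index: a dense_vector field for the CLIP
# embedding plus text/keyword fields for the image metadata.
# 512 dims match CLIP ViT-B/32 output; adjust if your model differs.
index_mapping = {
    "properties": {
        "image_embedding": {
            "type": "dense_vector",
            "dims": 512,
            "index": True,
            "similarity": "cosine",
        },
        "photo_description": {"type": "text"},
        "ai_description": {"type": "text"},
        "photo_url": {"type": "keyword"},
    }
}

# The mapping would then be applied with:
# es.indices.create(index="images", mappings=index_mapping)
```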
Creating index images
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'images'})

Get image dataset and embeddings
Download:
- The example image dataset is from Unsplash
- The image embeddings are pre-generated using the CLIP model
Then unzip both files.
Import all pregenerated image embeddings
In this section you will import ~19k documents of pre-generated image embeddings with metadata.
The process downloads files with image information, merges them, and indexes them into Elasticsearch.
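The import loop can be sketched like this; `chunked` is a small helper (not from the workbook) that batches documents so they can be sent to Elasticsearch in groups of 1,000, matching the progress messages below:

```python
from typing import Iterable, Iterator, List

def chunked(docs: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield successive batches of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch

# Each batch would then be indexed with elasticsearch.helpers.bulk, e.g.:
# helpers.bulk(es, ({"_index": "images", "_source": d} for d in batch))
```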
Indexed 1000 documents Indexed 2000 documents Indexed 3000 documents Indexed 4000 documents Indexed 5000 documents Indexed 6000 documents Indexed 7000 documents Indexed 8000 documents Indexed 9000 documents Indexed 10000 documents Indexed 11000 documents Indexed 12000 documents Indexed 13000 documents Indexed 14000 documents Indexed 15000 documents Indexed 16000 documents Indexed 17000 documents Indexed 18000 documents Indexed 19000 documents Indexed 19833 image embeddings documents
Query the image dataset
The next step is to run a query to search for images. The example query searches for "model_text": "Valentine day flowers" using the model sentence-transformers__clip-vit-b-32-multilingual-v1 that we uploaded to Elasticsearch earlier.
The process is carried out with a single query, even though internally it consists of two tasks: one transforms your search text into a vector using the NLP model, and the other runs the vector search over the image dataset.
POST images/_search
{
  "knn": {
    "field": "image_embedding",
    "k": 5,
    "num_candidates": 10,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "sentence-transformers__clip-vit-b-32-multilingual-v1",
        "model_text": "Valentine day flowers"
      }
    }
  },
  "fields": [
    "photo_description",
    "ai_description",
    "photo_url"
  ],
  "_source": false
}
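The same request can be issued from the Python client. Here is a sketch that builds the request body above; the commented-out `es.search` call assumes a connected client:

```python
def build_knn_query(text: str, k: int = 5, num_candidates: int = 10) -> dict:
    """Build the kNN search body for a natural-language image query."""
    return {
        "knn": {
            "field": "image_embedding",
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": "sentence-transformers__clip-vit-b-32-multilingual-v1",
                    "model_text": text,
                }
            },
        },
        "fields": ["photo_description", "ai_description", "photo_url"],
        "_source": False,
    }

# response = es.search(index="images", body=build_knn_query("Valentine day flowers"))
```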
[Optional] Simple streamlit UI
In the following section, you will view the response in a simple UI for better visualisation.
The query in the previous step wrote its response to the file json_data.json, which the UI loads and visualises.
Follow the steps below to see the results in a table.
Install tunnel library
Create application
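A minimal sketch of such a Streamlit app, assuming the json_data.json file written by the previous query and the field names used in the search (the exact structure of the stored hits is an assumption):

```python
import json

import pandas as pd
import streamlit as st

st.title("Image search results")

# json_data.json is the response file written by the query step;
# the exact shape of the saved hits is an assumption here.
with open("json_data.json") as f:
    hits = json.load(f)

# In Elasticsearch responses, requested "fields" come back as lists.
rows = [
    {
        "photo_description": hit["fields"].get("photo_description", [""])[0],
        "ai_description": hit["fields"].get("ai_description", [""])[0],
        "photo_url": hit["fields"].get("photo_url", [""])[0],
    }
    for hit in hits
]

st.dataframe(pd.DataFrame(rows))
```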
Run app
Run the application and check your IP for the tunneling
Create the tunnel
Run the tunnel, then open the link it prints and use the IP from the previous step to connect to the application.
npx: installed 22 in 2.186s your url is: https://nine-facts-act.loca.lt ^C