Image Similarity
How to implement Image search using Elasticsearch
This workbook shows how to implement image search using Elasticsearch. You will index documents with image embeddings (generated or pre-generated) and then use an NLP model to search for images with natural-language descriptions.
Prerequisites
Before we begin, create an Elastic Cloud deployment and scale it to have at least one machine learning (ML) node with at least 4GB of memory. Also ensure that the Elasticsearch cluster is running.
If you don't already have an Elastic deployment, you can sign up for a free Elastic Cloud trial.
Install Python requirements
Before you start you need to install all required Python dependencies.
Upload NLP model for querying
Using the eland_import_hub_model script, download and install the clip-ViT-B-32-multilingual-v1 model. This model transforms your search query into a vector, which is then used to search over the set of images stored in Elasticsearch.
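For example, the import can be run from the command line like this (the cloud ID and API key are placeholders, and the exact flags may vary by eland version):

```shell
# Download the model from Hugging Face and deploy it to your cluster.
# <your-cloud-id> and <your-api-key> are placeholders for your own values.
eland_import_hub_model \
  --cloud-id <your-cloud-id> \
  --es-api-key <your-api-key> \
  --hub-model-id sentence-transformers/clip-ViT-B-32-multilingual-v1 \
  --task-type text_embedding \
  --start
```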
To get your Cloud ID, go to Elastic Cloud and, on the deployment overview page, copy the Cloud ID.
To authenticate your request, you can use an API key. Alternatively, you can use your cloud deployment username and password.
Connect to Elasticsearch cluster
Use your own cluster details ELASTIC_CLOUD_ID, API_KEY.
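A minimal connection sketch using the official Python client (the cloud ID and API key below are placeholders, not real credentials):

```python
from elasticsearch import Elasticsearch

# Placeholder credentials -- replace with your own deployment details.
ELASTIC_CLOUD_ID = "<your-cloud-id>"
ELASTIC_API_KEY = "<your-api-key>"

es = Elasticsearch(
    cloud_id=ELASTIC_CLOUD_ID,
    api_key=ELASTIC_API_KEY,
)

# Prints cluster information, similar to the ObjectApiResponse shown below.
print(es.info())
```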
ObjectApiResponse({'name': 'instance-0000000001', 'cluster_name': 'a72482be54904952ba46d53c3def7740', 'cluster_uuid': 'g8BE52TtT32pGBbRzP_oKA', 'version': {'number': '8.12.2', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '48a287ab9497e852de30327444b0809e55d46466', 'build_date': '2024-02-19T10:04:32.774273190Z', 'build_snapshot': False, 'lucene_version': '9.9.2', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'})

Create Index and mappings for Images
Before you can index documents into Elasticsearch, you need to create an index with the correct mappings.
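A sketch of such a mapping, assuming 512-dimensional CLIP embeddings stored in a dense_vector field. The field names follow the query used later in this workbook; treat the exact dimensions and metadata fields as assumptions:

```python
# Mapping for the "images" index: a dense_vector field for the CLIP
# embedding plus text/keyword fields for the image metadata.
# 512 dims match CLIP ViT-B/32 output; adjust if your model differs.
index_mapping = {
    "properties": {
        "image_embedding": {
            "type": "dense_vector",
            "dims": 512,
            "index": True,
            "similarity": "cosine",
        },
        "photo_description": {"type": "text"},
        "ai_description": {"type": "text"},
        "photo_url": {"type": "keyword"},
    }
}

# The mapping would then be applied with:
# es.indices.create(index="images", mappings=index_mapping)
```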
Creating index images
ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'images'})

Get image dataset and embeddings
Download:
- The example image dataset is from Unsplash
- The image embeddings are pre-generated using the CLIP model
Then unzip both files.
Import all pregenerated image embeddings
In this section you will import ~19k documents of pre-generated image embeddings with metadata.
The process downloads files with image information, merges them, and indexes them into Elasticsearch.
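The import loop can be sketched like this; `chunked` is a small helper (not from the workbook) that batches documents so they can be sent to Elasticsearch in groups of 1,000, matching the progress messages below:

```python
from typing import Iterable, Iterator, List

def chunked(docs: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield successive batches of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch

# Each batch would then be indexed with elasticsearch.helpers.bulk, e.g.:
# helpers.bulk(es, ({"_index": "images", "_source": d} for d in batch))
```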
Indexed 1000 documents Indexed 2000 documents Indexed 3000 documents Indexed 4000 documents Indexed 5000 documents Indexed 6000 documents Indexed 7000 documents Indexed 8000 documents Indexed 9000 documents Indexed 10000 documents Indexed 11000 documents Indexed 12000 documents Indexed 13000 documents Indexed 14000 documents Indexed 15000 documents Indexed 16000 documents Indexed 17000 documents Indexed 18000 documents Indexed 19000 documents Indexed 19833 image embeddings documents
Query the image dataset
The next step is to run a query to search for images. The example query searches for "model_text": "Valentine day flowers" using the model sentence-transformers__clip-vit-b-32-multilingual-v1 that we uploaded to Elasticsearch earlier.
The process is carried out with a single query, even though internally it consists of two tasks: one transforms your search text into a vector using the NLP model, and the other runs the vector search over the image dataset.
POST images/_search
{
  "knn": {
    "field": "image_embedding",
    "k": 5,
    "num_candidates": 10,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "sentence-transformers__clip-vit-b-32-multilingual-v1",
        "model_text": "Valentine day flowers"
      }
    }
  },
  "fields": [
    "photo_description",
    "ai_description",
    "photo_url"
  ],
  "_source": false
}
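The same request can be issued from the Python client. Here is a sketch that builds the request body above; the commented-out `es.search` call assumes a connected client:

```python
def build_knn_query(text: str, k: int = 5, num_candidates: int = 10) -> dict:
    """Build the kNN search body for a natural-language image query."""
    return {
        "knn": {
            "field": "image_embedding",
            "k": k,
            "num_candidates": num_candidates,
            "query_vector_builder": {
                "text_embedding": {
                    "model_id": "sentence-transformers__clip-vit-b-32-multilingual-v1",
                    "model_text": text,
                }
            },
        },
        "fields": ["photo_description", "ai_description", "photo_url"],
        "_source": False,
    }

# response = es.search(index="images", body=build_knn_query("Valentine day flowers"))
```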
[Optional] Simple streamlit UI
In the following section, you will view the response in a simple UI for better visualisation.
The query in the previous step wrote its response to the file json_data.json, which the UI loads and visualises.
Follow the steps below to see the results in a table.
Install tunnel library
Create application
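A minimal sketch of such a Streamlit app, assuming the json_data.json file written by the previous query and the field names used in the search (the exact structure of the stored hits is an assumption):

```python
import json

import pandas as pd
import streamlit as st

st.title("Image search results")

# json_data.json is the response file written by the query step;
# the exact shape of the saved hits is an assumption here.
with open("json_data.json") as f:
    hits = json.load(f)

# In Elasticsearch responses, requested "fields" come back as lists.
rows = [
    {
        "photo_description": hit["fields"].get("photo_description", [""])[0],
        "ai_description": hit["fields"].get("ai_description", [""])[0],
        "photo_url": hit["fields"].get("photo_url", [""])[0],
    }
    for hit in hits
]

st.dataframe(pd.DataFrame(rows))
```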
Run app
Run the application and check your IP for the tunneling
Create the tunnel
Run the tunnel, then open the link it prints and use the IP from the previous step to connect to the application.
npx: installed 22 in 2.186s your url is: https://nine-facts-act.loca.lt ^C