02 Hybrid Search
Hybrid Search using RRF
In this example we'll use the reciprocal rank fusion algorithm to combine the results of BM25 and kNN semantic search. We'll use the same dataset we used in our quickstart guide.
You can use RRF for hybrid search out of the box, without any additional configuration. This example demonstrates how RRF ranking works at a basic level.
Install packages and initialize the Elasticsearch Python client
To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.
First, install the packages this example needs with pip.
Next we need to import the elasticsearch module and the getpass module.
getpass is part of the Python standard library and is used to securely prompt for credentials.
Now we can instantiate the Python Elasticsearch client. First we prompt the user for their password and Cloud ID.
🔐 NOTE: getpass lets us securely prompt the user for credentials without echoing them to the terminal or saving them in the notebook.
Then we create a client object, an instance of the Elasticsearch class.
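The connection step can be sketched as follows. This is a minimal sketch, not the notebook's original cell: it assumes the `elasticsearch` 8.x package is installed and that you authenticate with basic auth as the `elastic` user. That username is an assumption, because the text only mentions a password and a Cloud ID. The helper names `cloud_client_kwargs` and `connect_to_cloud` are hypothetical. The `elasticsearch` import is deferred into the function, so the pure helper can be tried without a deployment.

```python
from getpass import getpass  # standard library: prompts without echoing input


def cloud_client_kwargs(cloud_id: str, password: str, username: str = "elastic") -> dict:
    """Build keyword arguments for Elasticsearch(...) for an Elastic Cloud deployment.

    The default username "elastic" is an assumption made for this sketch.
    """
    return {"cloud_id": cloud_id, "basic_auth": (username, password)}


def connect_to_cloud():
    """Prompt for credentials and return a client (hypothetical helper)."""
    # Imported here so cloud_client_kwargs works even without the package installed.
    from elasticsearch import Elasticsearch

    cloud_id = getpass("Elastic Cloud ID: ")
    password = getpass("Elastic password: ")
    return Elasticsearch(**cloud_client_kwargs(cloud_id, password))
```

In a notebook you would then call `client = connect_to_cloud()` once and reuse `client` in the later cells.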
Enable Telemetry
Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. We would like to ask you to run the following code so that we can gather anonymous usage statistics. See telemetry.py for details. Thank you!
Test the Client
Before you continue, confirm that the client has connected with this test.
{'name': 'instance-0000000011', 'cluster_name': 'd1bd36862ce54c7b903e2aacd4cd7f0a', 'cluster_uuid': 'tIkh0X_UQKmMFQKSfUw-VQ', 'version': {'number': '8.9.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '8aa461beb06aa0417a231c345a1b8c38fb498a0d', 'build_date': '2023-07-19T14:43:58.555259655Z', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}
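The test above prints the whole response of `client.info()`. If you only want the key facts, a small helper can pull them out. This is an illustrative sketch: `describe_cluster` is a hypothetical name, and it reads only the `name`, `cluster_name`, `version.number` and `version.lucene_version` keys that appear in the output shown above.

```python
from collections.abc import Mapping


def describe_cluster(info: Mapping) -> str:
    """Summarize a client.info() response in one line."""
    version = info["version"]
    return (
        f"Connected to cluster {info['cluster_name']} "
        f"(node {info['name']}), Elasticsearch {version['number']}, "
        f"Lucene {version['lucene_version']}"
    )
```

With a live client this would be used as `print(describe_cluster(client.info()))`. The response object returned by the client supports dictionary-style key access, which is all the helper relies on.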
Refer to https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html#connect-self-managed-new to learn how to connect to a self-managed deployment.
Read the API key authentication section of https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/connecting.html to learn how to connect using API keys.
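For a self-managed cluster, you pass the node URL instead of a Cloud ID, and you can authenticate with an API key instead of a password. The sketch below only builds the constructor arguments. The helper name `self_managed_client_kwargs` and the example URL `https://localhost:9200` are hypothetical, and the default `elastic` username is an assumption. The `hosts`, `api_key` and `basic_auth` keyword arguments are parameters of the 8.x `Elasticsearch` constructor.

```python
from typing import Optional


def self_managed_client_kwargs(
    url: str,
    api_key: Optional[str] = None,
    username: str = "elastic",  # assumed default user for basic auth
    password: Optional[str] = None,
) -> dict:
    """Build Elasticsearch(...) keyword arguments for a self-managed cluster."""
    if api_key is not None:
        # Prefer an API key when one is given (an encoded key string).
        return {"hosts": url, "api_key": api_key}
    if password is not None:
        return {"hosts": url, "basic_auth": (username, password)}
    raise ValueError("Provide either an API key or a password")
```

Usage would look like `Elasticsearch(**self_managed_client_kwargs("https://localhost:9200", api_key=my_key))`, where `my_key` is a placeholder for your own key.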
Pretty printing Elasticsearch responses
Let's add a helper function to print Elasticsearch responses in a readable format. This function is similar to the one that was used in the quickstart guide.
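The quickstart's helper is not reproduced here, so the following is a hypothetical version. It assumes each hit's `_source` contains `publish_date`, `title` and `summary` fields, inferred from the labels in the printed output later in this notebook. It treats the rank as the 1-based position of the hit in the response.

```python
from collections.abc import Mapping


def pretty_response(response: Mapping) -> str:
    """Format search hits as readable text, print it, and return it.

    Assumes each hit's _source has "publish_date", "title" and "summary" fields.
    """
    hits = response["hits"]["hits"]
    if not hits:
        text = "Your search returned no results."
    else:
        blocks = []
        # enumerate(..., start=1) yields the 1-based position, i.e. the rank.
        for rank, hit in enumerate(hits, start=1):
            source = hit["_source"]
            blocks.append(
                f"ID: {hit['_id']}\n"
                f"Publication date: {source['publish_date']}\n"
                f"Title: {source['title']}\n"
                f"Summary: {source['summary']}\n"
                f"Rank: {rank}\n"
                f"Score: {hit['_score']}"
            )
        # A blank line separates consecutive results.
        text = "\n\n".join(blocks)
    print(text)
    return text
```

Returning the string as well as printing it is a small design choice that makes the helper easy to test without capturing standard output.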
Querying Documents with Hybrid Search
🔐 NOTE: Before you can run the query in this section, you need the book_index dataset from our quickstart guide. If you haven't worked through the quickstart, follow the steps described there to create an Elasticsearch deployment containing the dataset, then come back and run the query here.
Now we need to perform a query using two different search strategies:
- Semantic search using the "all-MiniLM-L6-v2" embedding model
- Keyword search using the "title" field
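The two strategies map naturally onto two retrievers: a `standard` retriever running a `match` query on `title`, and a `knn` retriever over an embedding field, both wrapped in an `rrf` retriever. The sketch below only builds the request as plain dictionaries. The vector field name `title_vector` and the values `k=5` and `num_candidates=10` are assumptions, not taken from the quickstart. `rank_constant=60` is Elasticsearch's default, spelled out for clarity. The query vector is expected to come from the all-MiniLM-L6-v2 model, for example via the sentence-transformers package (also an assumption).

```python
from typing import Sequence


def build_hybrid_retriever(
    query_text: str,
    query_vector: Sequence[float],
    vector_field: str = "title_vector",  # assumed dense_vector field name
    k: int = 5,
    num_candidates: int = 10,
    rank_constant: int = 60,  # Elasticsearch's default RRF constant
) -> dict:
    """Return an RRF retriever combining keyword (BM25) and kNN semantic search."""
    keyword = {"standard": {"query": {"match": {"title": query_text}}}}
    semantic = {
        "knn": {
            "field": vector_field,
            "query_vector": list(query_vector),
            "k": k,
            "num_candidates": num_candidates,
        }
    }
    return {"rrf": {"retrievers": [keyword, semantic], "rank_constant": rank_constant}}
```

Assuming a cluster and client recent enough to support retrievers, you would pass the result as `client.search(index="book_index", retriever=build_hybrid_retriever(text, vector), size=5)`. Here `text` is your query string and `vector` is its embedding (both placeholders).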
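We then use Reciprocal Rank Fusion (RRF) to merge the two result lists into a single list of documents, ranked in order of relevance. RRF is a ranking algorithm for combining results from different information retrieval strategies. It scores each document by its position in each result list rather than by the strategies' raw scores, which are on different scales and not directly comparable.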
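The fusion step is simple enough to sketch in plain Python. This is the standard RRF formula, where each document scores the sum of `1 / (rank_constant + rank)` over the lists it appears in, with `rank_constant = 60` to match Elasticsearch's default. It illustrates the idea only and is not Elasticsearch's internal implementation. The document IDs in the test are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    result_lists: Sequence[Sequence[str]], rank_constant: int = 60
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of document IDs with RRF.

    A document's fused score is the sum of 1 / (rank_constant + rank) over
    every list it appears in, where rank is its 1-based position. The raw
    scores of the individual retrievers are never used.
    """
    scores: Dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A document found by both strategies, even at modest positions, usually outranks one found near the top of a single list. For example, with BM25 results `["doc_a", "doc_b", "doc_c"]` and kNN results `["doc_a", "doc_c", "doc_d"]`, `doc_c` (ranks 3 and 2) ends up ahead of `doc_b` (rank 2 in one list only).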
Note: With the retriever API, _score contains the document's relevance score (here, the combined RRF score), and the rank is simply the document's position in the results (the first result is rank 1, and so on).
ID: IAOa7osBiUNHLMdf3q2r
Publication date: 2019-05-03
Title: Python Crash Course
Summary: A fast-paced, no-nonsense guide to programming in Python
Rank: 1
Score: 0.032786883

ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Rank: 2
Score: 0.03175403

ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Rank: 3
Score: 0.016129032

ID: IwOa7osBiUNHLMdf3q2r
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Rank: 4
Score: 0.015873017

ID: KAOa7osBiUNHLMdf3q2r
Publication date: 2012-06-27
Title: Introduction to the Theory of Computation
Summary: Introduction to the theory of computation and complexity theory
Rank: 5
Score: 0.015873017
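The top scores in this output can be checked by hand against the RRF formula. The output does not state the constant, so this assumes Elasticsearch's default $k = 60$. Under that assumption, the two highest scores are consistent with Python Crash Course being ranked first by both retrievers, and The Pragmatic Programmer being ranked second by one retriever and fourth by the other. The scores alone do not reveal which retriever produced which rank.

```latex
\begin{aligned}
\mathrm{RRF}(d) &= \sum_{r \in R} \frac{1}{k + \mathrm{rank}_r(d)}, \qquad k = 60 \\
\text{Python Crash Course:} &\quad \frac{1}{60+1} + \frac{1}{60+1} = \frac{2}{61} \approx 0.0327869 \\
\text{The Pragmatic Programmer:} &\quad \frac{1}{60+2} + \frac{1}{60+4} \approx 0.0161290 + 0.0156250 = 0.0317540
\end{aligned}
```

These values agree with the printed scores (0.032786883 and 0.03175403) up to single-precision rounding.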