02 Hybrid Search

Hybrid Search using RRF

In this example we'll use the Reciprocal Rank Fusion (RRF) algorithm to combine the results of BM25 keyword search and kNN semantic search. We'll use the same dataset as in our quickstart guide.

You can use RRF for hybrid search out of the box, without any additional configuration. This example demonstrates how RRF ranking works at a basic level.

Install packages and initialize the Elasticsearch Python client

To get started, we'll need to connect to our Elastic deployment using the Python client. Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify our deployment.

First, we use pip to install the packages required for this example.

[ ]

Next we need to import the elasticsearch module and the getpass module. getpass is part of the Python standard library and is used to securely prompt for credentials.

[2]

Now we can instantiate the Python Elasticsearch client. First we prompt the user for their password and Cloud ID.

🔐 NOTE: getpass lets us prompt the user for credentials without echoing them to the terminal, so they never appear in the notebook output. The values are still held in memory in Python variables for the rest of the session.

Then we create the client object, which is an instance of the Elasticsearch class.

[3]
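
The connection step described above might look like the following minimal sketch. It assumes the 8.x Python client (installed with `pip install elasticsearch`), password-based authentication, and the default `elastic` username. `build_client_kwargs` and `connect` are hypothetical helper names used for illustration; they are not part of the client library.

```python
from getpass import getpass


def build_client_kwargs(cloud_id: str, password: str, username: str = "elastic") -> dict:
    """Collect the keyword arguments for the Elasticsearch constructor.

    Assumption: basic authentication with the default "elastic" user.
    """
    return {"cloud_id": cloud_id, "basic_auth": (username, password)}


def connect(cloud_id: str, password: str):
    """Create a client for an Elastic Cloud deployment."""
    # Imported lazily so build_client_kwargs works without the package installed.
    from elasticsearch import Elasticsearch

    return Elasticsearch(**build_client_kwargs(cloud_id, password))


# In the notebook, prompt for the values instead of hard-coding them:
# client = connect(getpass("Elastic Cloud ID: "), getpass("Elastic password: "))
# print(client.info())  # a successful response confirms the connection
```

Keeping the argument construction separate from the network call makes it easy to inspect what will be sent to the constructor before any connection is attempted.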

Enable Telemetry

Knowing that you are using this notebook helps us decide where to invest our efforts to improve our products. We ask that you run the following code to let us gather anonymous usage statistics. See telemetry.py for details. Thank you!

[ ]

Test the Client

Before you continue, confirm that the client has connected with this test.

[4]
{'name': 'instance-0000000011', 'cluster_name': 'd1bd36862ce54c7b903e2aacd4cd7f0a', 'cluster_uuid': 'tIkh0X_UQKmMFQKSfUw-VQ', 'version': {'number': '8.9.0', 'build_flavor': 'default', 'build_type': 'docker', 'build_hash': '8aa461beb06aa0417a231c345a1b8c38fb498a0d', 'build_date': '2023-07-19T14:43:58.555259655Z', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}

Pretty printing Elasticsearch responses

Let's add a helper function to print Elasticsearch responses in a readable format. This function is similar to the one that was used in the quickstart guide.

[ ]
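
A helper along these lines would produce output in the shape shown later in this notebook. This is a sketch, not the notebook's original cell: the `_source` field names `publish_date`, `title`, and `summary` are assumptions about the quickstart dataset, and `format_response` is a hypothetical name.

```python
def format_response(response: dict) -> str:
    """Render search hits as readable text blocks, one per document."""
    hits = response["hits"]["hits"]
    if not hits:
        return "Your search returned no results."

    blocks = []
    # Rank is simply the 1-based position of the hit in the result list.
    for rank, hit in enumerate(hits, start=1):
        source = hit["_source"]  # assumed fields: publish_date, title, summary
        blocks.append(
            f"ID: {hit['_id']}\n"
            f"Publication date: {source['publish_date']}\n"
            f"Title: {source['title']}\n"
            f"Summary: {source['summary']}\n"
            f"Rank: {rank}\n"
            f"Score: {hit['_score']}"
        )
    return "\n\n".join(blocks)


def pretty_response(response: dict) -> None:
    """Print the formatted hits."""
    print(format_response(response))
```

Returning a string from `format_response` and printing it in a thin wrapper keeps the formatting logic easy to check against a hand-built response dictionary.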

Querying Documents with Hybrid Search

🔐 NOTE: Before you can run the query in this section, you need the book_index dataset from our quickstart. If you haven't worked through the quickstart, please follow the steps described there to create an Elasticsearch deployment with the dataset in it, and then come back to run the query here.

Now we need to perform a query using two different search strategies:

  • Semantic search using the "all-MiniLM-L6-v2" embedding model
  • Keyword search using the "title" field

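We then use Reciprocal Rank Fusion (RRF) to merge the two result lists into a single list of documents, ranked in order of relevance. RRF is a ranking algorithm for combining results from different information retrieval strategies. It works from each document's rank in each list rather than from the raw scores, so BM25 and vector similarity scores do not need to be on the same scale.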
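
To make the idea concrete, here is a small pure-Python sketch of RRF. Each document receives 1 / (rank_constant + rank) from every list it appears in, and the contributions are summed. The document IDs and the function name `rrf_fuse` are made up for illustration, and the rank constant of 60 is assumed to match the Elasticsearch default.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (rank_constant + rank) to a document's score,
    where rank is the document's 1-based position in that list.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists from a keyword retriever and a kNN retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
semantic_hits = ["doc_a", "doc_d", "doc_e", "doc_b"]

for doc_id, score in rrf_fuse([keyword_hits, semantic_hits]):
    print(f"{doc_id}: {score:.6f}")
```

Documents that appear near the top of both lists accumulate two contributions and rise above documents that rank well in only one list.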

Note: With the retriever API, _score contains the document’s relevance score, and the rank is simply the position in the results (first result is rank 1, etc.).

[ ]
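
The hybrid query could be expressed with an `rrf` retriever that wraps a `standard` (keyword) retriever and a `knn` retriever. The sketch below only builds the request body. The vector field name `title_vector`, the `k` and `num_candidates` values, the example query text, and the helper name `build_rrf_search` are assumptions, and the deployment must be recent enough to support retrievers.

```python
def build_rrf_search(query_text: str, query_vector, size: int = 5) -> dict:
    """Build search arguments that fuse keyword and kNN results with RRF."""
    return {
        "size": size,
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Keyword (BM25) search on the "title" field.
                    {"standard": {"query": {"match": {"title": query_text}}}},
                    # Semantic search on an assumed dense vector field.
                    {
                        "knn": {
                            "field": "title_vector",
                            "query_vector": list(query_vector),
                            "k": 5,
                            "num_candidates": 10,
                        }
                    },
                ]
            }
        },
    }


# Possible usage, assuming `client` is connected and sentence-transformers is installed:
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-MiniLM-L6-v2")
# vector = model.encode("python programming").tolist()
# response = client.search(index="book_index", **build_rrf_search("python programming", vector))
```

The query vector must come from the same embedding model that was used to index the documents, otherwise the kNN distances are meaningless.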

ID: IAOa7osBiUNHLMdf3q2r
Publication date: 2019-05-03
Title: Python Crash Course
Summary: A fast-paced, no-nonsense guide to programming in Python
Rank: 1
Score: 0.032786883

ID: HwOa7osBiUNHLMdf3q2r
Publication date: 2019-10-29
Title: The Pragmatic Programmer: Your Journey to Mastery
Summary: A guide to pragmatic programming for software engineers and developers
Rank: 2
Score: 0.03175403

ID: JAOa7osBiUNHLMdf3q2r
Publication date: 2018-12-04
Title: Eloquent JavaScript
Summary: A modern introduction to programming
Rank: 3
Score: 0.016129032

ID: IwOa7osBiUNHLMdf3q2r
Publication date: 2015-03-27
Title: You Don't Know JS: Up & Going
Summary: Introduction to JavaScript and programming as a whole
Rank: 4
Score: 0.015873017

ID: KAOa7osBiUNHLMdf3q2r
Publication date: 2012-06-27
Title: Introduction to the Theory of Computation
Summary: Introduction to the theory of computation and complexity theory
Rank: 5
Score: 0.015873017
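
The scores above are consistent with RRF using a rank constant of 60. That value is an assumption, since the notebook does not state it, although 60 is the Elasticsearch default. The per-retriever ranks in the sketch below are inferred from the reported scores and are not part of the notebook's output.

```python
RANK_CONSTANT = 60  # assumption: Elasticsearch's default rank constant for RRF


def rrf_score(*ranks: int, rank_constant: int = RANK_CONSTANT) -> float:
    """Sum 1 / (rank_constant + rank) over the lists a document appears in."""
    return sum(1.0 / (rank_constant + rank) for rank in ranks)


# (reported score, inferred per-retriever ranks). The ranks are a guess that
# reproduces the reported score; they are not taken from the notebook.
observations = [
    (0.032786883, (1, 1)),  # first in both lists
    (0.03175403, (2, 4)),   # second in one list, fourth in the other
    (0.016129032, (2,)),    # second in a single list
    (0.015873017, (3,)),    # third in a single list
]

for reported, ranks in observations:
    print(f"ranks={ranks}: computed={rrf_score(*ranks):.9f} reported={reported}")
```

Read this way, the top two books were returned by both the keyword and the semantic search, while the remaining three each appear in only one of the two result lists, which is why their scores are roughly half as large.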