
BM25 and self-query retriever with Elasticsearch and LangChain

Open In Colab

This workbook demonstrates how Elasticsearch's self-query retriever converts an unstructured question into a structured query, which we then use for a BM25-only retrieval example.

In this example:

  • Ingest a sample dataset of movies outside of LangChain
  • Customise the retrieval strategy in ElasticsearchStore to use BM25 only
  • Use the self-query retriever to transform a question into a structured query
  • Use the retrieved documents and a RAG strategy to answer the question

Install packages

[58]
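The install cell's code isn't shown in this export. Based on the tools used later in the notebook, a typical setup would look like the following (the exact package list is an assumption; `lark` is needed by LangChain's self-query retriever):

```shell
# Assumed dependencies: LangChain, the OpenAI SDK, the Elasticsearch client,
# and lark (required by LangChain's self-query query parser).
python3 -m pip install -qU langchain openai elasticsearch lark
```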


Sample Dataset

[59]
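The dataset cell isn't reproduced in this export. Judging from the query output later in the notebook, each document has a short plot summary plus `year`, `rating`, `genre`, `director`, and `title` metadata. A sketch of the shape (the "Jurassic Park" entry matches the document returned below; the second entry is a hypothetical placeholder):

```python
# Sketch of the sample movie dataset: free-text summary plus filterable metadata.
docs = [
    {
        "text": "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        "metadata": {
            "year": 1993,
            "rating": 7.7,
            "genre": "science fiction",
            "director": "Steven Spielberg",
            "title": "Jurassic Park",
        },
    },
    {
        # Hypothetical second entry, showing the same structure.
        "text": "A thief steals corporate secrets through dream-sharing technology",
        "metadata": {
            "year": 2010,
            "rating": 8.2,
            "genre": "science fiction",
            "director": "Christopher Nolan",
            "title": "Inception",
        },
    },
]

print(len(docs), sorted(docs[0]["metadata"]))
```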

Connect to Elasticsearch

ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up here for a free trial.

Because we're using an Elastic Cloud deployment, we'll use the Cloud ID to identify it. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.

We will use ElasticsearchStore to connect to our Elastic Cloud deployment, which makes it easy to create an index and ingest data. We will also pass in the list of documents created in the previous step.

[60]
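The connection cell isn't shown here. A minimal sketch using the official Python client (the environment-variable names are my own convention, not from the original notebook; this needs a live deployment and credentials to actually run):

```python
import os

from elasticsearch import Elasticsearch

# Cloud ID and API key come from https://cloud.elastic.co/deployments.
client = Elasticsearch(
    cloud_id=os.environ["ELASTIC_CLOUD_ID"],
    api_key=os.environ["ELASTIC_API_KEY"],
)

# A quick round-trip confirms the connection works.
print(client.info()["version"]["number"])
```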

Indexing data into Elasticsearch

We have chosen to index the data outside of LangChain to demonstrate that it's possible to use LangChain for RAG and apply the self-query retriever to any existing Elasticsearch index.

[61]
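The indexing cell isn't reproduced. One way to bulk-index the movie documents with the plain Elasticsearch client is sketched below; the index name is illustrative, and the `text`/`metadata` layout matches what ElasticsearchStore expects when querying the index later:

```python
# Turn each movie into a bulk action for a hypothetical "movies-self-query"
# index, with the summary under "text" and attributes under "metadata".
def to_bulk_actions(index_name, movies):
    for movie in movies:
        yield {
            "_index": index_name,
            "text": movie["text"],
            "metadata": movie["metadata"],
        }

movies = [
    {
        "text": "A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        "metadata": {
            "year": 1993,
            "rating": 7.7,
            "genre": "science fiction",
            "director": "Steven Spielberg",
            "title": "Jurassic Park",
        },
    },
]

actions = list(to_bulk_actions("movies-self-query", movies))
print(actions[0]["_index"], actions[0]["metadata"]["title"])

# With a live client, the actions would be sent via the bulk helper:
# from elasticsearch import helpers
# helpers.bulk(client, to_bulk_actions("movies-self-query", movies))
```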

Setup query retriever

Next we will instantiate the self-query retriever by providing some information about our document attributes and a short description of the document contents.

We will then instantiate the retriever with SelfQueryRetriever.from_llm.

[62]
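The setup cell isn't shown in this export. A sketch of what it likely contains, using LangChain's self-query API (the attribute names match the metadata seen in the query output below; the descriptions and model choice are assumptions, and running it requires an OpenAI key plus the ElasticsearchStore from earlier):

```python
from langchain.chains.query_constructor.base import AttributeInfo
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever

# Describe the filterable attributes so the LLM can build structured queries.
metadata_field_info = [
    AttributeInfo(name="genre", description="The genre of the movie", type="string"),
    AttributeInfo(name="year", description="The year the movie was released", type="integer"),
    AttributeInfo(name="director", description="The name of the movie director", type="string"),
    AttributeInfo(name="rating", description="A 1-10 rating for the movie", type="float"),
]
document_content_description = "Brief summary of a movie"

retriever = SelfQueryRetriever.from_llm(
    OpenAI(temperature=0),          # LLM that translates the question
    vectorstore,                    # the ElasticsearchStore from earlier
    document_content_description,
    metadata_field_info,
)
```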

BM25 Only Retriever

One option is to customise the query to use a BM25-only retrieval method. We can do this by overriding the custom_query function, specifying a query that uses only multi_match.

In the example below, the self-query retriever uses the LLM to transform the question into a keyword query plus a filter (query: dinosaur, filter: genre and year range). The custom query then performs a BM25-based search using that keyword query and filter.

This means you don't have to vectorise all your documents if you want to build a question-answering use-case on an existing Elasticsearch index.
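The custom_query override itself is pure Python: it receives the query body that ElasticsearchStore would have sent along with the user's question, and returns a replacement body. A sketch that keeps any filters the self-query retriever produced but swaps the retrieval clause for a BM25 multi_match, mirroring the query printed below (the exact signature is an assumption from this notebook's usage):

```python
def custom_query(query_body: dict, query: str) -> dict:
    """Replace the default retrieval clause with a BM25 multi_match over
    the "text" field, preserving any structured filters."""
    existing_filters = query_body.get("query", {}).get("bool", {}).get("filter", [])
    return {
        "query": {
            "bool": {
                "filter": existing_filters,
                "must": [
                    {
                        "multi_match": {
                            "query": query,
                            "fields": ["text"],
                            "fuzziness": "AUTO",
                        }
                    }
                ],
            }
        }
    }

# Example: a filter produced by the self-query retriever is kept as-is,
# while the keyword query becomes a plain BM25 multi_match.
body = custom_query(
    {"query": {"bool": {"filter": [{"range": {"metadata.year": {"gt": 1992}}}]}}},
    "dinosaur",
)
print(body["query"]["bool"]["must"][0]["multi_match"]["query"])  # -> dinosaur
```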

[63]
query {'query': {'bool': {'filter': [{'bool': {'must': [{'match': {'metadata.genre': {'query': 'science fiction'}}}, {'range': {'metadata.year': {'gt': 1992}}}, {'range': {'metadata.year': {'lt': 2007}}}]}}], 'must': [{'multi_match': {'query': 'dinosaur', 'fields': ['text'], 'fuzziness': 'AUTO'}}]}}}
docs: [Document(page_content='A bunch of scientists bring back dinosaurs and mayhem breaks loose', metadata={'year': 1993, 'rating': 7.7, 'genre': 'science fiction', 'director': 'Steven Spielberg', 'title': 'Jurassic Park'})]
'Steven Spielberg directed Jurassic Park in 1993.'
[64]
ObjectApiResponse({'acknowledged': True})