LangChain Self Query Retriever
Self-querying retriever with Elasticsearch and LangChain
This notebook demonstrates how to use the Elasticsearch-backed self-query retriever to convert an unstructured query into a structured query and apply that structured query to a vector store.
Before we begin, we first split the documents into chunks with LangChain and then use ElasticsearchStore.from_documents to create a vector store and index the data into Elasticsearch.
We will then run a few example queries that demonstrate the full power of the Elasticsearch-powered self-query retriever.
Install packages and import modules
Create documents
Next, we will create a list of documents containing movie summaries, using LangChain's Document schema. Each document has a page_content and a metadata dictionary.
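As a minimal sketch of this step, the list below rebuilds the five documents that appear in the outputs on this page. The import path `langchain_core.documents` is an assumption about the installed LangChain version, and the fallback dataclass is a hypothetical stand-in that exists only so the snippet runs without LangChain.

```python
from dataclasses import dataclass, field

try:
    # Assumed import path; it may differ in older LangChain releases.
    from langchain_core.documents import Document
except ImportError:
    # Hypothetical stand-in with the same two fields, used only when
    # LangChain is not installed.
    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

# Summaries and metadata copied from the outputs shown in this notebook.
docs = [
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated"},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={"year": 1979, "rating": 9.9, "director": "Andrei Tarkovsky", "genre": "science fiction"},
    ),
]
```

Note that not every document carries every metadata key. The retriever can only filter on keys that are present.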
Connect to Elasticsearch
ℹ️ We're using an Elastic Cloud deployment of Elasticsearch for this notebook. If you don't have an Elastic Cloud deployment, sign up here for a free trial.
Because we are using an Elastic Cloud deployment, we'll use the Cloud ID to identify it. To find the Cloud ID for your deployment, go to https://cloud.elastic.co/deployments and select your deployment.
We will use ElasticsearchStore to connect to our Elastic Cloud deployment, which makes it easy to create an index and load data into it. We also pass in the list of documents that we created in the previous step.
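The connection step might look roughly like the sketch below. The environment variable names, the helper names, and the default index name are all hypothetical, and the `langchain_elasticsearch` import path is an assumption (older releases expose the class under `langchain.vectorstores`). The keyword arguments `es_cloud_id`, `es_api_key`, and `index_name` are the ones ElasticsearchStore accepts.

```python
import os


def elastic_cloud_kwargs(index_name="elasticsearch-self-query-demo"):
    """Collect connection settings for ElasticsearchStore.

    ELASTIC_CLOUD_ID and ELASTIC_API_KEY are hypothetical variable names;
    use whatever your environment provides.
    """
    cloud_id = os.environ.get("ELASTIC_CLOUD_ID")
    api_key = os.environ.get("ELASTIC_API_KEY")
    if not cloud_id or not api_key:
        raise RuntimeError("Set ELASTIC_CLOUD_ID and ELASTIC_API_KEY first")
    return {
        "es_cloud_id": cloud_id,
        "es_api_key": api_key,
        "index_name": index_name,
    }


def index_documents(docs, embeddings):
    """Create the vector store and index `docs` in one call."""
    # Imported lazily so the helper above works without the package installed.
    # Assumed import path; adjust it to your LangChain version.
    from langchain_elasticsearch import ElasticsearchStore

    return ElasticsearchStore.from_documents(
        docs, embeddings, **elastic_cloud_kwargs()
    )
```

Calling `index_documents(docs, embeddings)` with any LangChain embeddings object would return the vector store used in the rest of the notebook.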
Set up query retriever
Next, we will instantiate the self-query retriever by providing some information about our document attributes and a short description of the document contents.
We will then instantiate the retriever with SelfQueryRetriever.from_llm.
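A sketch of this setup, under stated assumptions: the attribute names come from the metadata shown in this notebook, but the descriptions, the content description, and the helper name `build_retriever` are illustrative. The two import paths are assumptions that may vary across LangChain versions.

```python
# Attribute names match the metadata keys seen in the outputs on this page;
# the descriptions are illustrative wording, not taken from the notebook.
metadata_fields = [
    {"name": "genre", "description": "The genre of the movie", "type": "string"},
    {"name": "year", "description": "The year the movie was released", "type": "integer"},
    {"name": "director", "description": "The name of the movie director", "type": "string"},
    {"name": "rating", "description": "A 1-10 rating for the movie", "type": "float"},
]

# Illustrative description of what page_content holds.
document_content_description = "Brief summary of a movie"


def build_retriever(llm, vectorstore):
    """Wire the LLM, the vector store, and the attribute schema together."""
    # Imported lazily; these paths are assumptions about your LangChain version.
    from langchain.chains.query_constructor.base import AttributeInfo
    from langchain.retrievers.self_query.base import SelfQueryRetriever

    metadata_field_info = [AttributeInfo(**f) for f in metadata_fields]
    return SelfQueryRetriever.from_llm(
        llm,
        vectorstore,
        document_content_description,
        metadata_field_info,
        verbose=True,
    )
```

The LLM uses the attribute descriptions to decide which parts of a question become metadata filters, so precise descriptions and types tend to produce better structured queries.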
Test retriever with simple query
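We will test the retriever with a simple query: "What are some movies about dream".
The output lists the documents retrieved for the query. No metadata filter applies here, so results are ranked purely by similarity and the later ones are only loosely related to dreams.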
[Document(page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...', metadata={'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.2}),
 Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6}),
 Document(page_content='Toys come alive and have a blast doing so', metadata={'year': 1995, 'genre': 'animated'}),
 Document(page_content='A bunch of normal-sized women are supremely wholesome and some men pine after them', metadata={'year': 2019, 'director': 'Greta Gerwig', 'rating': 8.3})]
Test retriever with simple query and filter
We will now test the retriever with the query: "Has Andrei Tarkovsky directed any science fiction movies".
This query filters on the genre and director metadata fields.
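To make the idea of a metadata filter concrete, here is a simplified sketch of how such comparisons could be expressed in the Elasticsearch query DSL. The helper `to_es_filter` is hypothetical and is not LangChain's own translator. It assumes metadata is stored under a `metadata` object and that string fields have a `.keyword` sub-field, as with default dynamic mappings.

```python
def to_es_filter(comparisons):
    """Turn (attribute, comparator, value) triples into a bool/filter query.

    Supported comparators: "eq" plus the range operators gt, gte, lt, lte.
    """
    range_ops = {"gt", "gte", "lt", "lte"}
    clauses = []
    for attribute, comparator, value in comparisons:
        field = f"metadata.{attribute}"
        if comparator in range_ops:
            clauses.append({"range": {field: {comparator: value}}})
        elif comparator == "eq":
            # Exact matches on strings target the keyword sub-field
            # (assumption: default dynamic mapping).
            if isinstance(value, str):
                field += ".keyword"
            clauses.append({"term": {field: value}})
        else:
            raise ValueError(f"Unsupported comparator: {comparator}")
    return {"bool": {"filter": clauses}}


tarkovsky_filter = to_es_filter([
    ("director", "eq", "Andrei Tarkovsky"),
    ("genre", "eq", "science fiction"),
])
```

Here `tarkovsky_filter` holds two `term` clauses, one per metadata field, so only documents matching both the director and the genre are returned.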
[Document(page_content='Three men walk into the Zone, three men walk out of the Zone', metadata={'year': 1979, 'rating': 9.9, 'director': 'Andrei Tarkovsky', 'genre': 'science fiction'})]
Instantiate retriever to filter k documents
We will now instantiate the retriever again so that it can fetch a specific number (k) of documents. We can do this by setting enable_limit=True when instantiating the retriever.
We will then test that the retriever limits its results to k documents.
Test the retriever to filter k documents
We will now test the retriever with the query: "what are two movies about dream".
The output shows exactly two documents.
[Document(page_content='Leo DiCaprio gets lost in a dream within a dream within a dream within a ...', metadata={'year': 2010, 'director': 'Christopher Nolan', 'rating': 8.2}),
 Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]
Test retriever for complex queries
We will now try a more complex query that combines metadata filters with a limit of one.
Query: "Show that one movie which was about dream and was released after the year 1992 but before 2007?"
[Document(page_content='A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea', metadata={'year': 2006, 'director': 'Satoshi Kon', 'rating': 8.6})]
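The semantics of this combined query can be sketched in plain Python, without an LLM or Elasticsearch. This is an illustration only: a substring match stands in for vector similarity, and `run_structured_query` is a hypothetical helper rather than part of LangChain.

```python
# The five movie summaries shown in the outputs on this page.
movies = [
    {"page_content": "Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
     "metadata": {"year": 2010, "director": "Christopher Nolan", "rating": 8.2}},
    {"page_content": "A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
     "metadata": {"year": 2006, "director": "Satoshi Kon", "rating": 8.6}},
    {"page_content": "Toys come alive and have a blast doing so",
     "metadata": {"year": 1995, "genre": "animated"}},
    {"page_content": "A bunch of normal-sized women are supremely wholesome and some men pine after them",
     "metadata": {"year": 2019, "director": "Greta Gerwig", "rating": 8.3}},
    {"page_content": "Three men walk into the Zone, three men walk out of the Zone",
     "metadata": {"year": 1979, "rating": 9.9, "director": "Andrei Tarkovsky", "genre": "science fiction"}},
]


def run_structured_query(docs, keyword, year_gt, year_lt, limit):
    """Apply the three parts of a structured query: text, filter, limit."""
    hits = [
        d for d in docs
        # Text part: a substring match stands in for vector similarity.
        if keyword in d["page_content"].lower()
        # Filter part: year strictly between the two bounds.
        and year_gt < d["metadata"]["year"] < year_lt
    ]
    # Limit part: keep at most `limit` results.
    return hits[:limit]


result = run_structured_query(movies, keyword="dream", year_gt=1992, year_lt=2007, limit=1)
```

With these inputs, `result` contains only the 2006 Satoshi Kon document, matching the retriever output above. The 2010 film mentions dreams but fails the year filter, and the 1995 film passes the year filter but does not mention dreams.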