RAG with Elastic ELSER and Llama3 using LangChain
This interactive notebook uses LangChain to process fictional workplace documents and uses ELSER v2 running in Elasticsearch to transform these documents into sparse vector embeddings and store them in Elasticsearch. We then ask a question, retrieve the relevant documents from Elasticsearch, and use Llama3, running locally via Ollama, to generate a response.
Note: Llama3 is expected to be running via Ollama on the same machine where you run this notebook.
Requirements
For this example, you will need:
- An Elastic deployment
- We'll be using Elastic Cloud for this example (available with a free trial)
- For the LLM, we will use Llama3 running locally via Ollama
Use Elastic Cloud
If you don't have an Elastic Cloud deployment, follow these steps to create one.
- Go to Elastic Cloud registration and sign up for a free trial
- Select Create Deployment and follow the instructions
Install required dependencies
First, we install the packages required for this example.
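A cell along these lines installs the dependencies; the exact package list is an assumption based on the libraries used later in this notebook, so adjust it to your environment.

```shell
# Hypothetical package list inferred from the notebook's imports.
python3 -m pip install -qU langchain langchain-community langchain-elasticsearch
```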
Import packages
Next, we import the required packages. Imports are placed in the cells where they are used.
Prompt user to provide Cloud ID and API Key
We now prompt the user for the Cloud ID and API key using getpass. We get these details from the deployment.
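A minimal sketch of this cell, assuming the credentials are read interactively (the variable names are illustrative):

```python
import getpass

# The Cloud ID is shown on the deployment's overview page in the Elastic
# Cloud console; the API key is created under Stack Management -> API keys.
ELASTIC_CLOUD_ID = getpass.getpass("Elastic Cloud ID: ")
ELASTIC_API_KEY = getpass.getpass("Elastic API Key: ")
```

getpass hides the input, which keeps the secrets out of the notebook output.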
Prepare documents for chunking and ingestion
We now prepare the data to be ingested into Elasticsearch. We use LangChain's RecursiveCharacterTextSplitter and split the document text into chunks of 512 characters with an overlap of 256 characters.
Define Elasticsearch Vector Store
We define ElasticsearchStore as the vector store with SparseVectorStrategy. SparseVectorStrategy converts each document into tokens, which are stored in a vector field with the rank_features datatype.
We will use text embeddings from the ELSER v2 model .elser_model_2_linux-x86_64.
Note: Before we begin indexing, ensure you have downloaded and deployed the ELSER v2 model in your deployment and that it is running on an ML node.
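A sketch of the vector store definition, assuming the langchain-elasticsearch package; the index name is hypothetical, and the credential variables come from the getpass cell above:

```python
from langchain_elasticsearch import ElasticsearchStore, SparseVectorStrategy

# ELASTIC_CLOUD_ID / ELASTIC_API_KEY were collected earlier with getpass.
es_vector_store = ElasticsearchStore(
    es_cloud_id=ELASTIC_CLOUD_ID,
    es_api_key=ELASTIC_API_KEY,
    index_name="workplace_index_elser",  # hypothetical index name
    # ELSER v2 runs inside Elasticsearch; no local embedding model is needed.
    strategy=SparseVectorStrategy(model_id=".elser_model_2_linux-x86_64"),
)
```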
Add docs processed above.
The documents have already been chunked. We do not use any specific embedding function here, since the tokens are inferred at index time and at query time within Elasticsearch.
This requires the ELSER v2 model to be loaded and running in Elasticsearch.
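The ingestion itself can be a single call; `chunked_docs` and `es_vector_store` refer to the cells above:

```python
# ELSER tokenizes the text inside Elasticsearch at index time,
# so no local embedding function is passed to the store.
es_vector_store.add_documents(documents=chunked_docs)
```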
LLM Configuration
This connects to your local LLM. Please refer to https://ollama.com/library/llama3 for details on steps to run Llama3 locally.
If you have sufficient resources (at least 64 GB of RAM and a GPU available), you could try the 70B-parameter version of Llama3.
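A minimal sketch of the connection, assuming LangChain's community Ollama wrapper and that the model has already been pulled locally:

```python
from langchain_community.llms import Ollama

# Assumes `ollama pull llama3` has been run, so the model is served
# on the default local endpoint (localhost:11434).
llm = Ollama(model="llama3")
```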
Semantic Search using Elasticsearch ELSER v2 and Llama3
We will perform a semantic search on the query with ELSER v2 as the model. The contextually relevant documents are then composed into a template along with the user's original query.
We then use Llama3 to answer the question with the contextually relevant data fetched from Elasticsearch by the retriever.
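The retrieval-and-answer step can be sketched as an LCEL chain; the prompt wording and the example question are hypothetical, and `es_vector_store` and `llm` come from the cells above:

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

def format_docs(docs):
    # Join the retrieved chunks into a single context string.
    return "\n\n".join(doc.page_content for doc in docs)

# Hypothetical prompt; adapt the wording to your use case.
template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# The retriever runs the ELSER v2 semantic search against Elasticsearch.
retriever = es_vector_store.as_retriever()
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What are the organization's sales goals?"))
```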
You can now experiment with other questions.