RAG with Elastic and Llama3 using LlamaIndex

This interactive notebook uses LlamaIndex to process fictional workplace documents. Llama3, running locally through Ollama, turns these documents into embeddings, which are stored in Elasticsearch. We then ask a question, retrieve the relevant documents from Elasticsearch, and use Llama3 to generate a response.

Note: Llama3 is expected to be running under Ollama on the same machine as this notebook.

Requirements

For this example, you will need:

- An Elastic deployment. We'll be using Elastic Cloud for this example (available with a free trial).
- Ollama with Llama3 configured locally, which serves as the LLM.

Use Elastic Cloud

If you don't have an Elastic Cloud deployment, follow these steps to create one.

- Go to the Elastic Cloud registration page and sign up for a free trial.
- Select Create Deployment and follow the instructions.

Install required dependencies for LlamaIndex and Elasticsearch

First we install the packages we need for this example.
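
As a minimal sketch, the install step can be expressed in plain Python. The package list below is an assumption based on the components this notebook names (LlamaIndex, its Ollama integrations, and its Elasticsearch vector store); the original install cell may pin different packages or versions. The helper name `pip_install_command` is ours.

```python
import sys

# Assumed package set for this walkthrough; the notebook's own install cell may differ.
PACKAGES = [
    "llama-index",
    "llama-index-embeddings-ollama",
    "llama-index-llms-ollama",
    "llama-index-vector-stores-elasticsearch",
]


def pip_install_command(packages):
    """Build a pip command that installs into the interpreter running this notebook."""
    return [sys.executable, "-m", "pip", "install", "--quiet", *packages]


command = pip_install_command(PACKAGES)
print(" ".join(command))
```

To actually install, pass `command` to `subprocess.check_call`, or run the equivalent `%pip install` line in a notebook cell.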

Import packages

The required packages are imported in the cells where they are first used.

Prompt user to provide Cloud ID and API Key

We now prompt for the Cloud ID and API key using getpass, so that neither value is echoed to the screen. Both values come from your Elastic Cloud deployment.
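
A minimal sketch of this prompt, using the standard library's `getpass`. The function name `read_credentials`, the prompt strings, and the empty-value check are our own additions for illustration.

```python
from getpass import getpass


def read_credentials(prompt=getpass):
    """Ask for the Cloud ID and API key without echoing them to the screen."""
    cloud_id = prompt("Elastic Cloud ID: ").strip()
    api_key = prompt("Elastic API Key: ").strip()
    if not cloud_id or not api_key:
        raise ValueError("Both the Cloud ID and the API key are required.")
    return cloud_id, api_key
```

In a notebook cell you would call `ELASTIC_CLOUD_ID, ELASTIC_API_KEY = read_credentials()`. The `prompt` parameter exists only so the function can be exercised without a terminal.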

Prepare documents for chunking and ingestion

We now convert the data into LlamaIndex Document objects for processing.
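
The conversion could look roughly like the sketch below. The record layout (a `content` field holding the text, with the remaining fields kept as metadata) and the sample record are assumptions for illustration; the actual dataset may use different field names. `Document(text=..., metadata=...)` is LlamaIndex's document constructor and is imported lazily so the pure helper can be tried on its own.

```python
def to_document_kwargs(record, text_key="content"):
    """Split one record into the text to embed and the metadata kept alongside it."""
    metadata = {key: value for key, value in record.items() if key != text_key}
    return {"text": record[text_key], "metadata": metadata}


def to_documents(records):
    """Wrap each record in a LlamaIndex Document (requires llama-index)."""
    # Imported lazily so the helper above works without LlamaIndex installed.
    from llama_index.core import Document

    return [Document(**to_document_kwargs(record)) for record in records]


# Hypothetical record; the real dataset's fields may differ.
sample_records = [
    {
        "name": "Work From Home Policy",
        "summary": "Rules for remote work.",
        "content": "Employees may work remotely up to three days a week.",
    }
]
print(to_document_kwargs(sample_records[0]))
```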

Define the Elasticsearch store and the LlamaIndex ingestion pipeline for document processing. Use Llama3 to generate embeddings.

We now define the ElasticsearchStore with the required index name, the text field, and the field that holds the associated embeddings. We use Llama3 to generate the embeddings. We will run semantic search on the index to find documents relevant to the user's query. We use the SentenceSplitter provided by LlamaIndex to chunk the documents. All of this runs as part of an IngestionPipeline, also provided by the LlamaIndex framework.
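
A sketch of how these pieces might be wired together. The index name, field names, chunk size, and chunk overlap below are illustrative assumptions, not values confirmed by this notebook. The classes are the LlamaIndex ones named above, imported lazily inside `build_pipeline` (our own helper name) so the settings can be inspected without the libraries installed.

```python
# Assumed names and sizes for illustration; adjust to your own index.
STORE_SETTINGS = {
    "index_name": "workplace_index",
    "text_field": "content",
    "vector_field": "content_vector",
}
CHUNK_SIZE = 512
CHUNK_OVERLAP = 100


def build_pipeline(cloud_id, api_key, embed_model_name="llama3"):
    """Create the vector store, the embedding model, and the ingestion pipeline."""
    from llama_index.core.ingestion import IngestionPipeline
    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.embeddings.ollama import OllamaEmbedding
    from llama_index.vector_stores.elasticsearch import ElasticsearchStore

    vector_store = ElasticsearchStore(
        es_cloud_id=cloud_id, es_api_key=api_key, **STORE_SETTINGS
    )
    # Embeddings come from the local Ollama server running Llama3.
    embed_model = OllamaEmbedding(model_name=embed_model_name)
    pipeline = IngestionPipeline(
        transformations=[
            SentenceSplitter(chunk_size=CHUNK_SIZE, chunk_overlap=CHUNK_OVERLAP),
            embed_model,
        ],
        vector_store=vector_store,
    )
    return vector_store, embed_model, pipeline
```

The splitter comes before the embedding model in `transformations` so that each chunk, rather than each whole document, receives its own embedding.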

Execute pipeline

This chunks the data, generates embeddings using Llama3, and ingests the chunks into the Elasticsearch index, with the embeddings stored in a dense vector field.

The embeddings are stored in a dense vector field with 4096 dimensions. This dimension count matches the size of the embeddings that Llama3 generates.
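
To make the field layout concrete, here is a sketch of what a comparable Elasticsearch mapping might look like, together with a small length check. The field names and the cosine similarity setting are assumptions; the vector store normally creates the index itself, so this is for illustration rather than something you need to apply by hand.

```python
EMBEDDING_DIMS = 4096  # dimension stated above for Llama3 embeddings


def dense_vector_mapping(vector_field="content_vector", text_field="content",
                         dims=EMBEDDING_DIMS):
    """Return an index mapping with a text field and a dense_vector field."""
    return {
        "properties": {
            text_field: {"type": "text"},
            vector_field: {
                "type": "dense_vector",
                "dims": dims,
                "index": True,
                "similarity": "cosine",  # assumed similarity metric
            },
        }
    }


def has_expected_dims(embedding, dims=EMBEDDING_DIMS):
    """Check that an embedding has the length the dense vector field expects."""
    return len(embedding) == dims


mapping = dense_vector_mapping()
print(mapping["properties"]["content_vector"])
```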

Define LLM settings.

This connects to your local LLM. Please refer to https://ollama.com/library/llama3 for details on steps to run Llama3 locally.

If you have sufficient resources (more than 64 GB of RAM and an available GPU), you could try the 70B parameter version of Llama3.
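
A sketch of the LLM setup under these constraints. The helper `pick_model_tag` simply encodes the rule of thumb above, and the `llama3:70b` tag and the timeout value are assumptions; check the Ollama library page for the tags available to you. `Ollama` is the LlamaIndex client class for a local Ollama server, imported lazily.

```python
def pick_model_tag(ram_gb, has_gpu):
    """Choose the 70B model only with more than 64 GB of RAM and a GPU."""
    if ram_gb > 64 and has_gpu:
        return "llama3:70b"
    return "llama3"


def connect_local_llm(model_tag="llama3", request_timeout=120.0):
    """Create a LlamaIndex client for the local Ollama server (requires llama-index-llms-ollama)."""
    from llama_index.llms.ollama import Ollama

    # A generous timeout is assumed here because local generation can be slow.
    return Ollama(model=model_tag, request_timeout=request_timeout)


print(pick_model_tag(ram_gb=32, has_gpu=False))
```

The model must already be pulled in Ollama before `connect_local_llm` can serve requests.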

Set up semantic search and integrate with Llama3.

We now configure Elasticsearch as the vector store for the LlamaIndex query engine. The query engine then uses Llama3 to answer your questions with contextually relevant data from Elasticsearch.
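
The query step might be sketched as follows. `build_query_engine` and `ask` are our own helper names, the `top_k` value is an assumption, and `_StandInEngine` is a hypothetical placeholder that lets `ask` run without a live deployment. The LlamaIndex calls (`VectorStoreIndex.from_vector_store` and `as_query_engine`) are imported lazily.

```python
def build_query_engine(vector_store, llm, embed_model, top_k=10):
    """Build a query engine that retrieves from Elasticsearch and answers with the LLM."""
    from llama_index.core import VectorStoreIndex

    index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)
    return index.as_query_engine(llm=llm, similarity_top_k=top_k)


def ask(query_engine, question):
    """Send one question through the engine and return the answer as text."""
    return str(query_engine.query(question))


class _StandInEngine:
    """Hypothetical stand-in for a dry run; it echoes the question instead of answering."""

    def query(self, question):
        return f"(no retrieval performed) {question}"


answer = ask(_StandInEngine(), "What are the organization's sales goals?")
print(answer)
```

Passing the same embedding model that was used at ingestion matters: the query must be embedded into the same vector space as the stored chunks for the similarity search to be meaningful.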

You could now try experimenting with other questions.