RAG: Using Gemma LLM locally for question answering on private data
In this notebook, we'll develop a RAG system using Google's Gemma model. We'll generate sparse vectors with Elastic's ELSER model and store them in Elasticsearch, then use semantic retrieval to fetch the top-ranked passages and present them as a context window to Gemma. We'll load Gemma into the local environment with the Hugging Face transformers library.
Setup
Elastic Credentials - Create an Elastic Cloud deployment to get all Elastic credentials (ELASTIC_CLOUD_ID, ELASTIC_API_KEY).
Hugging Face Token - To get started with the Gemma model, you must agree to the model terms on Hugging Face and generate an access token with the write role.
Gemma Model - We're going to use gemma-2b-it, though Google has released four open models; you can use any of them, e.g. gemma-2b, gemma-7b, or gemma-7b-it.
Install packages
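The package set below is an assumption based on the steps that follow (the Elasticsearch client, LangChain, and the Hugging Face libraries); pin versions as needed for your environment.

```shell
pip install -q elasticsearch langchain langchain-community langchain-elasticsearch transformers huggingface_hub
```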
Import packages
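A sketch of the imports the later steps rely on; exact module paths have moved between LangChain releases, so treat these as assumptions for a recent version:

```python
import json
from getpass import getpass
from urllib.request import urlopen

from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_community.llms import HuggingFacePipeline
from langchain_elasticsearch import ElasticsearchStore, SparseVectorStrategy
```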
Get Credentials
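One way to collect the Elastic Cloud credentials from the Setup section without hard-coding them is to prompt for them interactively (the variable names here are our own):

```python
from getpass import getpass

# Elastic Cloud credentials created in the Setup section.
ELASTIC_CLOUD_ID = getpass("Elastic Cloud ID: ")
ELASTIC_API_KEY = getpass("Elastic API key: ")
```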
Add documents
Let's download the sample dataset and deserialize the documents.
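The sketch below assumes Elastic's sample workplace dataset from the elasticsearch-labs repository; substitute your own JSON source as needed.

```python
import json
from urllib.request import urlopen

# Sample dataset URL (an assumption; any JSON list of documents works).
url = "https://raw.githubusercontent.com/elastic/elasticsearch-labs/main/datasets/workplace-documents.json"

response = urlopen(url)
workplace_docs = json.loads(response.read())
print(f"Loaded {len(workplace_docs)} documents")
```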
Split Documents into Passages
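A library splitter such as LangChain's RecursiveCharacterTextSplitter is the usual tool for this step; the minimal stdlib sketch below shows the underlying idea — fixed-size passages that share an overlap of characters so context isn't cut mid-thought (the size and overlap values are illustrative):

```python
def split_into_passages(text, chunk_size=800, overlap=200):
    """Split text into overlapping passages of at most chunk_size characters.

    A minimal stand-in for a library text splitter: it steps through the
    text in strides of (chunk_size - overlap), so consecutive passages
    share `overlap` characters of context.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    passages = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        passages.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return passages


passages = split_into_passages("word " * 500)
print(len(passages), max(len(p) for p in passages))  # → 4 800
```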
Index Documents into Elasticsearch using ELSER
Before we begin indexing, ensure that the ELSER model has been downloaded and deployed in your Elastic deployment and is running on an ML node.
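With ELSER deployed, the passages can be indexed so that Elasticsearch computes the sparse vectors server-side at index time. This sketch assumes the `workplace_docs` list and credentials from the earlier steps, the `.elser_model_2` model id, and an illustrative index name; the metadata field names are assumptions based on the sample dataset's shape.

```python
from langchain_core.documents import Document
from langchain_elasticsearch import ElasticsearchStore, SparseVectorStrategy

# Wrap each record as a LangChain Document (field names are assumptions).
documents = [
    Document(page_content=doc["content"], metadata={"name": doc["name"]})
    for doc in workplace_docs
]

# ELSER generates the sparse vectors inside Elasticsearch during indexing.
es_store = ElasticsearchStore.from_documents(
    documents,
    es_cloud_id=ELASTIC_CLOUD_ID,
    es_api_key=ELASTIC_API_KEY,
    index_name="workplace_index",
    strategy=SparseVectorStrategy(model_id=".elser_model_2"),
)
```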
Hugging Face login
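Authenticate with the access token generated in the Setup section; called with no arguments, `login()` prompts for the token.

```python
from huggingface_hub import login

login()  # paste the write-role token when prompted
```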
Initialize the tokenizer with the model (google/gemma-2b-it)
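Loading the tokenizer (and the model weights the pipeline will need) downloads them on first run and requires the gated-model terms to have been accepted on Hugging Face:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```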
Create a text-generation pipeline and initialize with LLM
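The pipeline wraps the model for text generation, and LangChain's HuggingFacePipeline adapter lets it sit inside a chain. `model` and `tokenizer` come from the previous step; `max_new_tokens=256` is an illustrative setting to tune for your answers' length.

```python
from transformers import pipeline
from langchain_community.llms import HuggingFacePipeline

text_generation = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
)

# Wrap the pipeline so it can be composed into a LangChain chain.
llm = HuggingFacePipeline(pipeline=text_generation)
```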
Format Docs
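The retriever returns a list of Document objects; before they can fill the prompt's context slot, they must be collapsed into a single string. A minimal sketch (the `Doc` class here is a stand-in mimicking the `.page_content` attribute of LangChain's Document):

```python
def format_docs(docs):
    """Join retrieved documents into one context string for the prompt."""
    return "\n\n".join(doc.page_content for doc in docs)


# Minimal stand-in for LangChain's Document, for illustration only.
class Doc:
    def __init__(self, page_content):
        self.page_content = page_content


context = format_docs([Doc("First passage."), Doc("Second passage.")])
print(context)
```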
Create a chain using Prompt template
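A sketch of the chain in LangChain's expression language, assuming the `es_store`, `format_docs`, and `llm` objects from the earlier steps; the prompt wording is our own and can be adjusted freely.

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

template = """Answer the question based only on the following context:

{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

# Semantic retrieval over the ELSER-indexed passages.
retriever = es_store.as_retriever()

chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
```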
Ask question
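Invoking the chain from the previous step retrieves the relevant passages and generates the answer; the question text here is inferred from the sample output that follows.

```python
answer = chain.invoke("What are our sales goals?")
print(answer)
```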
'Answer: The sales goals are to increase revenue, expand market share, and strengthen customer relationships in our target markets.'