Evaluating RAG With RAGAs
Evaluating RAG with RAGAs using GPT-4o
Ragas is a framework for evaluating Retrieval-Augmented Generation (RAG) pipelines.
Ragas provides metrics and tools, based on recent research on evaluating LLM-generated text, that give you insight into your RAG pipeline. It can be integrated into your CI/CD workflow to run continuous checks on performance.
GPT-4o is the LLM used here to generate responses from the semantically closest context chunks.
Set up OPENAI_API_KEY as an environment variable
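A minimal sketch of this step, assuming the key is entered interactively when it is not already present in the environment. The helper name `ensure_openai_key` is hypothetical and does not come from the notebook.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, prompting for it if missing."""
    if not os.environ.get(env_var):
        # getpass hides the typed key so it does not show up in notebook output.
        os.environ[env_var] = getpass(f"Enter {env_var}: ")
    return os.environ[env_var]
```

Calling `ensure_openai_key()` once before creating the LLM client should be enough, since OpenAI clients typically read the key from this variable.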
Load a .txt file and split it into chunks
The splitter logged 14 warnings of the following form, with chunk sizes ranging from 201 to 332:
WARNING:langchain_text_splitters.base:Created a chunk of size 215, which is longer than the specified 200
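The warnings above suggest that the text is split on a separator and that some individual pieces are longer than the 200-character target, so they are emitted as oversized chunks. The following is a simplified pure-Python stand-in for that behaviour, not the `langchain_text_splitters` implementation. The function name `split_text`, the `"\n\n"` separator, and the absence of chunk overlap are all assumptions made for illustration.

```python
from typing import List


def split_text(text: str, separator: str = "\n\n", chunk_size: int = 200) -> List[str]:
    """Greedily merge separator-delimited pieces into chunks of at most chunk_size.

    A single piece longer than chunk_size is kept whole, which is why some
    chunks end up larger than the requested size.
    """
    pieces = [piece for piece in text.split(separator) if piece]
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
        else:
            if current:
                chunks.append(current)
            # Start a new chunk; it may already exceed chunk_size on its own.
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

With this approach, reducing the number of oversized chunks means either choosing a finer separator or raising `chunk_size`.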
Set up the Retriever
The retriever uses LanceDB as its vector store. LanceDB provides scalable vector search and advanced retrieval for RAG, and is designed for fast search over large sets of embeddings.
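To make the idea of vector search concrete, here is a brute-force sketch of what a retriever computes: it scores every chunk embedding against the query embedding and returns the best matches. This is not LanceDB code and it does not scale the way an indexed vector store does. The function names and the choice of cosine similarity are assumptions, since the notebook's distance metric is not shown.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    chunks: Sequence[str],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k chunks whose embeddings are most similar to the query."""
    scored = [
        (chunk, cosine_similarity(query_vec, vec))
        for chunk, vec in zip(chunks, chunk_vecs)
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A vector database replaces the linear scan with an index, but the inputs (a query embedding) and outputs (the closest chunks) stay the same.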
Set up the RAG Pipeline with a Prompt Template
<ipython-input-9-fe6c5e5e17b4>:7: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 1.0. An updated version of the class exists in the `langchain-openai` package and should be used instead. To use it, run `pip install -U langchain-openai` and import it as `from langchain_openai import ChatOpenAI`.
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
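The prompt template is the piece that joins the retrieved chunks and the user question into a single input for the LLM. The sketch below shows that step with plain string formatting. The template wording and the names `PROMPT_TEMPLATE` and `build_prompt` are hypothetical, because the notebook's actual prompt is not shown.

```python
from typing import List

# Hypothetical template wording; the notebook's actual prompt is not shown.
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n"
    "If the context is insufficient, say you do not know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)


def build_prompt(question: str, contexts: List[str]) -> str:
    """Fill the template with the retrieved chunks and the user question."""
    # Separate chunks with a blank line so the model can tell them apart.
    context = "\n\n".join(contexts)
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

The resulting string would then be sent to the chat model, for example the `gpt-4o` model configured with `temperature=0` in the notebook.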
Sample Questions with their Expected Answers
Define a set of questions with their expected answers, then build a dataset that contains, for each question, the ground truth, the generated answer, and the context chunks from which that answer was generated.
<ipython-input-10-d89a36ccef76>:20: LangChainDeprecationWarning: The method `BaseRetriever.get_relevant_documents` was deprecated in langchain-core 0.1.46 and will be removed in 1.0. Use `invoke` instead.
[docs.page_content for docs in retriever.get_relevant_documents(query)]
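The dataset described above can be assembled as a dictionary of parallel lists. The sketch below assumes the column names `question`, `answer`, `contexts`, and `ground_truth`, which match the schema used by Ragas 0.1-style releases. The function name `build_eval_records` and the `retrieve` and `generate` callables are hypothetical placeholders for the notebook's retriever and RAG chain.

```python
from typing import Callable, Dict, List


def build_eval_records(
    questions: List[str],
    ground_truths: List[str],
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str]], str],
) -> Dict[str, list]:
    """Collect question, generated answer, retrieved contexts, and ground truth."""
    if len(questions) != len(ground_truths):
        raise ValueError("questions and ground_truths must have the same length")

    records: Dict[str, list] = {
        "question": [],
        "answer": [],
        "contexts": [],
        "ground_truth": [],
    }
    for question, truth in zip(questions, ground_truths):
        # Retrieve once and reuse the same chunks for generation and evaluation.
        contexts = retrieve(question)
        records["question"].append(question)
        records["answer"].append(generate(question, contexts))
        records["contexts"].append(contexts)
        records["ground_truth"].append(truth)
    return records
```

With a LangChain retriever, `retrieve` could wrap `retriever.invoke(query)` and return the `page_content` of each document, which also avoids the deprecated `get_relevant_documents` call.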
RAGAs Evaluation Pipeline
A simple Ragas pipeline that evaluates the RAG system on the metrics listed below.
The metrics we will evaluate are answer_correctness, faithfulness, answer_similarity, context_precision, context_utilization, context_recall, context_relevancy, answer_relevancy, and context_entity_recall.
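A sketch of how the listed metrics might be passed to Ragas. It assumes a Ragas 0.1-style release in which all nine names are importable objects in `ragas.metrics`, and it assumes the `datasets` package is installed. Later Ragas releases have renamed or removed some of these metrics, so treat the metric lookup as an assumption to verify against your installed version. The names `METRIC_NAMES` and `run_ragas_evaluation` are hypothetical.

```python
from typing import Dict, List

# Metric names taken from the list above.
METRIC_NAMES: List[str] = [
    "answer_correctness",
    "faithfulness",
    "answer_similarity",
    "context_precision",
    "context_utilization",
    "context_recall",
    "context_relevancy",
    "answer_relevancy",
    "context_entity_recall",
]


def run_ragas_evaluation(records: Dict[str, list]):
    """Evaluate records (question, answer, contexts, ground_truth) with Ragas."""
    # Imported lazily so this module loads even where ragas is not installed.
    from datasets import Dataset
    from ragas import evaluate
    from ragas import metrics as ragas_metrics

    dataset = Dataset.from_dict(records)
    # Assumption: each name is a module-level metric object in ragas.metrics.
    metrics = [getattr(ragas_metrics, name) for name in METRIC_NAMES]
    result = evaluate(dataset, metrics=metrics)
    # One row per question, one column per metric.
    return result.to_pandas()
```

Running the evaluation calls the OpenAI API for the LLM-judged metrics, so `OPENAI_API_KEY` must be set beforehand.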