RAG pipeline evaluation using DeepEval
DeepEval is a framework to evaluate Retrieval Augmented Generation (RAG) pipelines. It supports metrics like context relevance, answer correctness, faithfulness, and more.
For more information about evaluators, supported metrics and usage, check out:
This notebook shows how to use DeepEval-Haystack integration to evaluate a RAG pipeline against various metrics.
Prerequisites:
- OpenAI key
- DeepEval uses OpenAI models for computing some metrics, so we need an OpenAI key.
Enter OpenAI API key: ········
Install dependencies
Initialize the document store
1204
Build the RAG pipeline
<haystack.core.pipeline.pipeline.Pipeline object at 0x16a07b210> ,🚅 Components , - retriever: InMemoryBM25Retriever , - chat_prompt_builder: ChatPromptBuilder , - llm: OpenAIChatGenerator , - answer_builder: AnswerBuilder ,🛤️ Connections , - retriever.documents -> chat_prompt_builder.documents (List[Document]) , - retriever.documents -> answer_builder.documents (List[Document]) , - chat_prompt_builder.prompt -> llm.messages (List[ChatMessage]) , - llm.replies -> answer_builder.replies (List[ChatMessage])
Running the pipeline
Normandy is located in France.
We're done building our RAG pipeline. Let's evaluate it now!
Get questions, contexts, responses and ground truths for evaluation
For computing most metrics, we will need to provide the following to the evaluator:
- Questions
- Generated responses
- Retrieved contexts
- Ground truth (specifically, this is needed for the context precision, context recall and answer correctness metrics)
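As an illustration of how these four inputs line up, here is a minimal sketch in plain Python. The `EvalInputs` class and its `validate` method are hypothetical names introduced for this example only; they are not part of Haystack or DeepEval. The one idea it shows is that every list is indexed by question, and that each question has its own list of retrieved contexts.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EvalInputs:
    """Hypothetical container for evaluator inputs (not a library class)."""

    questions: List[str]
    contexts: List[List[str]]  # one list of retrieved passages per question
    responses: List[str]
    ground_truths: Optional[List[str]] = None  # only needed by some metrics

    def validate(self) -> int:
        """Check that all lists are aligned and return the number of test cases."""
        n = len(self.questions)
        if len(self.contexts) != n or len(self.responses) != n:
            raise ValueError("questions, contexts and responses must have equal length")
        if self.ground_truths is not None and len(self.ground_truths) != n:
            raise ValueError("ground_truths must have one entry per question")
        return n


# Contexts are shortened placeholders here, not the full retrieved passages.
inputs = EvalInputs(
    questions=["Which mountain range influenced the split of the regions?"],
    contexts=[["... South of the Tehachapis ...", "... orogenic wedges ..."]],
    responses=["The mountain range is the Tehachapi Mountains."],
    ground_truths=["Tehachapis"],
)
print(inputs.validate())
```

Validating the alignment up front is a design choice of this sketch: a mismatch between lists would otherwise pair a question with the wrong context or answer without any visible error.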
We'll start with three random questions from the dataset (see below), then get the matching contexts and responses for those questions.
Helper function to get context and responses for our questions
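A minimal sketch of what such a helper could look like is given here. It assumes the component names shown in the pipeline summary (`retriever`, `chat_prompt_builder`, `answer_builder`), a prompt variable called `question`, and an answer object exposing `data` and `documents`; the prompt variable name in particular is an assumption. To keep the sketch runnable without an API key, `StubPipeline`, `StubAnswer` and `StubDocument` are hypothetical stand-ins for the real pipeline and its outputs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class StubDocument:
    """Hypothetical stand-in for a retrieved document."""
    content: str


@dataclass
class StubAnswer:
    """Hypothetical stand-in for a generated answer with its source documents."""
    data: str
    documents: List[StubDocument] = field(default_factory=list)


class StubPipeline:
    """Hypothetical pipeline that mimics the assumed output structure."""

    def run(self, data: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
        query = data["retriever"]["query"]
        docs = [StubDocument(f"context for: {query}")]
        answer = StubAnswer(f"answer to: {query}", docs)
        return {"answer_builder": {"answers": [answer]}}


def get_contexts_and_responses(
    questions: List[str], pipeline: Any
) -> Tuple[List[List[str]], List[str]]:
    """Run each question through the pipeline and collect contexts and responses."""
    contexts: List[List[str]] = []
    responses: List[str] = []
    for question in questions:
        result = pipeline.run(
            {
                "retriever": {"query": question},
                "chat_prompt_builder": {"question": question},  # assumed variable name
                "answer_builder": {"query": question},
            }
        )
        answer = result["answer_builder"]["answers"][0]
        contexts.append([doc.content for doc in answer.documents])
        responses.append(answer.data)
    return contexts, responses


contexts, responses = get_contexts_and_responses(["q1", "q2"], StubPipeline())
```

With a real pipeline, the stub would be replaced by the pipeline object built earlier, provided its inputs and outputs match the assumptions stated above.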
Ground truths, review all fields
Now that we have questions, contexts, and responses we'll also get the matching ground truth answers.
Questions: Which mountain range influenced the split of the regions? What is the prize offered for finding a solution to P=NP? Which Californio is located in the upper part?
Contexts: The state is most commonly divided and promoted by its regional tourism groups as consisting of northern, central, and southern California regions. The two AAA Auto Clubs of the state, the California State Automobile Association and the Automobile Club of Southern California, choose to simplify matters by dividing the state along the lines where their jurisdictions for membership apply, as either northern or southern California, in contrast to the three-region point of view. Another influence is the geographical phrase South of the Tehachapis, which would split the southern region off at the crest of that transverse range, but in that definition, the desert portions of north Los Angeles County and eastern Kern and San Bernardino Counties would be included in the southern California region due to their remoteness from the central valley and interior desert landscape. If a problem X is in C and hard for C, then X is said to be complete for C. This means that X is the hardest problem in C. (Since many problems could be equally hard, one might say that X is one of the hardest problems in C.) Thus the class of NP-complete problems contains the most difficult problems in NP, in the sense that they are the ones most likely not to be in P. Because the problem P = NP is not solved, being able to reduce a known NP-complete problem, Π2, to another problem, Π1, would indicate that there is no known polynomial-time solution for Π1. This is because a polynomial-time solution to Π1 would yield a polynomial-time solution to Π2. Similarly, because all NP problems can be reduced to the set, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP. In the centre of Basel, the first major city in the course of the stream, is located the "Rhine knee"; this is a major bend, where the overall direction of the Rhine changes from West to North. Here the High Rhine ends. Legally, the Central Bridge is the boundary between High and Upper Rhine. 
The river now flows North as Upper Rhine through the Upper Rhine Plain, which is about 300 km long and up to 40 km wide. The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz. In Mainz, the Rhine leaves the Upper Rhine Valley and flows through the Mainz Basin.
Responses: The mountain range that influenced the split of the regions is the Tehachapi Mountains. The prize offered for finding a solution to P=NP is US$1,000,000. The provided information does not mention any Californios or any details related to California. A Californio refers to a Hispanic person who was born in California during the period when it was part of Mexico, before becoming part of the United States. Without more context or specific information about Californios, it is not possible to answer the question regarding which Californio is located in the upper part.
Ground truths: Tehachapis $1,000,000 Monterey
Evaluate the RAG pipeline
Now that we have the questions, contexts, responses and ground truths, we can begin our pipeline evaluation and compute all the supported metrics.
Metrics computation
In addition to evaluating the final responses of the LLM, it is important that we also evaluate the individual components of the RAG pipeline as they can significantly impact the overall performance. Therefore, there are different metrics to evaluate the retriever, the generator and the overall pipeline. For a full list of available metrics and their expected inputs, check out the DeepEvalEvaluator Docs.
The DeepEval documentation provides explanations of the individual metrics with simple examples for each of them.
Contextual Precision
The contextual precision metric measures our RAG pipeline's retriever by evaluating whether items in our contexts that are relevant to the given input are ranked higher than irrelevant ones.
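To make the ranking idea concrete, here is a sketch of a weighted cumulative precision over per-node relevance verdicts. It follows the formula as I understand it from the DeepEval documentation: for each rank k holding a relevant node, take the precision at k, then average over the relevant nodes. In the real metric the verdicts come from an LLM judge; here they are supplied by hand, and the function name is chosen for this example.

```python
from typing import Sequence


def contextual_precision(relevance: Sequence[bool]) -> float:
    """Weighted cumulative precision over ranked relevance verdicts.

    relevance[i] is True when the node at rank i + 1 is relevant to the input.
    """
    relevant_total = sum(relevance)
    if relevant_total == 0:
        return 0.0  # no relevant node anywhere in the ranking

    score = 0.0
    relevant_so_far = 0
    for k, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            relevant_so_far += 1
            score += relevant_so_far / k  # precision at rank k
    return score / relevant_total


print(contextual_precision([True, False, False]))   # 1.0
print(contextual_precision([False, True, False]))   # 0.5
print(contextual_precision([False, False, False]))  # 0.0
```

The three verdict patterns reproduce the scores reported for the three questions in this section: a single relevant node at rank 1 gives 1.0, at rank 2 gives 0.5, and no relevant node gives 0.0.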
====================================================================== Metrics Summary - ✅ Contextual Precision (score: 1.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 1.00 because the relevant node ranks first, providing a direct connection to the influence of the Tehachapi mountain range on regional divisions. The subsequent nodes, which discuss unrelated topics such as structural geology and climate, are ranked lower, ensuring that the relevant information is prioritized., error: None) For test case: - input: Which mountain range influenced the split of the regions? - actual output: The mountain range that influenced the split of the regions is the Tehachapi Mountains. - expected output: Tehachapis - context: None - retrieval context: ['The state is most commonly divided and promoted by its regional tourism groups as consisting of northern, central, and southern California regions. The two AAA Auto Clubs of the state, the California State Automobile Association and the Automobile Club of Southern California, choose to simplify matters by dividing the state along the lines where their jurisdictions for membership apply, as either northern or southern California, in contrast to the three-region point of view. Another influence is the geographical phrase South of the Tehachapis, which would split the southern region off at the crest of that transverse range, but in that definition, the desert portions of north Los Angeles County and eastern Kern and San Bernardino Counties would be included in the southern California region due to their remoteness from the central valley and interior desert landscape.', 'Among the most well-known experiments in structural geology are those involving orogenic wedges, which are zones in which mountains are built along convergent tectonic plate boundaries. 
In the analog versions of these experiments, horizontal layers of sand are pulled along a lower surface into a back stop, which results in realistic-looking patterns of faulting and the growth of a critically tapered (all angles remain the same) orogenic wedge. Numerical models work in the same way as these analog models, though they are often more sophisticated and can include patterns of erosion and uplift in the mountain belt. This helps to show the relationship between erosion and the shape of the mountain range. These studies can also give useful information about pathways for metamorphism through pressure, temperature, space, and time.', "The Mallee and upper Wimmera are Victoria's warmest regions with hot winds blowing from nearby semi-deserts. Average temperatures exceed 32 °C (90 °F) during summer and 15 °C (59 °F) in winter. Except at cool mountain elevations, the inland monthly temperatures are 2–7 °C (4–13 °F) warmer than around Melbourne (see chart). Victoria's highest maximum temperature since World War II, of 48.8 °C (119.8 °F) was recorded in Hopetoun on 7 February 2009, during the 2009 southeastern Australia heat wave."] ====================================================================== Overall Metric Pass Rates Contextual Precision: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Contextual Precision (score: 0.5, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 0.50 because while the second node provides a clear and direct answer regarding the prize for solving P=NP, the first and third nodes, ranked higher, do not address the question and instead focus on unrelated topics. This results in a mixed ranking where relevant information is overshadowed by irrelevant nodes., error: None) For test case: - input: What is the prize offered for finding a solution to P=NP? 
- actual output: The prize offered for finding a solution to P=NP is US$1,000,000. - expected output: $1,000,000 - context: None - retrieval context: ['If a problem X is in C and hard for C, then X is said to be complete for C. This means that X is the hardest problem in C. (Since many problems could be equally hard, one might say that X is one of the hardest problems in C.) Thus the class of NP-complete problems contains the most difficult problems in NP, in the sense that they are the ones most likely not to be in P. Because the problem P = NP is not solved, being able to reduce a known NP-complete problem, Π2, to another problem, Π1, would indicate that there is no known polynomial-time solution for Π1. This is because a polynomial-time solution to Π1 would yield a polynomial-time solution to Π2. Similarly, because all NP problems can be reduced to the set, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP.', 'The question of whether P equals NP is one of the most important open questions in theoretical computer science because of the wide implications of a solution. If the answer is yes, many important problems can be shown to have more efficient solutions. These include various types of integer programming problems in operations research, many problems in logistics, protein structure prediction in biology, and the ability to find formal proofs of pure mathematics theorems. The P versus NP problem is one of the Millennium Prize Problems proposed by the Clay Mathematics Institute. There is a US$1,000,000 prize for resolving the problem.', 'What intractability means in practice is open to debate. Saying that a problem is not in P does not imply that all large cases of the problem are hard or even that most of them are. For example, the decision problem in Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve the problem in reasonable times in most cases. 
Similarly, algorithms can solve the NP-complete knapsack problem over a wide range of sizes in less than quadratic time and SAT solvers routinely handle large instances of the NP-complete Boolean satisfiability problem.'] ====================================================================== Overall Metric Pass Rates Contextual Precision: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Contextual Precision (score: 0.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 0.00 because all nodes in the retrieval contexts are irrelevant to the input question about Californios. The first node ranks highest but states, 'In the centre of Basel...' does not provide any information related to Californios or their locations, indicating a complete lack of relevance. Similarly, the second node discusses Roman frontiers, and the third node is about tectonic plates, both of which are unrelated to the query. As a result, there are no relevant nodes to elevate the score., error: None) For test case: - input: Which Californio is located in the upper part? - actual output: The provided information does not mention any Californios or any details related to California. A Californio refers to a Hispanic person who was born in California during the period when it was part of Mexico, before becoming part of the United States. Without more context or specific information about Californios, it is not possible to answer the question regarding which Californio is located in the upper part. - expected output: Monterey - context: None - retrieval context: ['In the centre of Basel, the first major city in the course of the stream, is located the "Rhine knee"; this is a major bend, where the overall direction of the Rhine changes from West to North. Here the High Rhine ends. 
Legally, the Central Bridge is the boundary between High and Upper Rhine. The river now flows North as Upper Rhine through the Upper Rhine Plain, which is about 300 km long and up to 40 km wide. The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz. In Mainz, the Rhine leaves the Upper Rhine Valley and flows through the Mainz Basin.', 'From the death of Augustus in AD 14 until after AD 70, Rome accepted as her Germanic frontier the water-boundary of the Rhine and upper Danube. Beyond these rivers she held only the fertile plain of Frankfurt, opposite the Roman border fortress of Moguntiacum (Mainz), the southernmost slopes of the Black Forest and a few scattered bridge-heads. The northern section of this frontier, where the Rhine is deep and broad, remained the Roman boundary until the empire fell. The southern part was different. The upper Rhine and upper Danube are easily crossed. The frontier which they form is inconveniently long, enclosing an acute-angled wedge of foreign territory between the modern Baden and Württemberg. The Germanic populations of these lands seem in Roman times to have been scanty, and Roman subjects from the modern Alsace-Lorraine had drifted across the river eastwards.', "In the 1960s, a series of discoveries, the most important of which was seafloor spreading, showed that the Earth's lithosphere, which includes the crust and rigid uppermost portion of the upper mantle, is separated into a number of tectonic plates that move across the plastically deforming, solid, upper mantle, which is called the asthenosphere. There is an intimate coupling between the movement of the plates on the surface and the convection of the mantle: oceanic plate motions and mantle convection currents always move in the same direction, because the oceanic lithosphere is the rigid upper thermal boundary layer of the convecting mantle. 
This coupling between rigid plates moving on the surface of the Earth and the convecting mantle is called plate tectonics."] ====================================================================== Overall Metric Pass Rates Contextual Precision: 100.00% pass rate ======================================================================
[[{'name': 'contextual_precision', 'score': 1.0, 'explanation': 'The score is 1.00 because the relevant node ranks first, providing a direct connection to the influence of the Tehachapi mountain range on regional divisions. The subsequent nodes, which discuss unrelated topics such as structural geology and climate, are ranked lower, ensuring that the relevant information is prioritized.'}], [{'name': 'contextual_precision', 'score': 0.5, 'explanation': 'The score is 0.50 because while the second node provides a clear and direct answer regarding the prize for solving P=NP, the first and third nodes, ranked higher, do not address the question and instead focus on unrelated topics. This results in a mixed ranking where relevant information is overshadowed by irrelevant nodes.'}], [{'name': 'contextual_precision', 'score': 0.0, 'explanation': "The score is 0.00 because all nodes in the retrieval contexts are irrelevant to the input question about Californios. The first node ranks highest but states, 'In the centre of Basel...' does not provide any information related to Californios or their locations, indicating a complete lack of relevance. Similarly, the second node discusses Roman frontiers, and the third node is about tectonic plates, both of which are unrelated to the query. As a result, there are no relevant nodes to elevate the score."}]]
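The summaries report a 100% pass rate even for the test case that scored 0.0. That is consistent with the threshold of 0.0 shown in the output, under the assumption that a test case passes when its score is greater than or equal to the threshold. The sketch below works on the result structure printed above (a list of per-question lists of metric dicts) and shows how the pass rate would change with a stricter threshold; `pass_rate` is a hypothetical helper, not a library function.

```python
from typing import Dict, List, Union

MetricResult = Dict[str, Union[str, float]]


def pass_rate(results: List[List[MetricResult]], threshold: float = 0.0) -> float:
    """Fraction of metric scores that are >= threshold (assumed pass rule)."""
    scores = [float(metric["score"]) for case in results for metric in case]
    if not scores:
        return 0.0
    passed = sum(1 for score in scores if score >= threshold)
    return passed / len(scores)


# Scores copied from the contextual precision results; explanations omitted.
results = [
    [{"name": "contextual_precision", "score": 1.0}],
    [{"name": "contextual_precision", "score": 0.5}],
    [{"name": "contextual_precision", "score": 0.0}],
]

print(pass_rate(results, threshold=0.0))  # 1.0
print(pass_rate(results, threshold=0.5))
```

A threshold of 0.0 therefore makes the pass rate uninformative; the raw scores and explanations are the useful signal in this run.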
Contextual Recall
Contextual recall measures the extent to which the retrieved contexts align with the ground truth.
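As a rough illustration, contextual recall can be thought of as the share of statements in the expected output that can be attributed to the retrieval context. The sketch below uses case-insensitive substring matching as a crude stand-in for the LLM judgment that the real metric relies on, so it is an approximation of the idea and not the library's implementation. The context strings are shortened placeholders.

```python
from typing import Sequence


def contextual_recall(
    expected_statements: Sequence[str], retrieval_context: Sequence[str]
) -> float:
    """Share of expected statements found in the retrieved passages.

    Substring matching is a simplification; an LLM judge decides this in practice.
    """
    if not expected_statements:
        return 0.0
    joined = " ".join(retrieval_context).lower()
    attributable = sum(1 for s in expected_statements if s.lower() in joined)
    return attributable / len(expected_statements)


print(contextual_recall(["Tehachapis"], ["... South of the Tehachapis ..."]))  # 1.0
print(contextual_recall(["$1,000,000"], ["There is a US$1,000,000 prize ..."]))  # 1.0
print(contextual_recall(["Monterey"], ["In the centre of Basel ..."]))  # 0.0
```

These toy inputs mirror the three test cases in this section: the ground truth appears in the retrieved text for two questions and is absent for the third.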
====================================================================== Metrics Summary - ✅ Contextual Recall (score: 0.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 0.00 because the sentence 'Monterey' does not appear in any part of the node(s) in retrieval context., error: None) For test case: - input: Which Californio is located in the upper part? - actual output: The provided information does not mention any Californios or any details related to California. A Californio refers to a Hispanic person who was born in California during the period when it was part of Mexico, before becoming part of the United States. Without more context or specific information about Californios, it is not possible to answer the question regarding which Californio is located in the upper part. - expected output: Monterey - context: None - retrieval context: ['In the centre of Basel, the first major city in the course of the stream, is located the "Rhine knee"; this is a major bend, where the overall direction of the Rhine changes from West to North. Here the High Rhine ends. Legally, the Central Bridge is the boundary between High and Upper Rhine. The river now flows North as Upper Rhine through the Upper Rhine Plain, which is about 300 km long and up to 40 km wide. The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz. In Mainz, the Rhine leaves the Upper Rhine Valley and flows through the Mainz Basin.', 'From the death of Augustus in AD 14 until after AD 70, Rome accepted as her Germanic frontier the water-boundary of the Rhine and upper Danube. Beyond these rivers she held only the fertile plain of Frankfurt, opposite the Roman border fortress of Moguntiacum (Mainz), the southernmost slopes of the Black Forest and a few scattered bridge-heads. The northern section of this frontier, where the Rhine is deep and broad, remained the Roman boundary until the empire fell. 
The southern part was different. The upper Rhine and upper Danube are easily crossed. The frontier which they form is inconveniently long, enclosing an acute-angled wedge of foreign territory between the modern Baden and Württemberg. The Germanic populations of these lands seem in Roman times to have been scanty, and Roman subjects from the modern Alsace-Lorraine had drifted across the river eastwards.', "In the 1960s, a series of discoveries, the most important of which was seafloor spreading, showed that the Earth's lithosphere, which includes the crust and rigid uppermost portion of the upper mantle, is separated into a number of tectonic plates that move across the plastically deforming, solid, upper mantle, which is called the asthenosphere. There is an intimate coupling between the movement of the plates on the surface and the convection of the mantle: oceanic plate motions and mantle convection currents always move in the same direction, because the oceanic lithosphere is the rigid upper thermal boundary layer of the convecting mantle. This coupling between rigid plates moving on the surface of the Earth and the convecting mantle is called plate tectonics."] ====================================================================== Overall Metric Pass Rates Contextual Recall: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Contextual Recall (score: 1.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 1.00 because the term 'Tehachapis' is directly referenced in the retrieval context, making it perfectly aligned with the expected output., error: None) For test case: - input: Which mountain range influenced the split of the regions? - actual output: The mountain range that influenced the split of the regions is the Tehachapi Mountains. 
- expected output: Tehachapis - context: None - retrieval context: ['The state is most commonly divided and promoted by its regional tourism groups as consisting of northern, central, and southern California regions. The two AAA Auto Clubs of the state, the California State Automobile Association and the Automobile Club of Southern California, choose to simplify matters by dividing the state along the lines where their jurisdictions for membership apply, as either northern or southern California, in contrast to the three-region point of view. Another influence is the geographical phrase South of the Tehachapis, which would split the southern region off at the crest of that transverse range, but in that definition, the desert portions of north Los Angeles County and eastern Kern and San Bernardino Counties would be included in the southern California region due to their remoteness from the central valley and interior desert landscape.', 'Among the most well-known experiments in structural geology are those involving orogenic wedges, which are zones in which mountains are built along convergent tectonic plate boundaries. In the analog versions of these experiments, horizontal layers of sand are pulled along a lower surface into a back stop, which results in realistic-looking patterns of faulting and the growth of a critically tapered (all angles remain the same) orogenic wedge. Numerical models work in the same way as these analog models, though they are often more sophisticated and can include patterns of erosion and uplift in the mountain belt. This helps to show the relationship between erosion and the shape of the mountain range. These studies can also give useful information about pathways for metamorphism through pressure, temperature, space, and time.', "The Mallee and upper Wimmera are Victoria's warmest regions with hot winds blowing from nearby semi-deserts. Average temperatures exceed 32 °C (90 °F) during summer and 15 °C (59 °F) in winter. 
Except at cool mountain elevations, the inland monthly temperatures are 2–7 °C (4–13 °F) warmer than around Melbourne (see chart). Victoria's highest maximum temperature since World War II, of 48.8 °C (119.8 °F) was recorded in Hopetoun on 7 February 2009, during the 2009 southeastern Australia heat wave."] ====================================================================== Overall Metric Pass Rates Contextual Recall: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Contextual Recall (score: 1.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 1.00 because the sentence '$1,000,000' is directly supported by the 2nd node in retrieval context, which clearly states 'There is a US$1,000,000 prize for resolving the problem...'. This strong alignment confirms the accuracy of the expected output., error: None) For test case: - input: What is the prize offered for finding a solution to P=NP? - actual output: The prize offered for finding a solution to P=NP is US$1,000,000. - expected output: $1,000,000 - context: None - retrieval context: ['If a problem X is in C and hard for C, then X is said to be complete for C. This means that X is the hardest problem in C. (Since many problems could be equally hard, one might say that X is one of the hardest problems in C.) Thus the class of NP-complete problems contains the most difficult problems in NP, in the sense that they are the ones most likely not to be in P. Because the problem P = NP is not solved, being able to reduce a known NP-complete problem, Π2, to another problem, Π1, would indicate that there is no known polynomial-time solution for Π1. This is because a polynomial-time solution to Π1 would yield a polynomial-time solution to Π2. 
Similarly, because all NP problems can be reduced to the set, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP.', 'The question of whether P equals NP is one of the most important open questions in theoretical computer science because of the wide implications of a solution. If the answer is yes, many important problems can be shown to have more efficient solutions. These include various types of integer programming problems in operations research, many problems in logistics, protein structure prediction in biology, and the ability to find formal proofs of pure mathematics theorems. The P versus NP problem is one of the Millennium Prize Problems proposed by the Clay Mathematics Institute. There is a US$1,000,000 prize for resolving the problem.', 'What intractability means in practice is open to debate. Saying that a problem is not in P does not imply that all large cases of the problem are hard or even that most of them are. For example, the decision problem in Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve the problem in reasonable times in most cases. Similarly, algorithms can solve the NP-complete knapsack problem over a wide range of sizes in less than quadratic time and SAT solvers routinely handle large instances of the NP-complete Boolean satisfiability problem.'] ====================================================================== Overall Metric Pass Rates Contextual Recall: 100.00% pass rate ======================================================================
[[{'name': 'contextual_recall', 'score': 0.0, 'explanation': "The score is 0.00 because the sentence 'Monterey' does not appear in any part of the node(s) in retrieval context."}], [{'name': 'contextual_recall', 'score': 1.0, 'explanation': "The score is 1.00 because the term 'Tehachapis' is directly referenced in the retrieval context, making it perfectly aligned with the expected output."}], [{'name': 'contextual_recall', 'score': 1.0, 'explanation': "The score is 1.00 because the sentence '$1,000,000' is directly supported by the 2nd node in retrieval context, which clearly states 'There is a US$1,000,000 prize for resolving the problem...'. This strong alignment confirms the accuracy of the expected output."}]]
Contextual Relevancy
The contextual relevancy metric measures the quality of our RAG pipeline's retriever by evaluating the overall relevance of the context for a given question.
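One way to read the scores for this metric is as the fraction of statements in the retrieval context that are relevant to the question. The sketch below takes per-statement verdicts as given (in the real metric an LLM extracts and judges the statements). The statement counts of 13 and 14 used in the example are inferred from the reported scores and explanations, not taken from the library output, so treat them as assumptions.

```python
from typing import Sequence


def contextual_relevancy(verdicts: Sequence[bool]) -> float:
    """Fraction of context statements judged relevant to the input."""
    if not verdicts:
        return 0.0
    return sum(verdicts) / len(verdicts)


# Assumed breakdown: 1 relevant statement out of 13 for the first question,
# 2 relevant statements out of 14 for the second.
first = contextual_relevancy([True] + [False] * 12)
second = contextual_relevancy([True, True] + [False] * 12)

print(round(first, 2))   # 0.08
print(round(second, 2))  # 0.14
```

Because the denominator counts every statement in every retrieved passage, retrieving several long and mostly unrelated passages pulls this score down even when the answer is present.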
====================================================================== Metrics Summary - ✅ Contextual Relevancy (score: 0.07692307692307693, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 0.08 because the retrieval context primarily discusses various unrelated topics, such as jurisdiction and experimental processes, which do not address the influence of a mountain range on regional splits. The only relevant statement mentions 'South of the Tehachapis' but lacks specificity about the mountain range's influence, making it insufficient to connect to the input question., error: None) For test case: - input: Which mountain range influenced the split of the regions? - actual output: The mountain range that influenced the split of the regions is the Tehachapi Mountains. - expected output: None - context: None - retrieval context: ['The state is most commonly divided and promoted by its regional tourism groups as consisting of northern, central, and southern California regions. The two AAA Auto Clubs of the state, the California State Automobile Association and the Automobile Club of Southern California, choose to simplify matters by dividing the state along the lines where their jurisdictions for membership apply, as either northern or southern California, in contrast to the three-region point of view. Another influence is the geographical phrase South of the Tehachapis, which would split the southern region off at the crest of that transverse range, but in that definition, the desert portions of north Los Angeles County and eastern Kern and San Bernardino Counties would be included in the southern California region due to their remoteness from the central valley and interior desert landscape.', 'Among the most well-known experiments in structural geology are those involving orogenic wedges, which are zones in which mountains are built along convergent tectonic plate boundaries. 
In the analog versions of these experiments, horizontal layers of sand are pulled along a lower surface into a back stop, which results in realistic-looking patterns of faulting and the growth of a critically tapered (all angles remain the same) orogenic wedge. Numerical models work in the same way as these analog models, though they are often more sophisticated and can include patterns of erosion and uplift in the mountain belt. This helps to show the relationship between erosion and the shape of the mountain range. These studies can also give useful information about pathways for metamorphism through pressure, temperature, space, and time.', "The Mallee and upper Wimmera are Victoria's warmest regions with hot winds blowing from nearby semi-deserts. Average temperatures exceed 32 °C (90 °F) during summer and 15 °C (59 °F) in winter. Except at cool mountain elevations, the inland monthly temperatures are 2–7 °C (4–13 °F) warmer than around Melbourne (see chart). Victoria's highest maximum temperature since World War II, of 48.8 °C (119.8 °F) was recorded in Hopetoun on 7 February 2009, during the 2009 southeastern Australia heat wave."] ====================================================================== Overall Metric Pass Rates Contextual Relevancy: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Contextual Relevancy (score: 0.14285714285714285, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 0.14 because the majority of the retrieval context focuses on theoretical aspects of complexity classes and implications of the P=NP problem, which do not address the prize. 
However, the relevant statements mention that 'The P versus NP problem is one of the Millennium Prize Problems' and 'There is a US$1,000,000 prize for resolving the problem,' which directly answer the input question., error: None) For test case: - input: What is the prize offered for finding a solution to P=NP? - actual output: The prize offered for finding a solution to P=NP is US$1,000,000. - expected output: None - context: None - retrieval context: ['If a problem X is in C and hard for C, then X is said to be complete for C. This means that X is the hardest problem in C. (Since many problems could be equally hard, one might say that X is one of the hardest problems in C.) Thus the class of NP-complete problems contains the most difficult problems in NP, in the sense that they are the ones most likely not to be in P. Because the problem P = NP is not solved, being able to reduce a known NP-complete problem, Π2, to another problem, Π1, would indicate that there is no known polynomial-time solution for Π1. This is because a polynomial-time solution to Π1 would yield a polynomial-time solution to Π2. Similarly, because all NP problems can be reduced to the set, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP.', 'The question of whether P equals NP is one of the most important open questions in theoretical computer science because of the wide implications of a solution. If the answer is yes, many important problems can be shown to have more efficient solutions. These include various types of integer programming problems in operations research, many problems in logistics, protein structure prediction in biology, and the ability to find formal proofs of pure mathematics theorems. The P versus NP problem is one of the Millennium Prize Problems proposed by the Clay Mathematics Institute. There is a US$1,000,000 prize for resolving the problem.', 'What intractability means in practice is open to debate. 
Saying that a problem is not in P does not imply that all large cases of the problem are hard or even that most of them are. For example, the decision problem in Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve the problem in reasonable times in most cases. Similarly, algorithms can solve the NP-complete knapsack problem over a wide range of sizes in less than quadratic time and SAT solvers routinely handle large instances of the NP-complete Boolean satisfiability problem.'] ====================================================================== Overall Metric Pass Rates Contextual Relevancy: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Contextual Relevancy (score: 0.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 0.00 because there are no relevant statements in the retrieval context that address the question about 'Californio', and all provided statements discuss unrelated topics such as the Rhine and Roman territories., error: None) For test case: - input: Which Californio is located in the upper part? - actual output: The provided information does not mention any Californios or any details related to California. A Californio refers to a Hispanic person who was born in California during the period when it was part of Mexico, before becoming part of the United States. Without more context or specific information about Californios, it is not possible to answer the question regarding which Californio is located in the upper part. - expected output: None - context: None - retrieval context: ['In the centre of Basel, the first major city in the course of the stream, is located the "Rhine knee"; this is a major bend, where the overall direction of the Rhine changes from West to North. Here the High Rhine ends. 
Legally, the Central Bridge is the boundary between High and Upper Rhine. The river now flows North as Upper Rhine through the Upper Rhine Plain, which is about 300 km long and up to 40 km wide. The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz. In Mainz, the Rhine leaves the Upper Rhine Valley and flows through the Mainz Basin.', 'From the death of Augustus in AD 14 until after AD 70, Rome accepted as her Germanic frontier the water-boundary of the Rhine and upper Danube. Beyond these rivers she held only the fertile plain of Frankfurt, opposite the Roman border fortress of Moguntiacum (Mainz), the southernmost slopes of the Black Forest and a few scattered bridge-heads. The northern section of this frontier, where the Rhine is deep and broad, remained the Roman boundary until the empire fell. The southern part was different. The upper Rhine and upper Danube are easily crossed. The frontier which they form is inconveniently long, enclosing an acute-angled wedge of foreign territory between the modern Baden and Württemberg. The Germanic populations of these lands seem in Roman times to have been scanty, and Roman subjects from the modern Alsace-Lorraine had drifted across the river eastwards.', "In the 1960s, a series of discoveries, the most important of which was seafloor spreading, showed that the Earth's lithosphere, which includes the crust and rigid uppermost portion of the upper mantle, is separated into a number of tectonic plates that move across the plastically deforming, solid, upper mantle, which is called the asthenosphere. There is an intimate coupling between the movement of the plates on the surface and the convection of the mantle: oceanic plate motions and mantle convection currents always move in the same direction, because the oceanic lithosphere is the rigid upper thermal boundary layer of the convecting mantle. 
This coupling between rigid plates moving on the surface of the Earth and the convecting mantle is called plate tectonics."] ====================================================================== Overall Metric Pass Rates Contextual Relevancy: 100.00% pass rate ======================================================================
[
  [{'name': 'contextual_relevance', 'score': 0.07692307692307693, 'explanation': "The score is 0.08 because the retrieval context primarily discusses various unrelated topics, such as jurisdiction and experimental processes, which do not address the influence of a mountain range on regional splits. The only relevant statement mentions 'South of the Tehachapis' but lacks specificity about the mountain range's influence, making it insufficient to connect to the input question."}],
  [{'name': 'contextual_relevance', 'score': 0.14285714285714285, 'explanation': "The score is 0.14 because the majority of the retrieval context focuses on theoretical aspects of complexity classes and implications of the P=NP problem, which do not address the prize. However, the relevant statements mention that 'The P versus NP problem is one of the Millennium Prize Problems' and 'There is a US$1,000,000 prize for resolving the problem,' which directly answer the input question."}],
  [{'name': 'contextual_relevance', 'score': 0.0, 'explanation': "The score is 0.00 because there are no relevant statements in the retrieval context that address the question about 'Californio', and all provided statements discuss unrelated topics such as the Rhine and Roman territories."}]
]
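The result printed above is a nested list: one inner list per question, each holding a dictionary with `name`, `score` and `explanation` keys. As a minimal sketch for reading such output at a glance, the helper below averages the scores per metric. The name `summarize_results` is hypothetical and not part of DeepEval or Haystack; the sample scores are copied from the output above, with the explanations left out.

```python
from collections import defaultdict
from statistics import mean


def summarize_results(results):
    """Return the mean score per metric name.

    `results` is assumed to have the shape printed above: one inner list
    per question, each containing dicts with 'name' and 'score' keys.
    """
    scores_by_metric = defaultdict(list)
    for per_question in results:
        for entry in per_question:
            scores_by_metric[entry["name"]].append(entry["score"])
    return {name: mean(scores) for name, scores in scores_by_metric.items()}


# Scores copied from the contextual relevance output above.
sample_results = [
    [{"name": "contextual_relevance", "score": 0.07692307692307693}],
    [{"name": "contextual_relevance", "score": 0.14285714285714285}],
    [{"name": "contextual_relevance", "score": 0.0}],
]

summary = summarize_results(sample_results)
print(summary)
```

Note that every test case is reported as passing only because the threshold shown in the logs is 0.0. The average score is low, which suggests that most of the retrieved passages are unrelated to the questions.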
Answer relevancy
The answer relevancy metric measures how relevant each response from our RAG pipeline is to the question that was asked. Unlike contextual relevancy, it scores the generated answer rather than the retrieved documents.
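The evaluation for this metric can be sketched as a one-component pipeline. This is a sketch under assumptions, not the notebook's exact cell: it assumes the `deepeval-haystack` package exposes `DeepEvalEvaluator` and `DeepEvalMetric` under `haystack_integrations.components.evaluators.deepeval`, that `metric_params` accepts a `model` entry, and that an `OPENAI_API_KEY` is set. The helper names `check_inputs` and `run_answer_relevancy` are hypothetical. The integration imports are deferred so that the input check works on its own.

```python
def check_inputs(questions, contexts, responses):
    """Raise if the three evaluator inputs are not aligned one-to-one."""
    if not (len(questions) == len(contexts) == len(responses)):
        raise ValueError(
            "questions, contexts and responses must have the same length"
        )
    return len(questions)


def run_answer_relevancy(questions, contexts, responses, model="gpt-4o-mini"):
    """Score answer relevancy for each (question, contexts, response) triple.

    Needs the deepeval-haystack package and an OpenAI key at call time.
    """
    # Deferred imports: only required when the evaluation is actually run.
    from haystack import Pipeline
    from haystack_integrations.components.evaluators.deepeval import (
        DeepEvalEvaluator,
        DeepEvalMetric,
    )

    check_inputs(questions, contexts, responses)

    evaluator = DeepEvalEvaluator(
        metric=DeepEvalMetric.ANSWER_RELEVANCY,
        metric_params={"model": model},  # assumed parameter name
    )
    pipeline = Pipeline()
    pipeline.add_component("evaluator", evaluator)

    output = pipeline.run(
        {
            "evaluator": {
                "questions": questions,
                "contexts": contexts,
                "responses": responses,
            }
        }
    )
    return output["evaluator"]["results"]
```

Calling `run_answer_relevancy(questions, contexts, responses)` with the lists gathered earlier should return a nested list of `name`, `score` and `explanation` entries, one inner list per question.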
====================================================================== Metrics Summary - ✅ Answer Relevancy (score: 1.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 1.00 because the response directly addressed the question about the prize for solving P=NP without including any irrelevant statements., error: None) For test case: - input: What is the prize offered for finding a solution to P=NP? - actual output: The prize offered for finding a solution to P=NP is US$1,000,000. - expected output: None - context: None - retrieval context: ['If a problem X is in C and hard for C, then X is said to be complete for C. This means that X is the hardest problem in C. (Since many problems could be equally hard, one might say that X is one of the hardest problems in C.) Thus the class of NP-complete problems contains the most difficult problems in NP, in the sense that they are the ones most likely not to be in P. Because the problem P = NP is not solved, being able to reduce a known NP-complete problem, Π2, to another problem, Π1, would indicate that there is no known polynomial-time solution for Π1. This is because a polynomial-time solution to Π1 would yield a polynomial-time solution to Π2. Similarly, because all NP problems can be reduced to the set, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP.', 'The question of whether P equals NP is one of the most important open questions in theoretical computer science because of the wide implications of a solution. If the answer is yes, many important problems can be shown to have more efficient solutions. These include various types of integer programming problems in operations research, many problems in logistics, protein structure prediction in biology, and the ability to find formal proofs of pure mathematics theorems. The P versus NP problem is one of the Millennium Prize Problems proposed by the Clay Mathematics Institute. 
There is a US$1,000,000 prize for resolving the problem.', 'What intractability means in practice is open to debate. Saying that a problem is not in P does not imply that all large cases of the problem are hard or even that most of them are. For example, the decision problem in Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve the problem in reasonable times in most cases. Similarly, algorithms can solve the NP-complete knapsack problem over a wide range of sizes in less than quadratic time and SAT solvers routinely handle large instances of the NP-complete Boolean satisfiability problem.'] ====================================================================== Overall Metric Pass Rates Answer Relevancy: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Answer Relevancy (score: 1.0, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 1.00 because the response directly addressed the question about the mountain range influencing the split of the regions without any irrelevant statements., error: None) For test case: - input: Which mountain range influenced the split of the regions? - actual output: The mountain range that influenced the split of the regions is the Tehachapi Mountains. - expected output: None - context: None - retrieval context: ['The state is most commonly divided and promoted by its regional tourism groups as consisting of northern, central, and southern California regions. The two AAA Auto Clubs of the state, the California State Automobile Association and the Automobile Club of Southern California, choose to simplify matters by dividing the state along the lines where their jurisdictions for membership apply, as either northern or southern California, in contrast to the three-region point of view. 
Another influence is the geographical phrase South of the Tehachapis, which would split the southern region off at the crest of that transverse range, but in that definition, the desert portions of north Los Angeles County and eastern Kern and San Bernardino Counties would be included in the southern California region due to their remoteness from the central valley and interior desert landscape.', 'Among the most well-known experiments in structural geology are those involving orogenic wedges, which are zones in which mountains are built along convergent tectonic plate boundaries. In the analog versions of these experiments, horizontal layers of sand are pulled along a lower surface into a back stop, which results in realistic-looking patterns of faulting and the growth of a critically tapered (all angles remain the same) orogenic wedge. Numerical models work in the same way as these analog models, though they are often more sophisticated and can include patterns of erosion and uplift in the mountain belt. This helps to show the relationship between erosion and the shape of the mountain range. These studies can also give useful information about pathways for metamorphism through pressure, temperature, space, and time.', "The Mallee and upper Wimmera are Victoria's warmest regions with hot winds blowing from nearby semi-deserts. Average temperatures exceed 32 °C (90 °F) during summer and 15 °C (59 °F) in winter. Except at cool mountain elevations, the inland monthly temperatures are 2–7 °C (4–13 °F) warmer than around Melbourne (see chart). 
Victoria's highest maximum temperature since World War II, of 48.8 °C (119.8 °F) was recorded in Hopetoun on 7 February 2009, during the 2009 southeastern Australia heat wave."] ====================================================================== Overall Metric Pass Rates Answer Relevancy: 100.00% pass rate ====================================================================== ====================================================================== Metrics Summary - ✅ Answer Relevancy (score: 0.6, threshold: 0.0, strict: False, evaluation model: gpt-4o-mini, reason: The score is 0.60 because the output included irrelevant statements that did not address the question about the location of Californios, indicating a lack of specific information. However, some relevant content was present, which is why the score is not lower., error: None) For test case: - input: Which Californio is located in the upper part? - actual output: The provided information does not mention any Californios or any details related to California. A Californio refers to a Hispanic person who was born in California during the period when it was part of Mexico, before becoming part of the United States. Without more context or specific information about Californios, it is not possible to answer the question regarding which Californio is located in the upper part. - expected output: None - context: None - retrieval context: ['In the centre of Basel, the first major city in the course of the stream, is located the "Rhine knee"; this is a major bend, where the overall direction of the Rhine changes from West to North. Here the High Rhine ends. Legally, the Central Bridge is the boundary between High and Upper Rhine. The river now flows North as Upper Rhine through the Upper Rhine Plain, which is about 300 km long and up to 40 km wide. The most important tributaries in this area are the Ill below of Strasbourg, the Neckar in Mannheim and the Main across from Mainz. 
In Mainz, the Rhine leaves the Upper Rhine Valley and flows through the Mainz Basin.', 'From the death of Augustus in AD 14 until after AD 70, Rome accepted as her Germanic frontier the water-boundary of the Rhine and upper Danube. Beyond these rivers she held only the fertile plain of Frankfurt, opposite the Roman border fortress of Moguntiacum (Mainz), the southernmost slopes of the Black Forest and a few scattered bridge-heads. The northern section of this frontier, where the Rhine is deep and broad, remained the Roman boundary until the empire fell. The southern part was different. The upper Rhine and upper Danube are easily crossed. The frontier which they form is inconveniently long, enclosing an acute-angled wedge of foreign territory between the modern Baden and Württemberg. The Germanic populations of these lands seem in Roman times to have been scanty, and Roman subjects from the modern Alsace-Lorraine had drifted across the river eastwards.', "In the 1960s, a series of discoveries, the most important of which was seafloor spreading, showed that the Earth's lithosphere, which includes the crust and rigid uppermost portion of the upper mantle, is separated into a number of tectonic plates that move across the plastically deforming, solid, upper mantle, which is called the asthenosphere. There is an intimate coupling between the movement of the plates on the surface and the convection of the mantle: oceanic plate motions and mantle convection currents always move in the same direction, because the oceanic lithosphere is the rigid upper thermal boundary layer of the convecting mantle. This coupling between rigid plates moving on the surface of the Earth and the convecting mantle is called plate tectonics."] ====================================================================== Overall Metric Pass Rates Answer Relevancy: 100.00% pass rate ======================================================================
[
  [{'name': 'answer_relevancy', 'score': 1.0, 'explanation': 'The score is 1.00 because the response directly addressed the question about the prize for solving P=NP without including any irrelevant statements.'}],
  [{'name': 'answer_relevancy', 'score': 1.0, 'explanation': 'The score is 1.00 because the response directly addressed the question about the mountain range influencing the split of the regions without any irrelevant statements.'}],
  [{'name': 'answer_relevancy', 'score': 0.6, 'explanation': 'The score is 0.60 because the output included irrelevant statements that did not address the question about the location of Californios, indicating a lack of specific information. However, some relevant content was present, which is why the score is not lower.'}]
]
Faithfulness
The faithfulness metric measures whether each response from our RAG pipeline is factually consistent with the retrieved context it was generated from.
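As a rough illustration of the idea, a faithfulness score can be read as the fraction of claims in a response that the retrieved context supports. In DeepEval the claims and their verdicts come from an LLM judge; in the sketch below they are labeled by hand. The function name, the example claims and the handling of an empty claim list are all assumptions made for illustration.

```python
def faithfulness_score(verdicts):
    """Return the fraction of claims marked as supported by the context.

    `verdicts` holds one boolean per claim extracted from the response.
    An empty list is treated as fully faithful (an assumption made here).
    """
    if not verdicts:
        return 1.0
    return sum(verdicts) / len(verdicts)


# Hypothetical, hand-labeled claims for an answer about the P=NP prize.
# True means the retrieved context supports the claim.
labeled_claims = [
    ("There is a US$1,000,000 prize for resolving the problem.", True),
    ("P versus NP is one of the Millennium Prize Problems.", True),
    ("The prize is awarded every year.", False),  # not stated in the context
]

score = faithfulness_score([supported for _, supported in labeled_claims])
print(f"faithfulness: {score:.2f}")
```

A response that only restates what the context says scores 1.0, while every unsupported claim lowers the score.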
Our pipeline evaluation using DeepEval is now complete!
Haystack Useful Sources