Arize AI Llamaindex Evals

LLM Application Tracing & Evaluation Workflows

Exporting from Phoenix to Arize

This guide demonstrates how to use Arize to monitor and debug your LLM application using traces and spans. We're going to use data from a chatbot built on top of the Arize docs (https://docs.arize.com/arize/), with example queries and retrieved text, to understand how well our RAG system is working.

In this tutorial we will:

Build a RAG application using LlamaIndex
Set up Phoenix as a trace collector for the LlamaIndex application
Use Phoenix's evals library to compute LLM-generated evaluations of our RAG app's responses
Use the Arize SDK to export the traces and evaluations to Arize

You can read more about LLM tracing in the Arize documentation.

Step 1: Install Dependencies 📚

Let's set up the notebook with its dependencies.
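
Before running the rest of the notebook, it can help to confirm that the required packages are importable. The sketch below is one way to do that check. The module list is an assumption, because the original install cell is not shown. The matching pip distributions would most likely be arize-phoenix, llama-index, arize and openai.

```python
import importlib.util

# Hypothetical list of top-level modules this tutorial relies on; the exact
# packages and versions pinned by the original install cell are not shown.
REQUIRED_MODULES = ["phoenix", "llama_index", "arize", "openai", "nest_asyncio"]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = missing_modules()
if missing:
    print("Install these before continuing:", ", ".join(missing))
else:
    print("All dependencies are importable.")
```

find_spec only locates a module without importing it, so the check stays fast even for heavy packages.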

Step 2: Set up Phoenix as a Trace Collector in our LLM app

To get started, launch the Phoenix app. Make sure to open the app in your browser using the link printed in the cell output below.
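
Phoenix is usually launched from Python with px.launch_app() (after import phoenix as px). That call starts a local server and returns a session whose url you can open. If the link does not load, a quick reachability check shows whether anything is listening. This is a stdlib sketch. Port 6006 is Phoenix's usual default, but treat it as an assumption for your setup. is_listening is a hypothetical helper.

```python
import socket


def is_listening(host="127.0.0.1", port=6006, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable or timed out
        return False


print("Phoenix reachable:", is_listening())
```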

Once you have started a Phoenix server, you can start your LlamaIndex application and configure it to send traces to Phoenix. To do this, configure Phoenix as LlamaIndex's global handler.
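
In LlamaIndex this step is typically a single call such as set_global_handler("arize_phoenix"). The toy tracer below shows what a global handler does conceptually, using only the standard library. Every instrumented operation opens a span, and nested spans record their parent. Finished spans go to whichever collector was installed globally. All names here (install_global_handler, span, InMemoryCollector) are hypothetical and are not part of LlamaIndex or Phoenix.

```python
import time
import uuid
from contextlib import contextmanager

_global_handler = None  # hypothetical: the collector that receives finished spans
_active_spans = []      # stack of open span ids (not thread-safe; a sketch only)


class InMemoryCollector:
    """Stands in for a trace collector such as Phoenix."""

    def __init__(self):
        self.spans = []

    def record(self, span):
        self.spans.append(span)


def install_global_handler(handler):
    global _global_handler
    _global_handler = handler


@contextmanager
def span(name, **attributes):
    span_id = uuid.uuid4().hex[:16]
    parent_id = _active_spans[-1] if _active_spans else None
    _active_spans.append(span_id)
    start = time.perf_counter()
    try:
        yield attributes  # callers may add attributes while the span is open
    finally:
        _active_spans.pop()
        if _global_handler is not None:
            _global_handler.record({
                "name": name,
                "span_id": span_id,
                "parent_id": parent_id,
                "duration_s": time.perf_counter() - start,
                "attributes": attributes,
            })


collector = InMemoryCollector()
install_global_handler(collector)
with span("query", input="What is Arize?") as attrs:
    with span("retrieve"):
        pass  # retrieval would happen here
    attrs["output"] = "stub answer"
```

Child spans finish first, so they reach the collector before their parent. A real tracer also propagates context across threads and async tasks, which this sketch does not.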

That's it! The LlamaIndex application we build next will send its traces to Phoenix.

Step 3: Build Your LlamaIndex RAG Application 📁

We start by setting the OpenAI API key if it is not already set as an environment variable.
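
Here is a minimal sketch of that logic. It assumes the key lives in OPENAI_API_KEY, the variable the OpenAI Python client reads by default. ensure_env_var is a hypothetical helper. Its prompt parameter exists only so the prompt can be swapped out, for example in tests.

```python
import os
from getpass import getpass


def ensure_env_var(name="OPENAI_API_KEY", prompt=getpass):
    """Return the variable's value, prompting for it (without echo) if it is unset."""
    value = os.environ.get(name)
    if not value:
        value = prompt(f"Enter {name}: ")
        os.environ[name] = value  # later clients read the key from the environment
    return value
```

In the notebook you would call ensure_env_var() once before loading the index. getpass keeps the key out of the cell output.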

This example uses a RetrieverQueryEngine over a pre-built index of the Arize documentation, but you can use whatever LlamaIndex application you like. Download the pre-built index of the Arize docs from cloud storage and instantiate your storage context.
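
The download location is not given here, so the sketch below takes the URL as a parameter. It also assumes the index ships as a zip archive of persisted LlamaIndex files. After extraction, LlamaIndex can typically reload a persisted index with StorageContext.from_defaults(persist_dir=...) followed by load_index_from_storage(...). Check that API for your LlamaIndex version. download_and_extract is a hypothetical helper.

```python
import io
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, dest_dir):
    """Fetch a zip archive from `url` and unpack it into `dest_dir`."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response:
        payload = response.read()
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        # extractall trusts member paths; only use it on archives you trust
        archive.extractall(dest)
    return dest
```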

We are now ready to instantiate our query engine, which will perform retrieval-augmented generation (RAG). A query engine is a generic interface in LlamaIndex that lets you ask questions over your data: it takes in a natural language query and returns a rich response. It is built on top of retrievers, and you can compose multiple query engines to achieve more advanced capabilities.
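
This toy version makes the split between retriever and query engine concrete. The retriever scores documents by keyword overlap, standing in for the vector retriever a real index would use. The query engine puts the retrieved text into a prompt and passes it to a generate callable that stands in for the LLM. KeywordRetriever and SimpleQueryEngine are hypothetical names, not LlamaIndex classes.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


def tokenize(text: str) -> set:
    """Lowercase word tokens; crude, but enough for the illustration."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class KeywordRetriever:
    documents: List[str]
    top_k: int = 2

    def retrieve(self, query: str) -> List[str]:
        """Return up to top_k documents sharing at least one token with the query."""
        query_tokens = tokenize(query)
        scored = [(len(query_tokens & tokenize(doc)), i, doc)
                  for i, doc in enumerate(self.documents)]
        scored.sort(key=lambda item: (-item[0], item[1]))  # best overlap first, ties by position
        return [doc for score, _, doc in scored[: self.top_k] if score > 0]


@dataclass
class SimpleQueryEngine:
    retriever: KeywordRetriever
    generate: Callable[[str], str]  # stands in for the LLM call

    def query(self, question: str) -> Dict[str, object]:
        context = self.retriever.retrieve(question)
        prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
        return {"response": self.generate(prompt), "source_nodes": context}


engine = SimpleQueryEngine(
    retriever=KeywordRetriever(documents=[
        "Arize supports logging spans and traces.",
        "Phoenix runs locally in a notebook.",
        "Bananas are yellow.",
    ]),
    generate=lambda prompt: "stub answer",
)
result = engine.query("How do I log spans to Arize?")
```

Here result["source_nodes"] holds only the Arize sentence, because the other two documents share no tokens with the question. Swapping in a different retriever leaves the query engine unchanged, which is the point of the layering.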

Let's test our app by asking a question about the Arize documentation:

Great! Our application works!

Step 4: Use the instrumented Query Engine

We will download a dataset of questions for our RAG application to answer.
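
The dataset's location and format are not given here. Assuming a CSV file with a question column, loading it needs only the standard library. load_questions and the column name are hypothetical.

```python
import csv
import io
import urllib.request


def load_questions(url, column="question", limit=None):
    """Download a CSV and return the non-empty values of `column`."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    rows = csv.DictReader(io.StringIO(text))
    questions = [row[column] for row in rows if row.get(column)]
    return questions[:limit] if limit is not None else questions
```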

We run each question through the instrumented query engine to get responses from our RAG app.
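
Each call to the instrumented engine produces a trace in Phoenix, so the loop itself can stay simple. This sketch assumes query_fn wraps something like query_engine.query. It catches errors per question so that one failed request does not abort the whole batch. answer_all is a hypothetical helper.

```python
import time


def answer_all(questions, query_fn):
    """Call query_fn on each question, recording answer, error and latency."""
    results = []
    for question in questions:
        start = time.perf_counter()
        try:
            answer, error = query_fn(question), None
        except Exception as exc:  # keep going; the failure is recorded instead
            answer, error = None, repr(exc)
        results.append({
            "question": question,
            "answer": answer,
            "error": error,
            "latency_s": time.perf_counter() - start,
        })
    return results
```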

To see the questions and answers in Phoenix, open the link printed when we started the Phoenix server.

Step 5: Run Evaluations on the data in Phoenix

We use the Phoenix client to extract the data in the format each evaluation expects, and then use Phoenix's evaluators to run the evaluations on our RAG application.
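
Phoenix typically ships helpers for this step, get_qa_with_reference and get_retrieved_documents under phoenix.session.evaluation. The sketch below only illustrates the reshaping they perform, on a simplified and hypothetical span format. It produces two shapes:

- one row per query, with the retrieved text joined into a reference (for QA and hallucination checks);
- one row per retrieved document (for relevance checks).

```python
from collections import defaultdict

# Simplified, hypothetical span records: one root CHAIN span per query and
# one RETRIEVER span holding the retrieved documents.
spans = [
    {"span_id": "root-1", "trace_id": "t1", "parent_id": None, "span_kind": "CHAIN",
     "input": "What is Phoenix?", "output": "An open-source tracing tool."},
    {"span_id": "ret-1", "trace_id": "t1", "parent_id": "root-1", "span_kind": "RETRIEVER",
     "input": "What is Phoenix?",
     "documents": ["Phoenix is open source.", "Phoenix traces LLM apps."]},
]


def qa_with_reference(spans):
    """One row per root span, with its trace's retrieved text joined into a reference."""
    docs_by_trace = defaultdict(list)
    for s in spans:
        if s["span_kind"] == "RETRIEVER":
            docs_by_trace[s["trace_id"]].extend(s["documents"])
    rows = []
    for s in spans:
        docs = docs_by_trace.get(s["trace_id"])
        if s["parent_id"] is None and docs:
            rows.append({"span_id": s["span_id"], "input": s["input"],
                         "output": s["output"], "reference": "\n\n".join(docs)})
    return rows


def retrieved_documents(spans):
    """One row per (retriever span, document position)."""
    rows = []
    for s in spans:
        if s["span_kind"] == "RETRIEVER":
            for position, doc in enumerate(s["documents"]):
                rows.append({"span_id": s["span_id"], "document_position": position,
                             "input": s["input"], "reference": doc})
    return rows
```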

Next, we enable concurrent evaluations for better performance.
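
In Phoenix notebooks this cell is usually import nest_asyncio; nest_asyncio.apply(). That call lets asyncio code run inside Jupyter's already-running event loop, so several evaluation requests can be in flight at once. Below is a stdlib sketch of bounded concurrency. run_concurrently and fake_evaluate are hypothetical, and fake_evaluate stands in for one LLM judge call.

```python
import asyncio


async def run_concurrently(items, evaluate, max_concurrency=4):
    """Await evaluate(item) for every item, at most max_concurrency at a time.

    Results come back in the same order as items (asyncio.gather preserves order).
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(item):
        async with semaphore:
            return await evaluate(item)

    return await asyncio.gather(*(bounded(item) for item in items))


async def fake_evaluate(text):
    await asyncio.sleep(0.01)  # stands in for the network latency of an LLM call
    return len(text)
```

In a plain script, asyncio.run(run_concurrently(["a", "bb"], fake_evaluate)) returns [1, 2]. Inside Jupyter you can await the coroutine directly instead. The semaphore caps in-flight requests, which helps stay under API rate limits.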

Then, we define our evaluators and run the evaluations.
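
Phoenix's evaluators (for example its hallucination, QA-correctness and relevance evaluators) ask an LLM to judge each row. To get something that runs offline, the sketch below swaps in crude token-overlap heuristics. They return the same label, score and explanation shape. These heuristics are an illustrative assumption and are far weaker than an LLM judge. All function names are hypothetical.

```python
import re


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def grounding_eval(output, reference, threshold=0.5):
    """Heuristic stand-in for a hallucination judge: share of answer tokens in the reference."""
    out = _tokens(output)
    overlap = len(out & _tokens(reference)) / len(out) if out else 0.0
    return {
        "label": "factual" if overlap >= threshold else "hallucinated",
        "score": round(overlap, 3),
        "explanation": f"{overlap:.0%} of answer tokens appear in the reference",
    }


def relevance_eval(query, document, threshold=0.3):
    """Heuristic stand-in for a relevance judge: share of query tokens in the document."""
    q = _tokens(query)
    coverage = len(q & _tokens(document)) / len(q) if q else 0.0
    return {
        "label": "relevant" if coverage >= threshold else "unrelated",
        "score": round(coverage, 3),
        "explanation": f"{coverage:.0%} of query tokens appear in the document",
    }


def run_evals(rows, evaluator, *fields):
    """Apply evaluator to the named fields of each row, keeping the span id."""
    return [(row["span_id"], evaluator(*(row[f] for f in fields))) for row in rows]
```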

Finally, we log the evaluations into Phoenix.
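
With the Phoenix client, this step typically wraps each results dataframe in SpanEvaluations or DocumentEvaluations (from phoenix.trace). Those wrapped results are then passed to px.Client().log_evaluations(...). Conceptually, logging just attaches each result to the span it was computed for. Below is a hypothetical in-memory version that also reports results whose span id it cannot find.

```python
def attach_evaluations(spans_by_id, eval_name, results):
    """Store each (span_id, result) under span["evaluations"][eval_name].

    Returns the span ids that did not match any known span.
    """
    unmatched = []
    for span_id, result in results:
        span = spans_by_id.get(span_id)
        if span is None:
            unmatched.append(span_id)
            continue
        span.setdefault("evaluations", {})[eval_name] = result
    return unmatched


spans_by_id = {"root-1": {"span_id": "root-1"}}
results = [("root-1", {"label": "factual", "score": 0.9}), ("missing", {"label": "factual"})]
unmatched = attach_evaluations(spans_by_id, "Hallucination", results)
```

Surfacing unmatched ids is a cheap way to catch evaluations computed against the wrong span identifiers.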

Step 6: Export data to Arize

Step 6.a: Get data into dataframes

We extract the spans and evaluations dataframes from the Phoenix client.
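
In the Phoenix client these typically come from get_spans_dataframe() and get_evaluations(). Arize's span export expects evaluations as one row per span id. The columns are named like eval.<name>.label, eval.<name>.score and eval.<name>.explanation. That naming is our reading of the Arize docs, so verify it for your SDK version. The sketch below reshapes per-span evaluation dictionaries into that wide format, using plain dicts instead of pandas. flatten_evaluations is a hypothetical helper.

```python
def flatten_evaluations(spans):
    """Turn span["evaluations"][name][field] into flat eval.<name>.<field> columns.

    Spans without evaluations are skipped.
    """
    rows = []
    for span in spans:
        evaluations = span.get("evaluations", {})
        if not evaluations:
            continue
        row = {"context.span_id": span["span_id"]}
        for name, result in evaluations.items():
            for field in ("label", "score", "explanation"):
                if field in result:
                    row[f"eval.{name}.{field}"] = result[field]
        rows.append(row)
    return rows


rows = flatten_evaluations([
    {"span_id": "root-1", "evaluations": {"Hallucination": {"label": "factual", "score": 0.9}}},
    {"span_id": "root-2"},
])
```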

Step 6.b: Initialize the Arize client
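
The real client comes from the Arize Python SDK and is built from your space identifier and API key. Its exact constructor arguments vary between SDK versions, so check the SDK reference. The stdlib sketch below covers the part worth getting right in any version: loading credentials from the environment and failing loudly when they are missing. The variable names ARIZE_SPACE_ID and ARIZE_API_KEY are assumptions, and credentials_from_env is hypothetical.

```python
import os
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ArizeCredentials:
    space_id: str
    api_key: str = field(repr=False)  # keep the secret out of printed output


def credentials_from_env(env=None):
    """Read credentials from `env` (defaults to os.environ) or raise if absent."""
    env = os.environ if env is None else env
    required = ("ARIZE_SPACE_ID", "ARIZE_API_KEY")  # hypothetical variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return ArizeCredentials(space_id=env["ARIZE_SPACE_ID"], api_key=env["ARIZE_API_KEY"])
```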

Lastly, we use the Arize client's log_spans method to log our spans data; if we have evaluations, we can also pass the optional evals_dataframe argument.
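
Conceptually, the export joins any evaluation columns onto their spans by span id and ships the records in batches. The sketch below is a hypothetical stand-in, not the Arize SDK. export_spans and the send_batch callable (which would perform the upload) are illustrative names, and records are plain dicts rather than dataframes.

```python
def export_spans(spans, send_batch, evals=None, batch_size=100):
    """Merge optional evaluation columns into span records and send them in batches.

    Both spans and evals are lists of dicts keyed by "context.span_id".
    Returns the number of records sent.
    """
    evals_by_id = {row["context.span_id"]: row for row in (evals or [])}
    batch, sent = [], 0
    for span in spans:
        record = dict(span)  # copy so the caller's data is not mutated
        extra = evals_by_id.get(span["context.span_id"], {})
        record.update({k: v for k, v in extra.items() if k != "context.span_id"})
        batch.append(record)
        if len(batch) == batch_size:
            send_batch(batch)
            sent += len(batch)
            batch = []
    if batch:  # flush the final, partially filled batch
        send_batch(batch)
        sent += len(batch)
    return sent
```

Keeping evals optional mirrors the step above: spans can be exported on their own, and evaluation columns are merged in only when they exist.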
