Datasets

observabilityllmsgenaicookbookprompt-managementhacktoberfestlarge-language-modelsnextraLangfuselangfuse-docs

description: End-to-end example of creating a dataset, adding items, and running experiments with Langfuse datasets. category: Datasets

Langfuse Datasets Cookbook

In this cookbook, we'll iterate on systems prompts with the goal of getting only the capital of a given country. We use Langfuse datasets, to store a list of example inputs and expected outputs.

This is a very simple example, you can run experiments on any LLM application that you either trace with the Langfuse SDKs (Python, JS/TS) or via one of our integrations (e.g. Langchain).

Simple example application

  • Model: gpt-4o
  • Input: country name
  • Output: capital
  • Evaluation: exact match of completion and ground truth
  • Experiment on: system prompt

Setup

[ ]
[ ]

With the environment variables set, we can now initialize the Langfuse client. get_client() initializes the Langfuse client using the credentials provided in the environment variables.

[3]
Langfuse client is authenticated and ready!

Create a dataset

[4]

Items

Load local items into the Langfuse dataset. Alternatively you can add items from production via the Langfuse UI.

[5]
[6]

Example using Langfuse @observe() decorator

Application

This an example production application that we want to evaluate. It is instrumented with the Langfuse Decorator. We do not need to change the application code to evaluate it subsequently.

[11]

Experiment runner

This is a simple experiment runner that runs the application on each item in the dataset and evaluates the output.

[12]
[13]

Run experiments

Now we can easily run experiments with different configurations to explore which yields the best results.

[7]

Finished processing dataset 'capital_cities' for run 'famous_city'.

Finished processing dataset 'capital_cities' for run 'directly_ask'.

Finished processing dataset 'capital_cities' for run 'asking_specifically'.

Finished processing dataset 'capital_cities' for run 'asking_specifically_2nd_try'.

Example using Langchain

[ ]
[ ]
[ ]