Datasets
description: End-to-end example of creating a dataset, adding items, and running experiments with Langfuse datasets. category: Datasets
Langfuse Datasets Cookbook
In this cookbook, we'll iterate on systems prompts with the goal of getting only the capital of a given country. We use Langfuse datasets, to store a list of example inputs and expected outputs.
This is a very simple example, you can run experiments on any LLM application that you either trace with the Langfuse SDKs (Python, JS/TS) or via one of our integrations (e.g. Langchain).
Simple example application
- Model: gpt-4o
- Input: country name
- Output: capital
- Evaluation: exact match of completion and ground truth
- Experiment on: system prompt
Setup
With the environment variables set, we can now initialize the Langfuse client. get_client() initializes the Langfuse client using the credentials provided in the environment variables.
Langfuse client is authenticated and ready!
Create a dataset
Items
Load local items into the Langfuse dataset. Alternatively you can add items from production via the Langfuse UI.
Example using Langfuse @observe() decorator
Application
This an example production application that we want to evaluate. It is instrumented with the Langfuse Decorator. We do not need to change the application code to evaluate it subsequently.
Experiment runner
This is a simple experiment runner that runs the application on each item in the dataset and evaluates the output.
Run experiments
Now we can easily run experiments with different configurations to explore which yields the best results.
Finished processing dataset 'capital_cities' for run 'famous_city'. Finished processing dataset 'capital_cities' for run 'directly_ask'. Finished processing dataset 'capital_cities' for run 'asking_specifically'. Finished processing dataset 'capital_cities' for run 'asking_specifically_2nd_try'.