
# Annotations for LLMs with Streamlit and W&B

With Weights & Biases, you can log inputs and outputs from LLM experiments, evaluate the results, and examine individual prompts and responses at the scale of your application.

W&B Tables stores these critical assets in a single system of record alongside other artifacts, such as input datasets and model checkpoints, with essential metadata and lineage tracked for transparency and reproducibility.

One effective strategy for improving model performance is revising these assets directly in a table. Streamlit's data editor, showcased in this application, provides an elegant and flexible way to do so with W&B Tables. Through the app's UI, annotators can flag outlier model responses, select next steps for refinement, and edit results in place as needed. The result can then be exported and stored as a subsequent artifact in a Weights & Biases LLM development or tuning project.

This notebook walks through one simple approach, with the following steps:

  1. Run automated summary of news articles with Hugging Face pipelines
  2. Log Tables to W&B to compare two model approaches
  3. Download CSV files of Tables to annotate in Streamlit
  4. Annotate tables with Streamlit data editor
  5. Load annotated Tables to W&B for versioning and evaluation

🏁 Let's get started!

First, install dependencies for W&B and Hugging Face.
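In a notebook, the setup typically looks like the following (package names are the standard PyPI ones; `datasets` is included for the CNN/Daily Mail data used below):

```shell
pip install -U wandb transformers datasets

# Authenticate with your W&B API key (from https://wandb.ai/authorize)
wandb login
```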


## 1. Run automated summary of news articles with Hugging Face pipelines

This notebook will use a summarization example to showcase W&B and Streamlit together. Summarization can serve many important uses in ML pipelines, from assisting in data quality checks to preprocessing long-form data into something digestible for a downstream task, e.g., classification. There are many options for generating summaries automatically, but for ease of use we will go with Hugging Face pipelines.


We will use the tried-and-true CNN/Daily Mail dataset to test out summarization outputs from two different pre-trained models from the Hugging Face model repository.
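A minimal sketch of these cells might look like the following. The two checkpoints (`sshleifer/distilbart-cnn-12-6` and `google/pegasus-cnn_dailymail`) are illustrative choices, not necessarily the ones used in the original notebook; any pair of summarization checkpoints from the Hub would work:

```python
from datasets import load_dataset
from transformers import pipeline

# Grab a small slice of CNN/Daily Mail test articles
dataset = load_dataset("cnn_dailymail", "3.0.0", split="test[:10]")
articles = [row["article"] for row in dataset]

# Two pre-trained summarizers to compare (checkpoint choices are illustrative)
model_a = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
model_b = pipeline("summarization", model="google/pegasus-cnn_dailymail")

# Each call returns a list of dicts with a "summary_text" key
summaries_a = [out["summary_text"] for out in model_a(articles, truncation=True)]
summaries_b = [out["summary_text"] for out in model_b(articles, truncation=True)]
```

Note that both models will be downloaded from the Hub on first use, so the first run can take a few minutes.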


Here, we define a simple word-count function and a lexical diversity function, which can be useful data points for examining text inputs and gauging how completely and fluently summarization outputs capture their "meaning."

There are many methods and dimensions to consider when evaluating summaries, from a quick vibes check to reference-based metrics (if you are lucky enough to have gold-standard reference summaries). This walkthrough shows a simple manual approach, in which automated summaries are evaluated for further refinement.
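The two helpers can be as simple as the following sketch (the names `word_count` and `lexical_diversity` are our placeholders):

```python
def word_count(text: str) -> int:
    """Number of whitespace-separated tokens in the text."""
    return len(text.split())


def lexical_diversity(text: str) -> float:
    """Ratio of unique tokens to total tokens (type-token ratio)."""
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

A summary that repeats itself will score low on lexical diversity relative to its length, which makes these quick sanity metrics for comparing model outputs.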


## 2. Log Tables to W&B to compare two model approaches

W&B Tables help you visualize and query tabular data, whether it be numeric, categorical, text, images, or multimodal datasets. Tables help users compare how different models perform on the same test set, identify patterns in data (especially helpful with text analysis), and query inputs and outputs effectively to find outliers or useful patterns.

Here we log our automatically-generated summaries to W&B as an initial step in the overall LLM development and evaluation process. If you do not have a W&B account yet, follow this simple quickstart to get set up 🌟
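Logging one table per model might look like this sketch; the project name, table key, and placeholder data below are ours, and in the notebook the articles and summaries come from the summarization step:

```python
import wandb

# Placeholder data; in the notebook these come from the summarization step
articles = ["Article text ...", "Another article ..."]
summaries = ["Summary one.", "Summary two."]

run = wandb.init(project="llm-annotation-demo", job_type="summarize")

# Build a Table row by row and log it; repeat for each model you compare
table = wandb.Table(columns=["article", "summary"])
for article, summary in zip(articles, summaries):
    table.add_data(article, summary)

run.log({"summaries_distilbart": table})
run.finish()
```

Logging both models' tables to the same run (or to sibling runs in the same project) lets you view them side by side in the W&B workspace.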


## 3. Download CSV files of Tables to annotate in Streamlit

W&B Tables can be exported easily, programmatically or from the UI. To instrument this with Python, we will convert a table to a W&B artifact (learn more here) and then to a dataframe. From there, it's a simple CSV export.

These CSV files can be loaded into a simple Streamlit app for labeling.


## 4. Annotate tables with Streamlit data editor

This W&B repo contains a simple app that takes a user-loaded .csv file, creates a dataframe, displays that dataframe in a Streamlit app, and enables manual editing and exporting of a revised .csv file.

Once you have built your app and stored it with any needed dependencies, you can run it wherever Streamlit is installed with `streamlit run app.py`; Streamlit will print a local URL for the app (e.g., http://localhost:8501/).
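A minimal version of such an app might look like this sketch (the file name `app.py`, the `flagged` column, and the widget labels are our assumptions; `st.data_editor` requires Streamlit ≥ 1.23):

```python
import pandas as pd
import streamlit as st

st.title("Summary annotation")

uploaded = st.file_uploader("Upload a table CSV", type="csv")
if uploaded is not None:
    df = pd.read_csv(uploaded)

    # Give annotators a checkbox column for flagging outlier responses
    if "flagged" not in df.columns:
        df["flagged"] = False

    # Editable grid; annotators can revise cells and add/remove rows
    edited = st.data_editor(df, num_rows="dynamic")

    # Export the revised table for re-upload to W&B
    st.download_button(
        "Download annotated CSV",
        edited.to_csv(index=False).encode("utf-8"),
        file_name="annotated.csv",
        mime="text/csv",
    )
```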

## 5. Load annotated Tables to W&B for versioning and evaluation

Once you have revised any or all entries in your Streamlit tables and exported the new .csv files, you can load the annotated versions to the same W&B project, capturing that step and all its metadata in a central system of record for your LLM development project.
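The round trip back to W&B might look like this sketch; the project name, file name, and artifact name are placeholders for whatever your annotation app exported:

```python
import pandas as pd
import wandb

run = wandb.init(project="llm-annotation-demo", job_type="annotate")

# Placeholder file name; use whatever your Streamlit app exported
annotated = pd.read_csv("annotated.csv")

# Log the annotated table so it sits alongside the original version
run.log({"annotated_summaries": wandb.Table(dataframe=annotated)})

# Also version the raw file as an artifact for lineage tracking
artifact = wandb.Artifact("annotated-summaries", type="dataset")
artifact.add_file("annotated.csv")
run.log_artifact(artifact)

run.finish()
```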
