Arize AI
Evaluate Toxicity Classifications

Toxicity Classification Evals

Arize provides tooling to evaluate LLM applications, including tools to determine whether a model's generation (or a user's response) is toxic. The detection looks for racist, biased, derogatory, profane, or angry content.

The purpose of this notebook is:

  • to evaluate the performance of LLM-assisted toxicity detection
  • to provide an experimental framework for users to iterate and improve on the default classification template.

Note: This notebook was last updated on May 30, 2025.

Install Dependencies and Import Libraries

ℹ️ To enable async request submission in notebook environments like Jupyter or Google Colab, optionally use nest_asyncio. nest_asyncio globally patches asyncio to enable event loops to be re-entrant. This is not required for non-notebook environments.

Without nest_asyncio, eval submission can be much slower, depending on your organization's rate limits. Speed increases of about 5x are typical.
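
A minimal sketch of the optional patch described above. It assumes the nest_asyncio package may or may not be installed, so the import is guarded. The helper name enable_nested_event_loops is hypothetical and exists only for this example.

```python
def enable_nested_event_loops() -> bool:
    """Apply nest_asyncio if it is installed and report whether the patch ran."""
    try:
        # Third-party package; typically installed with `pip install nest_asyncio`.
        import nest_asyncio
    except ImportError:
        # Without the package, evals can still run, only without re-entrant loops.
        return False
    # Globally patches asyncio so an already-running event loop can be re-entered.
    nest_asyncio.apply()
    return True


patched = enable_nested_event_loops()
print("nest_asyncio applied:", patched)
```

In a plain Python script there is no running event loop to re-enter, so this step can be skipped.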

Download Benchmark Dataset

We'll evaluate the evaluation system, which consists of an LLM, its settings, and an evaluation prompt template, against a benchmark dataset of toxic and non-toxic text with ground-truth labels. Currently supported datasets include:

  • "wiki_toxic"

Display Toxicity Classification Template

View the default template used to classify toxicity. You can tweak this template and evaluate its performance relative to the default.

The template variables are:

  • input: the text to be classified
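
To illustrate how the input variable is used, the sketch below fills the placeholder of a stand-in template with Python's str.format. EXAMPLE_TEMPLATE and render_prompt are hypothetical and written only for this example. The stand-in is not the default Phoenix template, whose actual wording should be printed from the library.

```python
# Hypothetical, shortened stand-in for a classification template.
# The only template variable is {input}, the text to be classified.
EXAMPLE_TEMPLATE = (
    "You are examining written text content. Here is the text:\n"
    "[BEGIN DATA]\n"
    "{input}\n"
    "[END DATA]\n"
    'Respond with a single word, either "toxic" or "non-toxic".'
)


def render_prompt(template: str, text: str) -> str:
    """Substitute the text to be classified into the template's {input} slot."""
    return template.format(input=text)


rendered = render_prompt(EXAMPLE_TEMPLATE, "Have a nice day")
print(rendered)
```

When tweaking a template, keeping the variable name unchanged means the same rendering step works for both the default and the modified version.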

Configure the LLM

Configure your OpenAI API key.
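
One possible way to do this, sketched under assumptions: read the key from the OPENAI_API_KEY environment variable (the name the OpenAI Python client looks up by default) and prompt for it only when it is missing. The helper ensure_openai_api_key is hypothetical and not part of any library.

```python
import os
from getpass import getpass
from typing import Callable, MutableMapping


def ensure_openai_api_key(
    env: MutableMapping[str, str] = os.environ,
    prompt: Callable[[str], str] = getpass,
) -> str:
    """Return the API key from `env`, prompting for it only when it is absent."""
    key = env.get("OPENAI_API_KEY")
    if not key:
        # getpass hides the typed key so it does not end up in notebook output.
        key = prompt("Enter your OpenAI API key: ")
        env["OPENAI_API_KEY"] = key
    return key
```

Calling ensure_openai_api_key() with no arguments in a notebook cell uses the real process environment; the env and prompt parameters exist so the behavior can be exercised without a real key.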

Benchmark Dataset Sample

Sample size determines run time. We recommend iterating on a small sample first (100 samples) and then increasing to a larger test set.
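
A minimal sketch of drawing a reproducible sample, using only the standard library. The rows below are synthetic placeholders rather than benchmark data, and the names N_EVAL_SAMPLE_SIZE and sample_rows are assumptions made for this example.

```python
import random
from typing import Dict, List

N_EVAL_SAMPLE_SIZE = 100  # start small; raise it once the template looks good


def sample_rows(rows: List[Dict], n: int, seed: int = 42) -> List[Dict]:
    """Draw up to `n` rows without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator, leaves the global RNG untouched
    return rng.sample(rows, min(n, len(rows)))


# Synthetic placeholder rows standing in for a labeled benchmark dataset.
rows = [{"input": f"placeholder text {i}", "toxic": i % 5 == 0} for i in range(1000)]
sample = sample_rows(rows, N_EVAL_SAMPLE_SIZE)
print(len(sample))
```

Fixing the seed keeps the sample identical across runs, so a change in the metrics reflects a change in the template or model rather than in the rows drawn.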

Instantiate the LLM and set parameters.

LLM Evals: Toxicity Classifications with GPT-4

Instantiate the LLM and set parameters. Run toxicity classifications against a subset of the data.
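
The classification step can be pictured as a loop: render the template for each text, send it to a model, and map the raw reply onto an allowed label. The sketch below is a simplified illustration of that idea, not the library's implementation. classify_texts, normalize_label, and the keyword-based fake_complete stand-in (used instead of a real LLM call) are all hypothetical.

```python
from typing import Callable, Iterable, List

ALLOWED_LABELS = ("toxic", "non-toxic")
UNPARSABLE = "NOT_PARSABLE"  # assumed marker for replies outside the allowed labels


def normalize_label(raw: str) -> str:
    """Map a raw model reply onto an allowed label, or flag it as unparsable."""
    cleaned = raw.strip().strip(".\"'").lower()
    return cleaned if cleaned in ALLOWED_LABELS else UNPARSABLE


def classify_texts(
    texts: Iterable[str], template: str, complete: Callable[[str], str]
) -> List[str]:
    """Render the template for each text and normalize the model's reply."""
    return [normalize_label(complete(template.format(input=t))) for t in texts]


def fake_complete(prompt: str) -> str:
    """Stand-in for an LLM call so the sketch runs offline; not a real classifier."""
    return "Toxic." if "idiot" in prompt.lower() else "non-toxic"


labels = classify_texts(
    ["You are an idiot", "Thanks for the help"],
    "Text: {input}\nLabel:",
    fake_complete,
)
print(labels)
```

Passing the model in as a callable keeps the loop identical when a different model is swapped in, which is what the later sections of this notebook do.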

Evaluate the predictions against human-labeled ground-truth toxicity labels.
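
As a sketch of what this comparison involves, the block below computes a confusion matrix with precision, recall, F1, and accuracy for the toxic class using only the standard library. The function name binary_report and the example labels are made up for illustration; a notebook would more commonly rely on a metrics library.

```python
from typing import Dict, Sequence


def binary_report(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, float]:
    """Compute confusion-matrix counts and summary metrics for the positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = [(bool(t), bool(p)) for t, p in zip(y_true, y_pred)]
    tp = sum(t and p for t, p in pairs)          # toxic, predicted toxic
    fp = sum((not t) and p for t, p in pairs)    # non-toxic, predicted toxic
    fn = sum(t and (not p) for t, p in pairs)    # toxic, predicted non-toxic
    tn = sum((not t) and (not p) for t, p in pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Made-up labels for illustration: True means "toxic".
y_true = [True, True, False, False, True]
y_pred = [True, False, False, True, True]
report = binary_report(y_true, y_pred)
print(report)
```

Precision and recall are reported separately because the two kinds of error (flagging benign text versus missing toxic text) usually carry different costs.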

LLM Evals: Toxicity Classifications with GPT-3.5

Instantiate the LLM and set parameters. Run toxicity classifications against a subset of the data.

LLM Evals: Toxicity Classifications with GPT-4 Turbo

Instantiate the LLM and set parameters. Run toxicity classifications against a subset of the data.
