Notebooks
A
Arize AI
Nvidia Arize Data Flywheel

Nvidia Arize Data Flywheel

agentsarize-tutorialsLLMPython

arize logo
Docs | GitHub | Slack Community

Arize + NVIDIA Data Flywheel Integration Tutorial

This notebook demonstrates how Arize platform and NVIDIA Nemo microservices drive improvement in agents

Notebook Steps:

  • Arize curates golden datasets from production traces + online evaluations + human labels
  • NVIDIA nemo customizer fine-tunes model based on Arize curated golden datasets
  • Run experiments in Arize to generate comparison metrics on the Baseline model vs Fine-tuned model
  • Compare and deep dive into experiment results in Arize experiments

Use Case:

  • Improve refusal quality and compliance posture of LLM:
  • LLM should respond in consistent manner, politely declining service: "I'm sorry but I can't assist with that."
  • LLM should provide a reason for refusal in response: "It is against my safety policies. Reason (Violence, self harm, illegal activity, etc.)

Prerequisites:

  • You will need access to NVIDIA NeMo microservices in order to run this notebook.
  • This notebook was tested on NeMo microservices on NVIDIA Brev, which provides streamlined access to NVIDIA GPU instances on popular cloud platforms, automatic environment setup, and flexible deployment options, enabling developers to start experimenting instantly.
  • Arize AX account (Sign up for free)

Part 1: Export Data from Arize and Create Training Datasets

1.1 Install Dependencies and Export from Arize

[ ]
[ ]

1.2 Data Prep: Convert Arize Data to JSONL Training Format

[ ]

Part 2: Set Up NVIDIA NeMo Customizer

Nemo microservices (customizer) need to be deployed before proceeding. Here is a link to a demo cluster set up document. Note the version used in this tutorial is 25.09

2.1 Initialize NeMo Client

[ ]

2.2 Check Customization Configuration

[ ]
[ ]

Part 3: Upload Dataset to NeMo Data Store

[ ]
[ ]

Part 4: Deploy Base Model for Baseline Testing

[ ]
[ ]

Part 5: Test Baseline Model

[ ]

Part 6: Create and Run Fine-Tuning Job

[ ]
[ ]

Part 7: Test Fine-Tuned Model

[ ]
[ ]
[ ]

Part 8: Run Baseline vs Fine-Tuned Experiments and Upload to Arize

In this section, we'll:

  1. Create an Arize dataset from our test dataset
  2. Run two experiments in Arize - one for baseline and one for fine-tuned model
  3. In Arize, run LLM as a Judge Evaluator to validate performance of refusals and compare results in Arize UI
[ ]
[ ]
[ ]
[ ]
[ ]

Part 9: Now view your experiments in Arize UI!