
Cookbook: Mistral AI integration (Python)

This cookbook provides step-by-step examples of integrating phospho with Mistral AI in Python.


Overview

This notebook shows how to log your Mistral chatbot conversations in phospho. After logging, we use phospho's analytics to compare two Mistral models: mistral-7b and mistral-large-latest.

What is phospho?

phospho is an open-source analytics platform for LLM applications, enabling automatic clustering to help discover use cases and topics. It provides no-code analytics to help you gain insights into how your app is being used, simplifying the process of understanding user behavior and trends.

Setup

First, you need the phospho Python module. It is compatible with Python >= 3.9.


Log your messages

Simple completion

Initialize the phospho module with phospho.init(). By default, phospho will look for PHOSPHO_API_KEY and PHOSPHO_PROJECT_ID environment variables. You can also pass the api_key and project_id as parameters.
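As a minimal sketch of this lookup order, the hypothetical helper below (it is not part of phospho) takes each credential from an explicit argument first and falls back to the environment variables named above. The commented lines at the end show where the result would be handed to phospho.init().

```python
import os
from typing import Mapping, Optional


def resolve_phospho_credentials(
    api_key: Optional[str] = None,
    project_id: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> dict:
    """Hypothetical helper: explicit arguments win, environment variables are the fallback."""
    resolved = {
        "api_key": api_key or env.get("PHOSPHO_API_KEY"),
        "project_id": project_id or env.get("PHOSPHO_PROJECT_ID"),
    }
    # Fail early with a readable message instead of at the first log call.
    missing = [name for name, value in resolved.items() if not value]
    if missing:
        raise ValueError(f"Missing phospho credentials: {', '.join(missing)}")
    return resolved


# With the phospho package installed, the result can be passed to init:
# import phospho
# phospho.init(**resolve_phospho_credentials())
```

Checking the credentials before initializing makes a missing key show up right away rather than when the first message is logged.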


Use phospho.log to log a query and an optional answer. Here is an example of a one-query conversation with a Mistral assistant.
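The sketch below shows one way such a call could be structured. The function one_mistral_call takes the completion and logging functions as arguments so the flow can be read (and tried) without network access. The wiring in run_with_real_clients assumes the mistralai v1 client, a MISTRAL_API_KEY environment variable, and the model name mistral-small-latest; all three are assumptions, not details taken from this notebook.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, str]


def one_mistral_call(
    query: str,
    complete: Callable[[List[Message]], str],
    log: Callable[..., Any],
    user_id: str = "USER_ID",
    version_id: str = "one_mistral_call",
) -> str:
    """Send one user query, log the exchange, and return the answer."""
    messages = [{"role": "user", "content": query}]
    answer = complete(messages)
    # Keyword names mirror the fields of the logged record.
    log(input=query, output=answer, user_id=user_id, version_id=version_id)
    return answer


def run_with_real_clients() -> str:
    """Wiring sketch: needs the mistralai and phospho packages plus API keys."""
    import os

    import phospho
    from mistralai import Mistral

    phospho.init()
    client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])  # assumed variable name

    def complete(messages: List[Message]) -> str:
        # Model name is an assumption; use whichever model you have access to.
        response = client.chat.complete(model="mistral-small-latest", messages=messages)
        return response.choices[0].message.content

    return one_mistral_call("Thank you for your last message!", complete, phospho.log)
```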

You're welcome! How can I assist you today? If you have any questions or need help with something, feel free to ask.
{'client_created_at': 1728548119,
 'project_id': 'db99fc60ceac424fab371252f2f75485',
 'session_id': None,
 'task_id': 'cb3f05c74c9c4ce3ad3971d0e651c8b3',
 'input': 'Thank you for your last message!',
 'raw_input': 'Thank you for your last message!',
 'raw_input_type_name': 'str',
 'output': "You're welcome! How can I assist you today? If you have any questions or need help with something, feel free to ask.",
 'raw_output': "You're welcome! How can I assist you today? If you have any questions or need help with something, feel free to ask.",
 'raw_output_type_name': 'str',
 'user_id': 'USER_ID',
 'version_id': 'one_mistral_call'}

Your message is now accessible on the platform, where you can store all your transcripts. While logging, sentiment evaluation and language detection are performed automatically on the message, along with any events you specify.

Message in platform

Streaming completion

phospho supports streamed outputs. Pass stream=True to phospho.log to handle streaming responses. When iterating over the response, phospho will automatically log each chunk until the iteration is completed. Here is an example with a simple Mistral chatbot.
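To make the idea of logging during iteration concrete, here is a small stand-alone sketch. It is not phospho's implementation. It is a plain generator wrapper, with hypothetical names, that passes chunks through unchanged and reports the accumulated text after each one.

```python
from typing import Callable, Iterable, Iterator, List


def log_while_streaming(
    chunks: Iterable[str],
    on_chunk: Callable[[str, bool], None],
) -> Iterator[str]:
    """Yield chunks unchanged, reporting the text so far after each one.

    on_chunk(text_so_far, is_last) stands in for the logging backend.
    """
    parts: List[str] = []
    for chunk in chunks:
        parts.append(chunk)
        on_chunk("".join(parts), False)
        yield chunk
    # Reached only once the caller has consumed the whole stream.
    on_chunk("".join(parts), True)


updates = []
stream = log_while_streaming(["Hel", "lo", "!"], lambda text, done: updates.append((text, done)))
print("".join(stream))  # prints "Hello!"
```

Because the wrapper is lazy, nothing is reported until the caller iterates, and the final report only happens if the stream is read to the end.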

Ask anything (Type /exit to quit)
>Hi, what are you ?

Assistant: Hello! I am a Large Language Model trained by Mistral AI.
>Nice ! What is Mistral AI ?

Assistant: Hello! I am a text-based AI model designed to assist and provide information.

Mistral AI is a cutting-edge company based in Paris, France, developing large language models. I am very grateful they created me!
>So French is your native language ?

Assistant: Hello! I am a text-based AI model designed to assist and engage in conversation.

Mistral AI is a cutting-edge company based in Paris, France, developing large language models. I am very grateful they created me!

And yes, French is one of the languages I can understand and generate text in. How can I assist you today?
>/exit

The full conversation is now available in the platform.

Session in platform

Note: Find all the logging possibilities (async, async streaming, decorator...) in our documentation.

Evaluate Mistral model improvements with phospho analytics

Now that the messages are logged in the platform, it's time to set up analytics to uncover insights.

Let's take an example. Suppose we are building a mathematical AI tutor and are unsure whether to use mistral-7b or mistral-large-latest. We can use phospho to compare these models on our use case. We will set up an AB test to evaluate both models on the pedagogical quality of their responses to a set of mathematical problems.

The problems

We extract 50 problems from the MetaMath problem dataset, introduced in the paper below.

Title: MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
Authors: Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu
Journal: arXiv preprint arXiv:2309.12284
Year: 2023
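A possible way to draw the problems is sketched below. It assumes the dataset has already been loaded as a sequence of dictionaries and that the problem statement sits under a "query" key; both the helper name and the field name are assumptions, so adjust them to the schema you actually load.

```python
import random
from typing import Dict, List, Sequence


def sample_problems(
    rows: Sequence[Dict[str, str]],
    n: int = 50,
    seed: int = 0,
    field: str = "query",  # assumed field name for the problem statement
) -> List[str]:
    """Pick up to n rows without replacement, reproducibly, and return their statements."""
    rng = random.Random(seed)
    picked = rng.sample(list(rows), k=min(n, len(rows)))
    return [row[field] for row in picked]
```

Fixing the seed keeps the problem set identical between runs, so both models are compared on the same questions.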


Definition of the agents

In the system prompt, we explicitly ask the chatbot to be pedagogical.


We create two teachers; note that the only difference between them is the model.
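One way to express "same prompt, different model" is sketched below. The system prompt wording and the Teacher class are hypothetical stand-ins, since the notebook's own prompt is not reproduced here. The model strings are the names used in this text, and the exact identifiers expected by the Mistral API may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical wording; the notebook's actual system prompt is not shown here.
SYSTEM_PROMPT = (
    "You are a patient math teacher. Explain each step of your reasoning "
    "clearly so that a student can follow and learn from it."
)


@dataclass(frozen=True)
class Teacher:
    """A chat configuration: one model plus the shared system prompt."""

    model: str
    system_prompt: str = SYSTEM_PROMPT

    def build_messages(self, problem: str) -> List[Dict[str, str]]:
        """Return the chat messages to send for one problem."""
        return [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": problem},
        ]


# The two teachers share everything except the model name.
teacher_small = Teacher(model="mistral-7b")
teacher_large = Teacher(model="mistral-large-latest")
```

Keeping the prompt in one shared constant ensures that any score difference can be attributed to the model rather than to the instructions.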


Setup analytics in the platform

Before solving the problems, let's go to the platform and make sure everything is set up.

Create an event named "Pedagogical quality of arguments" that rates the teachers' responses from 1 to 5. It uses an LLM as a judge, with few-shot examples, to rate each response.

How to Set Up an Event:

  1. Log in to the platform and navigate to the Analytics tab.
  2. Select Add Scorer. A new page will appear.
  3. Fill in the required fields with your desired name and description.

Analytics are run on every message you log to phospho.

Copy and paste this into the form:

Name

Pedagogical quality of arguments

Description

Evaluate the pedagogic quality of the mathematical reasoning, focusing on how well the explanation supports understanding. Consider the clarity of concepts, logical progression of ideas, and use of examples or analogies. Assess how effectively the reasoning addresses potential misunderstandings and promotes deeper comprehension.

Score from 1 to 5:

1: The explanation is unclear, lacks structure, and does not support learning.
2: The reasoning is somewhat understandable but leaves key concepts unexplained or confusing.
3: Reasonably clear, with some gaps in explanation or missing helpful examples.
4: Clear and well-organized, with good use of examples and logical flow, though minor improvements are possible.
5: Exceptionally clear and insightful, with well-structured explanations, helpful examples, and strong support for deep understanding.

Creation of a scorer

Enable analytics

Analytics in phospho are a paid feature (see details here).

When adding a payment method, you get free phospho credits that will be more than enough for this cookbook.

Note: You can also continue this notebook without adding a payment method: analytics won't run, but you'll still be able to log and view your messages.

Let's log the dataset to phospho

Let's simulate how the AI teacher models answer the problems, and log user messages and LLM outputs to phospho.

The analytics set up above will run automatically on everything you log. This means that phospho will score all of these outputs according to the criteria you defined, using an LLM as a judge.
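The logging loop could look roughly like the sketch below. The answer and log callables are injected so the loop can be exercised without network access; in practice log would be phospho.log and answer would call the Mistral API. Tagging each record with version_id set to the model name is an assumption about how the two variants are kept apart.

```python
from typing import Any, Callable, Iterable, List, Tuple


def log_all_answers(
    problems: Iterable[str],
    models: Iterable[str],
    answer: Callable[[str, str], str],
    log: Callable[..., Any],
) -> List[Tuple[str, str, str]]:
    """Ask every model every problem and log each exchange under the model's name."""
    model_list = list(models)  # materialize once so it can be reused per problem
    results = []
    for problem in problems:
        for model in model_list:
            output = answer(model, problem)
            # version_id distinguishes the two variants in later comparisons.
            log(input=problem, output=output, version_id=model)
            results.append((model, problem, output))
    return results
```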

100%|██████████| 49/49 [13:02<00:00, 15.97s/it]

Results in AB Tests

Let's go back to the platform!

Using the sidebar, go to the AB test tab. This tab compares the metrics of different version IDs.

If you've set up a payment method, you should see a difference in the pedagogical quality between the two versions. For example:

  • mistral-7b has an average score of 3.3/5.
  • mistral-large-latest has an average score of 4/5.

This tells us that, for this use case, mistral-large-latest produces the more pedagogical answers.

Comparing scores this way helps you better understand the trade-off between cost and quality.
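The comparison shown in the tab amounts to averaging the scores per version ID. The sketch below illustrates that arithmetic on made-up scores; the numbers are purely illustrative and are not results from this notebook.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_score_by_version(scored: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Group (version_id, score) pairs and return the mean score per version."""
    groups = defaultdict(list)
    for version_id, score in scored:
        groups[version_id].append(score)
    return {version_id: mean(scores) for version_id, scores in groups.items()}


# Made-up scores, for illustration only.
example = [
    ("mistral-7b", 3),
    ("mistral-7b", 4),
    ("mistral-large-latest", 4),
    ("mistral-large-latest", 5),
]
print(average_score_by_version(example))
# {'mistral-7b': 3.5, 'mistral-large-latest': 4.5}
```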

AB test

Other interesting insights...

Using the phospho clustering tool, we can discover the main categories of problems in our dataset in one click:

  • Equation-based problem solving
  • Fractions, series, modulo and conversion
  • Linear Algebra
  • Combinatorial calculations
  • General discussions (remember, we chatted with the model at the beginning of the notebook)

As you can see, clustering is also useful for detecting outliers.

Get started with phospho to learn more about how it can help you improve the analytics of your LLM apps.

Clustering