Mistral AI Intent Classification

Intent Detection: Identify user intent efficiently with a custom classifier

In this cookbook, we will explore intent detection and classification using our Classifier Factory.

To keep things straightforward, we will concentrate on a particular example that involves single-target classification.

Dataset

We will use a subset of the mteb/amazon_massive_intent dataset. Each example in this subset pairs a user request with its intent.

Subset

Let's download and prepare the subset. We will install the datasets library and load the dataset.
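The original cell is not reproduced here, so the following is only a sketch of how such a subset might be built. It assumes the dataset exposes an "en" configuration with "text" and "label_text" columns, and the helper names load_rows and keep_intents are hypothetical. The two intents kept in the demonstration are among the labels used in this cookbook.

```python
from typing import Iterable


def load_rows(split: str = "train") -> list[dict]:
    """Load one split of the dataset as a list of dicts.

    Assumption: the "en" configuration and the "text"/"label_text"
    column names; check the dataset card before relying on them.
    """
    from datasets import load_dataset  # requires: pip install datasets

    dataset = load_dataset("mteb/amazon_massive_intent", "en", split=split)
    return [dict(row) for row in dataset]


def keep_intents(rows: Iterable[dict], intents: set[str]) -> list[dict]:
    """Keep only the rows whose intent is in the chosen subset."""
    return [row for row in rows if row["label_text"] in intents]


# Offline demonstration on hand-written rows shaped like the dataset.
sample_rows = [
    {"text": "new music tracks", "label_text": "play_music"},
    {"text": "is there any event today in my calendar", "label_text": "calendar_query"},
    # Illustrative row with an intent outside the subset (not from this cookbook).
    {"text": "turn off the lights", "label_text": "iot_hue_lightoff"},
]
subset = keep_intents(sample_rows, {"play_music", "calendar_query"})
print(len(subset))
```

Filtering on the label text rather than the numeric label keeps the subset readable and carries straight over to the training format.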

[ ]

Format Data

Now that we have loaded our dataset, we will convert it to the format required for upload and training.

The data will be converted to a JSONL format as follows:

{"text": "place a birthday party with ale ross and amy in my calendar", "labels": {"intent": "calendar_set"}}
{"text": "new music tracks", "labels": {"intent": "play_music"}}
{"text": "get me the details of upcoming oscar two thousand and seventeen", "labels": {"intent": "calendar_query"}}
{"text": "is there any event today in my calendar", "labels": {"intent": "calendar_query"}}
{"text": "send email to mommy that i'll be going the party", "labels": {"intent": "email_sendemail"}}
...

For single-target classification, each example carries exactly one label, for instance:

"labels": {
  "intent": "email_sendemail"
}
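A minimal sketch of the conversion, assuming each input row is a dict with "text" and "label_text" keys (an assumption about the loaded data, not something this page confirms). The helper names are hypothetical.

```python
import json


def to_jsonl_record(text: str, intent: str) -> str:
    """Serialize one example in the single-target format shown above."""
    return json.dumps({"text": text, "labels": {"intent": intent}}, ensure_ascii=False)


def write_jsonl(rows, path: str) -> int:
    """Write rows to a JSONL file, one record per line; return the row count."""
    count = 0
    with open(path, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(to_jsonl_record(row["text"], row["label_text"]) + "\n")
            count += 1
    return count


line = to_jsonl_record("new music tracks", "play_music")
print(line)
```

Calling write_jsonl once per split (train, validation, test) would produce the three files this step needs.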

[ ]

100%|██████████| 5313/5313 [00:00<00:00, 66090.92it/s]
100%|██████████| 664/664 [00:00<00:00, 71039.12it/s]
100%|██████████| 665/665 [00:00<00:00, 74006.00it/s]

The data was converted and saved properly. We can now train our model.

Training

There are two ways to train the model: upload the data and train via La Plateforme, or use the API.

First, we need to install mistralai.

[ ]

Next, we set up our client. You can create an API key here.
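One way to set up the client without hard-coding the key is sketched below. It assumes the v1 Python SDK, where the client class is imported as Mistral, and an environment variable named MISTRAL_API_KEY. Both helper names are hypothetical.

```python
import os


def read_api_key(env=os.environ, name: str = "MISTRAL_API_KEY") -> str:
    """Return the API key from the environment, failing with a clear message."""
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first.")
    return key


def make_client():
    """Build the SDK client (requires: pip install mistralai)."""
    from mistralai import Mistral  # assumption: v1 SDK import path

    return Mistral(api_key=read_api_key())
```

Reading the key from the environment keeps it out of the notebook when the notebook is shared.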

[ ]

We will upload two files: the training set and an optional validation set, which is used to compute the validation loss.
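The upload step could look like the sketch below. It assumes the SDK's client.files.upload call accepts a file dict with "file_name" and "content" keys, as we understand the v1 SDK to do. The wrapper name upload_jsonl and the file names in the usage comment are hypothetical.

```python
from pathlib import Path


def upload_jsonl(client, path: str):
    """Upload one JSONL file and return the SDK's response object."""
    file_path = Path(path)
    # Keep the handle open for the duration of the upload call.
    with file_path.open("rb") as handle:
        return client.files.upload(
            file={"file_name": file_path.name, "content": handle}
        )


# Usage, given a configured client (not executed here):
# training_data = upload_jsonl(client, "train.jsonl")
# validation_data = upload_jsonl(client, "validation.jsonl")
```

The ids of the returned objects are what the job creation step refers to as training and validation files.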

[ ]

With the data uploaded, we can create a job.

We let users track a considerable number of metrics through our Weights and Biases integration, which we strongly recommend. To use it, provide the project name and your key.
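As a sketch, the job request could be assembled as follows. The model name, step count, learning rate, auto_start value and project name match the job output in this cookbook. The argument names (job_type, training_files with file_id and weight, integrations with project and api_key) reflect our understanding of client.fine_tuning.jobs.create and should be checked against the SDK reference. build_job_request is a hypothetical helper.

```python
def build_job_request(training_file_id, validation_file_id=None,
                      wandb_project=None, wandb_api_key=None):
    """Assemble keyword arguments for a classifier fine-tuning job."""
    request = {
        "model": "ministral-3b-latest",
        "job_type": "classifier",
        "training_files": [{"file_id": training_file_id, "weight": 1}],
        "hyperparameters": {"training_steps": 100, "learning_rate": 4e-05},
        "auto_start": False,  # lets us inspect the job before it runs
    }
    # The validation set is optional, so only add it when provided.
    if validation_file_id is not None:
        request["validation_files"] = [validation_file_id]
    # Enable the Weights and Biases integration only with both values.
    if wandb_project and wandb_api_key:
        request["integrations"] = [
            {"project": wandb_project, "api_key": wandb_api_key}
        ]
    return request


# Usage, given a configured client (not executed here):
# created_job = client.fine_tuning.jobs.create(**build_job_request(...))
```

Keeping auto_start off is what makes the later review of epochs and cost possible before any training is billed.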

[ ]

{
    "id": "db22ea7e-1895-4309-92d9-9ac881b1b117",
    "auto_start": false,
    "model": "ministral-3b-latest",
    "status": "QUEUED",
    "created_at": 1744814706,
    "modified_at": 1744814706,
    "training_files": [
        "c36d490e-d2f8-4f98-9679-1c55625e09d5"
    ],
    "hyperparameters": {
        "training_steps": 100,
        "learning_rate": 4e-05,
        "weight_decay": 0.1,
        "warmup_fraction": 0.05,
        "epochs": null,
        "seq_len": 16384
    },
    "validation_files": [
        "4b62db5b-47b8-42fb-a782-89457766cff7"
    ],
    "fine_tuned_model": null,
    "suffix": null,
    "integrations": [
        {
            "project": "intent-classifier",
            "name": null,
            "run_name": null,
            "url": null
        }
    ],
    "trained_tokens": null,
    "metadata": {
        "expected_duration_seconds": null,
        "cost": 0.0,
        "cost_currency": null,
        "train_tokens_per_step": null,
        "train_tokens": null,
        "data_tokens": null,
        "estimated_start_time": null
    }
}

Once the job is created, we can review details such as the number of epochs and other relevant information. This allows us to make informed decisions before initiating the job.

We'll retrieve the job and wait for it to complete the validation process before starting. This validation step ensures the job is ready to begin.
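Waiting for validation amounts to polling the job until its status is no longer QUEUED or VALIDATING. The sketch below is one way to write that loop. The status names come from the job output in this cookbook, while the helper name, polling interval and timeout are our own assumptions.

```python
import time


def wait_until_settled(fetch_status, pending=("QUEUED", "VALIDATING"),
                       poll_seconds=1.0, timeout_seconds=600.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_status() until it returns a status outside `pending`."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status()
        if status not in pending:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still {status} after {timeout_seconds}s")
        sleep(poll_seconds)


# Usage, given a configured client and a job id (not executed here):
# status = wait_until_settled(
#     lambda: client.fine_tuning.jobs.get(job_id=job_id).status
# )
```

Passing the status lookup in as a function keeps the loop independent of the SDK, and the timeout guards against a job that never leaves the queue.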

[ ]

{
    "id": "db22ea7e-1895-4309-92d9-9ac881b1b117",
    "auto_start": false,
    "model": "ministral-3b-latest",
    "status": "VALIDATED",
    "created_at": 1744814706,
    "modified_at": 1744814709,
    "training_files": [
        "c36d490e-d2f8-4f98-9679-1c55625e09d5"
    ],
    "hyperparameters": {
        "training_steps": 100,
        "learning_rate": 4e-05,
        "weight_decay": 0.1,
        "warmup_fraction": 0.05,
        "epochs": 47.74101432172152,
        "seq_len": 16384
    },
    "classifier_targets": [
        {
            "name": "intent",
            "labels": [
                "social_post",
                "email_sendemail",
                "datetime_query",
                "play_music",
                "email_query",
                "news_query",
                "weather_query",
                "calendar_query",
                "general_quirky",
                "qa_factoid",
                "play_radio",
                "calendar_set",
                "qa_definition",
                "calendar_remove",
                "transport_query",
                "cooking_recipe"
            ]
        }
    ],
    "validation_files": [
        "4b62db5b-47b8-42fb-a782-89457766cff7"
    ],
    "fine_tuned_model": null,
    "suffix": null,
    "integrations": [
        {
            "project": "intent-classifier",
            "name": null,
            "run_name": null,
            "url": null
        }
    ],
    "trained_tokens": null,
    "metadata": {
        "expected_duration_seconds": 400,
        "cost": 3.28,
        "cost_currency": "EUR",
        "train_tokens_per_step": 65536,
        "train_tokens": 6553600,
        "data_tokens": 137274,
        "estimated_start_time": null
    },
    "events": [
        {
            "name": "status-updated",
            "created_at": 1744814706,
            "data": {
                "status": "QUEUED"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814709,
            "data": {
                "status": "VALIDATING"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814709,
            "data": {
                "status": "VALIDATED"
            }
        }
    ],
    "checkpoints": []
}
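The epoch count in the validated job appears to follow directly from the token figures in its metadata: total training tokens are the number of steps times the tokens per step, and dividing by the size of the dataset gives the number of passes over it. This relationship is inferred from the numbers above, not taken from the API documentation.

```python
# Values copied from the validated job output above.
training_steps = 100
train_tokens_per_step = 65536
data_tokens = 137274

# Total tokens seen during training.
train_tokens = training_steps * train_tokens_per_step

# Number of passes over the dataset.
epochs = train_tokens / data_tokens

print(train_tokens)      # 6553600
print(round(epochs, 4))  # 47.741
```

With roughly 48 passes over a small dataset, lowering training_steps is the obvious lever if the run shows signs of overfitting.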

We can now run the job.

[ ]

{
    "id": "db22ea7e-1895-4309-92d9-9ac881b1b117",
    "auto_start": false,
    "model": "ministral-3b-latest",
    "status": "QUEUED",
    "created_at": 1744814706,
    "modified_at": 1744814712,
    "training_files": [
        "c36d490e-d2f8-4f98-9679-1c55625e09d5"
    ],
    "hyperparameters": {
        "training_steps": 100,
        "learning_rate": 4e-05,
        "weight_decay": 0.1,
        "warmup_fraction": 0.05,
        "epochs": 47.74101432172152,
        "seq_len": 16384
    },
    "classifier_targets": [
        {
            "name": "intent",
            "labels": [
                "social_post",
                "email_sendemail",
                "datetime_query",
                "play_music",
                "email_query",
                "news_query",
                "weather_query",
                "calendar_query",
                "general_quirky",
                "qa_factoid",
                "play_radio",
                "calendar_set",
                "qa_definition",
                "calendar_remove",
                "transport_query",
                "cooking_recipe"
            ]
        }
    ],
    "validation_files": [
        "4b62db5b-47b8-42fb-a782-89457766cff7"
    ],
    "fine_tuned_model": null,
    "suffix": null,
    "integrations": [
        {
            "project": "intent-classifier",
            "name": null,
            "run_name": null,
            "url": null
        }
    ],
    "trained_tokens": null,
    "metadata": {
        "expected_duration_seconds": 400,
        "cost": 3.28,
        "cost_currency": "EUR",
        "train_tokens_per_step": 65536,
        "train_tokens": 6553600,
        "data_tokens": 137274,
        "estimated_start_time": null
    },
    "events": [
        {
            "name": "status-updated",
            "created_at": 1744814706,
            "data": {
                "status": "QUEUED"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814709,
            "data": {
                "status": "VALIDATING"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814709,
            "data": {
                "status": "VALIDATED"
            }
        }
    ],
    "checkpoints": []
}

The job is now starting. Let's keep track of the status and print the information.

We highly recommend making use of our Weights and Biases integration to keep track of multiple metrics.

WANDB

[Charts from the Weights and Biases run: training metrics and eval/validation metrics]

[ ]

{
    "id": "db22ea7e-1895-4309-92d9-9ac881b1b117",
    "auto_start": false,
    "model": "ministral-3b-latest",
    "status": "SUCCESS",
    "created_at": 1744814706,
    "modified_at": 1744814940,
    "training_files": [
        "c36d490e-d2f8-4f98-9679-1c55625e09d5"
    ],
    "hyperparameters": {
        "training_steps": 100,
        "learning_rate": 4e-05,
        "weight_decay": 0.1,
        "warmup_fraction": 0.05,
        "epochs": 47.74101432172152,
        "seq_len": 16384
    },
    "classifier_targets": [
        {
            "name": "intent",
            "labels": [
                "social_post",
                "email_sendemail",
                "datetime_query",
                "play_music",
                "email_query",
                "news_query",
                "weather_query",
                "calendar_query",
                "general_quirky",
                "qa_factoid",
                "play_radio",
                "calendar_set",
                "qa_definition",
                "calendar_remove",
                "transport_query",
                "cooking_recipe"
            ]
        }
    ],
    "validation_files": [
        "4b62db5b-47b8-42fb-a782-89457766cff7"
    ],
    "fine_tuned_model": "ft:classifier:ministral-3b-latest:8e2706f0:20250416:db22ea7e",
    "suffix": null,
    "integrations": [
        {
            "project": "intent-classifier",
            "name": null,
            "run_name": null,
            "url": "https://wandb.ai/mistral-ai/intent-classifier/runs/tpyu9twr"
        }
    ],
    "trained_tokens": 1638400,
    "metadata": {
        "expected_duration_seconds": 400,
        "cost": 3.28,
        "cost_currency": "EUR",
        "train_tokens_per_step": 65536,
        "train_tokens": 6553600,
        "data_tokens": 137274,
        "estimated_start_time": null
    },
    "events": [
        {
            "name": "status-updated",
            "created_at": 1744814706,
            "data": {
                "status": "QUEUED"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814709,
            "data": {
                "status": "VALIDATING"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814709,
            "data": {
                "status": "VALIDATED"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814716,
            "data": {
                "status": "RUNNING"
            }
        },
        {
            "name": "status-updated",
            "created_at": 1744814940,
            "data": {
                "status": "SUCCESS"
            }
        }
    ],
    "checkpoints": [
        {
            "metrics": {
                "train_loss": 0.000191,
                "valid_loss": 0.000625,
                "valid_mean_token_accuracy": 1.000433
            },
            "step_number": 100,
            "created_at": 1744814919
        },
        {
            "metrics": {
                "train_loss": 0.000181,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 99,
            "created_at": 1744814909
        },
        {
            "metrics": {
                "train_loss": 0.000221,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 98,
            "created_at": 1744814909
        },
        {
            "metrics": {
                "train_loss": 0.000189,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 97,
            "created_at": 1744814909
        },
        {
            "metrics": {
                "train_loss": 0.000186,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 96,
            "created_at": 1744814909
        },
        {
            "metrics": {
                "train_loss": 0.000192,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 95,
            "created_at": 1744814909
        },
        {
            "metrics": {
                "train_loss": 0.000174,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 94,
            "created_at": 1744814899
        },
        {
            "metrics": {
                "train_loss": 0.000196,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 93,
            "created_at": 1744814899
        },
        {
            "metrics": {
                "train_loss": 0.000199,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 92,
            "creat
[...]
n_loss": 0.001359,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 20,
            "created_at": 1744814769
        },
        {
            "metrics": {
                "train_loss": 0.001401,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 19,
            "created_at": 1744814769
        },
        {
            "metrics": {
                "train_loss": 0.001442,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 18,
            "created_at": 1744814769
        },
        {
            "metrics": {
                "train_loss": 0.001453,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 17,
            "created_at": 1744814769
        },
        {
            "metrics": {
                "train_loss": 0.001489,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 16,
            "created_at": 1744814769
        },
        {
            "metrics": {
                "train_loss": 0.001482,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 15,
            "created_at": 1744814759
        },
        {
            "metrics": {
                "train_loss": 0.001528,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 14,
            "created_at": 1744814759
        },
        {
            "metrics": {
                "train_loss": 0.001536,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 13,
            "created_at": 1744814759
        },
        {
            "metrics": {
                "train_loss": 0.001585,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 12,
            "created_at": 1744814759
        },
        {
            "metrics": {
                "train_loss": 0.001623,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 11,
            "created_at": 1744814759
        },
        {
            "metrics": {
                "train_loss": 0.001646,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 10,
            "created_at": 1744814749
        },
        {
            "metrics": {
                "train_loss": 0.001688,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 9,
            "created_at": 1744814749
        },
        {
            "metrics": {
                "train_loss": 0.001721,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 8,
            "created_at": 1744814749
        },
        {
            "metrics": {
                "train_loss": 0.001749,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 7,
            "created_at": 1744814749
        },
        {
            "metrics": {
                "train_loss": 0.001787,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 6,
            "created_at": 1744814749
        },
        {
            "metrics": {
                "train_loss": 0.001819,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 5,
            "created_at": 1744814749
        },
        {
            "metrics": {
                "train_loss": 0.001887,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 4,
            "created_at": 1744814739
        },
        {
            "metrics": {
                "train_loss": 0.001917,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 3,
            "created_at": 1744814739
        },
        {
            "metrics": {
                "train_loss": 0.001893,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 2,
            "created_at": 1744814739
        },
        {
            "metrics": {
                "train_loss": 0.00189,
                "valid_loss": 0.0,
                "valid_mean_token_accuracy": 0.0
            },
            "step_number": 1,
            "created_at": 1744814739
        }
    ]
}
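The job output is long, so it helps to pull out a few figures programmatically. The sketch below computes how long the job spent running and lists the checkpoints that report a validation loss, using values copied from the visible part of the output above. Reading a valid_loss of 0.0 as "no evaluation at that step" is our assumption, and both helper names are hypothetical.

```python
def running_seconds(events):
    """Seconds between the RUNNING and SUCCESS status events."""
    times = {event["data"]["status"]: event["created_at"] for event in events}
    return times["SUCCESS"] - times["RUNNING"]


def validated_checkpoints(checkpoints):
    """Checkpoints that report a non-zero validation loss."""
    return [c for c in checkpoints if c["metrics"]["valid_loss"] > 0]


# Values copied from the job output above (abridged).
events = [
    {"name": "status-updated", "created_at": 1744814716, "data": {"status": "RUNNING"}},
    {"name": "status-updated", "created_at": 1744814940, "data": {"status": "SUCCESS"}},
]
checkpoints = [
    {"metrics": {"train_loss": 0.000191, "valid_loss": 0.000625}, "step_number": 100},
    {"metrics": {"train_loss": 0.000181, "valid_loss": 0.0}, "step_number": 99},
]

print(running_seconds(events))  # 224
print([c["step_number"] for c in validated_checkpoints(checkpoints)])  # [100]
```

On this run the job spent 224 seconds in the RUNNING state, against an expected duration of 400 seconds.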

Inference

Our model is trained and ready for use! Let's test it on a sample from our test set!
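A sketch of the inference call, assuming the SDK exposes client.classifiers.classify with model and inputs arguments (our understanding of recent v1 releases, to be checked against the SDK reference). The wrapper name classify_text is hypothetical.

```python
def classify_text(client, model: str, text: str):
    """Send one text to the fine-tuned classifier and return the raw response."""
    # The endpoint takes a list of inputs, so a single text is wrapped in a list.
    return client.classifiers.classify(model=model, inputs=[text])


# Usage, given a configured client and a finished job (not executed here):
# response = classify_text(client, retrieved_job.fine_tuned_model,
#                          "what's the weather forecast for today")
```

Because inputs is a list, the same call can classify several texts at once by passing them together.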

[ ]

Text: what's the weather forecast for today
Classifier Response: {
    "id": "4c521c6507674c7da7fb8c2fa78bf8ae",
    "model": "ft:classifier:ministral-3b-latest:8e2706f0:20250416:db22ea7e",
    "results": [
        {
            "intent": {
                "scores": {
                    "social_post": 3.0807409530098084e-06,
                    "email_sendemail": 1.2064355132679339e-06,
                    "datetime_query": 0.0008676558500155807,
                    "play_music": 4.579112555802567e-07,
                    "email_query": 5.323042842064751e-06,
                    "news_query": 0.00020608631893992424,
                    "weather_query": 0.9971635937690735,
                    "calendar_query": 0.00047916508628986776,
                    "general_quirky": 0.0006549410172738135,
                    "qa_factoid": 0.0001449998380849138,
                    "play_radio": 1.3382128599914722e-05,
                    "calendar_set": 9.489350304647814e-06,
                    "qa_definition": 9.003245213534683e-05,
                    "calendar_remove": 2.98595659842249e-06,
                    "transport_query": 0.0002646200591698289,
                    "cooking_recipe": 9.289039007853717e-05
                }
            }
        }
    ]
}

The highest-scoring label is weather_query, with a score above 99%!
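Picking the predicted intent from a response is a matter of taking the label with the largest score. A minimal sketch, using a few of the scores from the response above (top_intent is a hypothetical helper):

```python
def top_intent(scores: dict[str, float]) -> tuple[str, float]:
    """Return the label with the highest score, together with that score."""
    label = max(scores, key=scores.get)
    return label, scores[label]


# A few of the scores from the response above.
scores = {
    "datetime_query": 0.0008676558500155807,
    "weather_query": 0.9971635937690735,
    "calendar_query": 0.00047916508628986776,
    "general_quirky": 0.0006549410172738135,
}

label, score = top_intent(scores)
print(label, f"{score:.2%}")  # weather_query 99.72%
```

In an application, it can be worth comparing the top score against a threshold and falling back to a default behaviour when no intent is confident enough.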

There you have it: a simple guide on how to train your own classifier and use our batch inference.

For a more specific multi-label classifier, visit this cookbook.

For a more product-focused, in-depth guide covering multi-target classification, with an evaluation comparing LLMs against our classifier, visit this cookbook.