Mistral AI Prompt Optimization

Automated Prompt Optimization

❌ Prompt engineering... sucks. It's a non-standard process that relies heavily on trial and error.
🤩 Luckily, we can automate it using ✨prompt optimization✨, investigated in recent works such as Self-Supervised Prompt Optimization (SPO).
🎯 In essence, Prompt Optimization (PO) takes a prompt written for a given task and iteratively refines it so that it performs better on that specific task.
✅ This notebook gives an overview of how to use PO with Mistral models.

Problem setting

You have put up a form and collected many more answers than you can read.
Your survey got popular (very popular 😅) and you now need to sift through the answers. To keep things accessible, we allowed (and will continue to allow!) plain-text responses.
Filtering on the raw answers is therefore impossible. Still, you need some strategy to sift through the applications received and identify the most promising profiles.
Let's define a few prompts that process the answers and output values we can filter on effectively.

Task prompts

These prompts are purposely not optimized; they serve as an example of something quick and dirty we wish to work with.
For this example, we will consider answers collected as part of the applications for our Ambassadorship Program.

[1]
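
As a minimal sketch of how such task prompts might be stored, the snippet below keeps them in a dictionary keyed by task name. The prompt text is the unoptimized job title prompt shown later in this notebook; the names `PROMPTS` and `fill_prompt` are hypothetical and only serve the illustration.

```python
# Hypothetical container for the task prompts; only the job title prompt
# is taken from this notebook, other tasks would be added the same way.
PROMPTS = {
    "job_title": (
        "Your task is to provide me with a direct classification of the person's "
        "job title into one of 4 categories. "
        "The categories you can decide are always: 'RESEARCH', 'ENGINEERING', "
        "'BUSINESS', 'FOUNDER'. "
        "There is no possibility for mixed assignments. "
        "You always assign one and one only category to each subject. "
        "When in doubt, assign to 'OTHER'. "
        "You must strictly adhere to the categories I have mentioned, and nothing more. "
        "This means that you cannot use any other output apart from 'RESEARCH', "
        "'ENGINEERING', 'BUSINESS', 'FOUNDER', 'OTHER'. "
        "Keep your answer very, very concise. Don't give context on your answer. "
        "As a matter of fact, only answer with one word based on the category you "
        "deem the most appropriate. Absolutely don't change this. "
        "You will be penalized if (1) you use a category outside of the ones I have "
        "mentioned and (2) you use more than 1 word in your output.\n\n"
        "# INPUT declared title: the person job title is {job_title}"
    ),
}


def fill_prompt(task: str, **fields: str) -> str:
    """Substitute the placeholders (e.g. {job_title}) of the chosen task prompt."""
    return PROMPTS[task].format(**fields)
```

Calling `fill_prompt("job_title", job_title="Data Engineer")` returns the full prompt with the applicant's declared title substituted in.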

Installing dependencies

To use SPO via MetaGPT, you need to clone the repository and move this notebook inside it. MetaGPT's dependencies are not easy to use out of the box, but hacking around that is fairly straightforward 😉

Just run:

[2]
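
The cloning step can be sketched from Python as follows. This assumes `git` is available on the PATH and that the repository lives at the URL below, which is an assumption on our side (adjust it for a fork or mirror); the helper names are hypothetical.

```python
import subprocess
from pathlib import Path

# Assumption: upstream location of MetaGPT; replace with your fork or mirror if needed.
METAGPT_REPO = "https://github.com/geekan/MetaGPT"


def clone_command(repo_url: str, dest: Path) -> list[str]:
    """Build the git command without running it, so it can be inspected first."""
    return ["git", "clone", repo_url, str(dest)]


def clone_metagpt(dest: Path) -> None:
    """Clone the repository into `dest`; raises CalledProcessError if git fails."""
    subprocess.run(clone_command(METAGPT_REPO, dest), check=True)
```

Running `clone_metagpt(Path("MetaGPT"))` performs the clone; the notebook then needs to be moved inside the resulting directory, as described above.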

Cloning into 'MetaGPT'...
remote: Enumerating objects: 48797, done.
remote: Counting objects: 100% (287/287), done.
remote: Compressing objects: 100% (136/136), done.
remote: Total 48797 (delta 195), reused 151 (delta 151), pack-reused 48510 (from 3)
Receiving objects: 100% (48797/48797), 179.81 MiB | 45.07 MiB/s, done.
Resolving deltas: 100% (36800/36800), done.
/Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT

Create instruction files

Once MetaGPT is installed, we can perform prompt optimization by creating a YAML file that specifies the task at hand.

According to the MetaGPT documentation, this YAML file needs the following structure:

prompt: |
  Please solve the following problem.

requirements: |
  ...

count: None

qa:
  - question: |
      ...
    answer: |
      ...

  - question: |
      ...
    answer: |
      ...

We need to generate one of these template files for each prompt we want to optimize. Luckily, we can do so automatically.

Also, since the tasks we're dealing with are fairly straightforward, we can spare ourselves from providing few-shot examples in the form of Q&As 🤩

Still, these template files offer a very straightforward way to provide real-world few-shot examples, so they are definitely worth looking into.

[3]

[4]
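
One way to generate these template files automatically is sketched below. It only mirrors the structure shown above using the standard library; the helper names (`render_template`, `write_template`) are hypothetical, and where SPO expects the files to live is left to the caller.

```python
import textwrap
from pathlib import Path


def _block(text: str, indent: int) -> str:
    """Indent text so it can sit under a YAML '|' block scalar."""
    return textwrap.indent(text.strip("\n"), " " * indent)


def render_template(prompt: str, requirements: str, qa: list[tuple[str, str]]) -> str:
    """Render one template file following the structure shown above."""
    lines = [
        "prompt: |",
        _block(prompt, 2),
        "",
        "requirements: |",
        _block(requirements, 2),
        "",
        "count: None",  # copied verbatim from the structure above
        "",
        "qa:",
    ]
    for question, answer in qa:
        lines += [
            "  - question: |",
            _block(question, 6),
            "    answer: |",
            _block(answer, 6),
            "",
        ]
    return "\n".join(lines).rstrip("\n") + "\n"


def write_template(path: Path, prompt: str, requirements: str,
                   qa: list[tuple[str, str]]) -> None:
    """Write the rendered template to `path`, creating parent folders if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(render_template(prompt, requirements, qa), encoding="utf-8")
```

For instance, `render_template("Classify the job title.", "One word only.", [("Software Developer", "ENGINEERING")])` yields a file with a single Q&A pair; the values here are illustrative only.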

Creating model files

Once you have created template files for the different prompts, you need to specify which models to use as (1) executors, (2) evaluators and (3) optimizers for the different prompts.

MetaGPT's SPO requires you to provide these models in a specific .yaml file. You can use the following snippet to create this file using your own Mistral API key (get one!).

[12]

[13]
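
A rough sketch of how such a models file could be rendered is given below. The field names (`models`, `api_type`, `base_url`, `api_key`, `temperature`) are an assumption based on MetaGPT's example configuration and should be checked against the repository; using `api_type: "openai"` assumes the Mistral endpoint is consumed through an OpenAI-compatible client. The helper names are hypothetical.

```python
import os

# Mistral's API endpoint; the model names passed in below are up to you.
MISTRAL_BASE_URL = "https://api.mistral.ai/v1"


def api_key_from_env(var: str = "MISTRAL_API_KEY") -> str:
    """Read the API key from the environment instead of hard-coding it."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"Set the {var} environment variable first")
    return key


def render_models_yaml(model_names: list[str], api_key: str,
                       base_url: str = MISTRAL_BASE_URL,
                       api_type: str = "openai") -> str:
    """Render a `models:` mapping with one entry per model name (assumed schema)."""
    lines = ["models:"]
    for name in model_names:
        lines += [
            f'  "{name}":',
            f'    api_type: "{api_type}"',
            f'    base_url: "{base_url}"',
            f'    api_key: "{api_key}"',
            "    temperature: 0",
        ]
    return "\n".join(lines) + "\n"
```

The returned text can then be written to the configuration file SPO reads, for example with `render_models_yaml(["mistral-small-latest"], api_key_from_env())`.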

We're good! 🎉

Once you have (1) template files for your candidate prompts and (2) a models.yaml file identifying the models you wish to use, you can start running rounds and optimizing the prompts 😊

A little hack: jupyter notebooks don't really work with `asyncio` 🫠

...if only Jupyter notebooks worked well with asyncio 😂 The kernel already runs its own event loop, so the little hack here is to export the code needed for prompt optimization to a .py file and then run that file from the command line.
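
To see why exporting to a script helps, here is a small self-contained sketch. `optimize_prompt` is a hypothetical stand-in coroutine, not MetaGPT's API; the point is only that `asyncio.run` works in a plain script but raises a `RuntimeError` when an event loop is already running, as is the case inside a notebook cell.

```python
import asyncio


async def optimize_prompt(rounds: int) -> list[str]:
    """Stand-in for an async optimization loop (hypothetical)."""
    log = []
    for i in range(1, rounds + 1):
        await asyncio.sleep(0)  # yield control, as a real LLM call would
        log.append(f"round {i} done")
    return log


def run_as_script() -> list[str]:
    """In a .py file no loop is running yet, so asyncio.run just works."""
    return asyncio.run(optimize_prompt(3))


async def run_inside_running_loop() -> str:
    """Mimic a notebook cell: calling asyncio.run from a running loop fails."""
    coro = optimize_prompt(1)
    try:
        asyncio.run(coro)
    except RuntimeError as exc:
        coro.close()  # avoid a 'never awaited' warning for the unused coroutine
        return str(exc)
    return "no error"
```

`run_as_script()` completes normally, whereas `asyncio.run(run_inside_running_loop())` returns the error message raised by the nested call.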

Here we create a single file, for the job title extraction prompt only. Exporting each prompt optimization process to its own file also allows for parallel execution (💨, right?), and you can easily switch to other prompts yourself.

[14]

Overwriting spo.py

Now, let's run prompt optimization ☀️

[15]

2025-04-19 15:33:24.300 | INFO     | metagpt.const:get_metagpt_package_root:15 - Package root set to /Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT
2025-04-19 15:33:24.300 | INFO     | metagpt.const:get_metagpt_package_root:15 - Package root set to /Users/francescocapuano/Desktop/prompt-optimization/third_party/MetaGPT/MetaGPT
2025-04-19 15:33:25.337 | INFO     | metagpt.ext.spo.components.optimizer:_handle_first_round:80 - 
⚡ RUNNING Round 1 PROMPT ⚡

2025-04-19 15:33:43.216 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.000 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 226, completion_tokens: 2
2025-04-19 15:33:43.370 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:97 - 
🚀Round 2 OPTIMIZATION STARTING 🚀

2025-04-19 15:33:43.370 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:98 - 
Selecting prompt for round 1 and advancing to the iteration phase

2025-04-19 15:33:49.760 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.012 | Max budget: $10.000 | Current cost: $0.012, prompt_tokens: 587, completion_tokens: 321
2025-04-19 15:33:49.761 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:116 - Modification of 2 round: Streamline the instructions and clarify the input format to reduce confusion and improve robustness against ambiguous job titles.
2025-04-19 15:33:49.761 | INFO     | metagpt.ext.spo.components.optimizer:_optimize_prompt:71 - 
Round 2 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. Provide your answer using one word only. Do not include any additional context or explanations.

    # INPUT declared title: the person's job title is

2025-04-19 15:33:49.762 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:122 - 
⚡ RUNNING OPTIMIZED PROMPT ⚡

2025-04-19 15:33:52.430 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 96, completion_tokens: 2
2025-04-19 15:33:52.430 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:125 - 
📊 EVALUATING OPTIMIZED PROMPT 📊

2025-04-19 15:34:27.452 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.002 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 175
2025-04-19 15:34:33.397 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.005 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 548, completion_tokens: 237
2025-04-19 15:34:52.530 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.007 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 156
2025-04-19 15:35:02.464 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.009 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 548, completion_tokens: 182
2025-04-19 15:35:02.465 | INFO     | metagpt.ext.spo.utils.evaluation_utils:evaluate_prompt:63 - Evaluation Results [True, True, True, True]
2025-04-19 15:35:02.467 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:135 - 
🎯 OPTIMIZATION RESULT 🎯

2025-04-19 15:35:02.467 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:136 - 
Round 2 Optimization: ✅ SUCCESS

2025-04-19 15:35:02.473 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:97 - 
🚀Round 3 OPTIMIZATION STARTING 🚀

2025-04-19 15:35:02.473 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:98 - 
Selecting prompt for round 2 and advancing to the iteration phase

2025-04-19 15:35:16.753 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.022 | Max budget: $10.000 | Current cost: $0.010, prompt_tokens: 408, completion_tokens: 260
2025-04-19 15:35:16.754 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:116 - Modification of 3 round: Include instructions for handling ambiguous job titles and ensure consistent formatting.
2025-04-19 15:35:16.754 | INFO     | metagpt.ext.spo.components.optimizer:_optimize_prompt:71 - 
Round 3 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.

    # INPUT declared title: the person's job title is

2025-04-19 15:35:16.754 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:122 - 
⚡ RUNNING OPTIMIZED PROMPT ⚡

2025-04-19 15:35:17.382 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 122, completion_tokens: 2
2025-04-19 15:35:17.383 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:125 - 
📊 EVALUATING OPTIMIZED PROMPT 📊

2025-04-19 15:35:34.698 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.011 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 185
2025-04-19 15:35:39.916 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.013 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 243
2025-04-19 15:35:57.867 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.015 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 396, completion_tokens: 227
2025-04-19 15:37:16.285 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.018 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 396, completion_tokens: 349
2025-04-19 15:37:16.286 | INFO     | metagpt.ext.spo.utils.evaluation_utils:evaluate_prompt:63 - Evaluation Results [True, True, True, True]
2025-04-19 15:37:16.287 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:135 - 
🎯 OPTIMIZATION RESULT 🎯

2025-04-19 15:37:16.287 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:136 - 
Round 3 Optimization: ✅ SUCCESS

2025-04-19 15:37:16.291 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:97 - 
🚀Round 4 OPTIMIZATION STARTING 🚀

2025-04-19 15:37:16.291 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:98 - 
Selecting prompt for round 3 and advancing to the iteration phase

2025-04-19 15:37:25.535 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.035 | Max budget: $10.000 | Current cost: $0.013, prompt_tokens: 436, completion_tokens: 398
2025-04-19 15:37:25.535 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:116 - Modification of 4 round: Include explicit guidelines for handling ambiguity, provide contextual examples, define a clear input format, and reinforce the output format with an example.
2025-04-19 15:37:25.535 | INFO     | metagpt.ext.spo.components.optimizer:_optimize_prompt:71 - 
Round 4 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.

    # INPUT: The person's job title is: [Job Title]

    # Example:
    # INPUT: The person's job title is: Software Developer
    # OUTPUT: ENGINEERING

2025-04-19 15:37:25.535 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:122 - 
⚡ RUNNING OPTIMIZED PROMPT ⚡

2025-04-19 15:37:56.956 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.001 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 185, completion_tokens: 21
2025-04-19 15:37:56.957 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:125 - 
📊 EVALUATING OPTIMIZED PROMPT 📊

2025-04-19 15:38:10.473 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.020 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 230
2025-04-19 15:38:10.476 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.023 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 236
2025-04-19 15:38:14.260 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.025 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 507, completion_tokens: 210
2025-04-19 15:38:34.786 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.028 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 507, completion_tokens: 250
2025-04-19 15:38:34.787 | INFO     | metagpt.ext.spo.utils.evaluation_utils:evaluate_prompt:63 - Evaluation Results [True, True, True, True]
2025-04-19 15:38:34.789 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:135 - 
🎯 OPTIMIZATION RESULT 🎯

2025-04-19 15:38:34.789 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:136 - 
Round 4 Optimization: ✅ SUCCESS

2025-04-19 15:38:34.795 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:97 - 
🚀Round 5 OPTIMIZATION STARTING 🚀

2025-04-19 15:38:34.795 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:98 - 
Selecting prompt for round 4 and advancing to the iteration phase

2025-04-19 15:38:42.527 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.048 | Max budget: $10.000 | Current cost: $0.013, prompt_tokens: 529, completion_tokens: 383
2025-04-19 15:38:42.527 | INFO     | metagpt.ext.spo.components.optimizer:_generate_optimized_prompt:116 - Modification of 5 round: Include explicit criteria and additional examples for less common job titles to improve classification accuracy.
2025-04-19 15:38:42.528 | INFO     | metagpt.ext.spo.components.optimizer:_optimize_prompt:71 - 
Round 5 Prompt: Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.

    # INPUT: The person's job title is: [Job Title]

    # Example:
    # INPUT: The person's job title is: Software Developer
    # OUTPUT: ENGINEERING

2025-04-19 15:38:42.530 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:122 - 
⚡ RUNNING OPTIMIZED PROMPT ⚡

2025-04-19 15:49:40.335 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.002 | Max budget: $10.000 | Current cost: $0.000, prompt_tokens: 202, completion_tokens: 12
2025-04-19 15:49:40.335 | INFO     | metagpt.ext.spo.components.optimizer:_evaluate_new_prompt:125 - 
📊 EVALUATING OPTIMIZED PROMPT 📊

2025-04-19 15:49:58.080 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.030 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 246
2025-04-19 15:50:31.634 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.033 | Max budget: $10.000 | Current cost: $0.002, prompt_tokens: 600, completion_tokens: 215
2025-04-19 15:50:41.015 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.035 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 232
2025-04-19 15:51:11.878 | INFO     | metagpt.utils.cost_manager:update_cost:57 - Total running cost: $0.038 | Max budget: $10.000 | Current cost: $0.003, prompt_tokens: 600, completion_tokens: 231
2025-04-19 15:51:11.879 | INFO     | metagpt.ext.spo.utils.evaluation_utils:evaluate_prompt:63 - Evaluation Results [False, True, True, True]
2025-04-19 15:51:11.881 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:135 - 
🎯 OPTIMIZATION RESULT 🎯

2025-04-19 15:51:11.882 | INFO     | metagpt.ext.spo.components.optimizer:_log_optimization_result:136 - 
Round 5 Optimization: ✅ SUCCESS

2025-04-19 15:51:11.883 | INFO     | metagpt.ext.spo.components.optimizer:show_final_result:52 - 
==================================================
2025-04-19 15:51:11.884 | INFO     | metagpt.ext.spo.components.optimizer:show_final_result:53 - 
🏆 OPTIMIZATION COMPLETED - FINAL RESULTS 🏆

2025-04-19 15:51:11.884 | INFO     | metagpt.ext.spo.components.optimizer:show_final_result:54 - 
📌 Best Performing Round: 5
2025-04-19 15:51:11.884 | INFO     | metagpt.ext.spo.components.optimizer:show_final_result:55 - 
🎯 Final Optimized Prompt:
Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.

    # INPUT: The person's job title is: [Job Title]

    # Example:
    # INPUT: The person's job title is: Software Developer
    # OUTPUT: ENGINEERING
2025-04-19 15:51:11.884 | INFO     | metagpt.ext.spo.components.optimizer:show_final_result:56 - 
==================================================

Assessing the results

Original Prompt

Your task is to provide me with a direct classification of the person's job title into one of 4 categories. The categories you can decide are always: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. There is no possibility for mixed assignments. You always assign one and one only category to each subject. When in doubt, assign to 'OTHER'. You must strictly adhere to the categories I have mentioned, and nothing more. This means that you cannot use any other output apart from 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER', 'OTHER'. Keep your answer very, very concise. Don't give context on your answer. As a matter of fact, only answer with one word based on the category you deem the most appropriate. Absolutely don't change this. You will be penalized if (1) you use a category outside of the ones I have mentioned and (2) you use more than 1 word in your output.

# INPUT declared title: the person job title is {job_title}

Optimized Prompt

Your task is to classify the given job title into one of the following categories: 'RESEARCH', 'ENGINEERING', 'BUSINESS', 'FOUNDER'. If the job title does not fit any of these categories, classify it as 'OTHER'. You must strictly adhere to these categories. If a job title is ambiguous or could fit into multiple categories, choose the most relevant category based on common industry standards. For example, 'Data Scientist' could fit into both 'RESEARCH' and 'ENGINEERING', but is typically classified as 'RESEARCH'. Similarly, 'Data Analyst' is typically classified as 'BUSINESS'. Provide your answer using one word only, in all uppercase letters without any additional context or explanations.

# INPUT: The person's job title is: {job_title}

# Example:
# INPUT: The person's job title is: Software Developer
# OUTPUT: ENGINEERING

The results indicate that the original prompt is modified according to typical best practices, such as providing examples to guide the LLM (few-shot prompting) and adding tag-like elements to direct the model's attention towards particular parts of the input prompt.

This revised prompt was obtained using only 5 optimization "rounds" and could be optimized further (although deciding when performance is satisfactory is, of course, a heuristic in the context of black-box optimization).
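
Since the prompt asks for a single uppercase word, its outputs can be turned into something we can filter on with very little code. The sketch below is a minimal example under that assumption; `normalize_label` and `tally` are hypothetical helpers, and anything outside the five categories is conservatively mapped to 'OTHER'.

```python
from collections import Counter

# The five labels the prompt allows.
CATEGORIES = {"RESEARCH", "ENGINEERING", "BUSINESS", "FOUNDER", "OTHER"}


def normalize_label(raw: str) -> str:
    """Clean a raw model answer and fall back to 'OTHER' if it is not a known label."""
    label = raw.strip().strip(".'\"").upper()
    return label if label in CATEGORIES else "OTHER"


def tally(raw_outputs: list[str]) -> Counter:
    """Count how many applicants fall into each category."""
    return Counter(normalize_label(raw) for raw in raw_outputs)
```

For example, `tally(["ENGINEERING", " research ", "Engineering.", "I think BUSINESS"])` counts two 'ENGINEERING', one 'RESEARCH' and one 'OTHER', since the last answer does not follow the one-word format.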