Unsloth Magistral (24B) Reasoning Conversational

To run this, press "Runtime" and press "Run all" on a free Tesla T4 Google Colab instance!

Join our Discord if you need help, and ⭐ star us on GitHub ⭐

To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.

You will learn how to prepare data, train the model, run it, and save it.

News

Train MoEs (DeepSeek, GLM, Qwen and gpt-oss) 12x faster with 35% less VRAM.

You can now train embedding models 1.8-3.3x faster with 20% less VRAM.

Ultra Long-Context Reinforcement Learning is here with 7x longer context windows!

3x faster LLM training with 30% less VRAM and 500K context.

New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL

Visit our docs for all our model uploads and notebooks.

Installation


Unsloth


==((====))==  Unsloth 2025.6.1: Fast Mistral patching. Transformers: 4.52.4.
   \\   /|    NVIDIA L4. Num GPUs = 1. Max memory: 22.161 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.9. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
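A LoRA adapter of rank r on a linear layer with input width d_in and output width d_out adds two small matrices, A (r × d_in) and B (d_out × r), so it trains r·(d_in + d_out) parameters instead of d_in·d_out. The sketch below tallies this over every adapted projection. The rank (32), the set of target projections and the layer widths are assumptions about this notebook's configuration and Magistral's architecture rather than values stated in the text, although together they reproduce the 184,811,520 trainable parameters reported in the training log; the denominator is the total parameter count that log prints.

```python
# Assumed Magistral Small dimensions (hypothetical for this sketch; check the model's config.json).
HIDDEN = 5120          # hidden size
INTERMEDIATE = 32768   # MLP intermediate size
N_HEADS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
N_LAYERS = 40
RANK = 32              # assumed LoRA rank r

# (d_in, d_out) for each projection assumed to receive a LoRA adapter
PROJECTIONS = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)

def total_lora_params(rank: int = RANK, n_layers: int = N_LAYERS) -> int:
    """Sum the adapter sizes over all projections and all decoder layers."""
    per_layer = sum(lora_params(d_in, d_out, rank) for d_in, d_out in PROJECTIONS.values())
    return per_layer * n_layers

trainable = total_lora_params()
print(f"{trainable:,}")                            # 184,811,520
print(f"{100 * trainable / 12_893_967_360:.2f}%")  # 1.43%
```

Doubling the rank doubles the adapter size, which is why the trainable fraction stays in the low single-digit percent range for typical ranks.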


Unsloth 2025.6.1 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.

Data Prep

We use the OpenMathReasoning dataset, which was used to win the AIMO (AI Mathematical Olympiad) Progress Prize 2 challenge! We sample 10% of the verifiable reasoning traces that were generated with DeepSeek R1 and achieved > 95% accuracy.
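The filtering described above can be sketched in plain Python. This is an illustration, not the notebook's own code: the column names come from the dataset schema printed below, while the model label "DeepSeek-R1", treating pass_rate_72b_tir as the accuracy figure, and the sampling seed are assumptions.

```python
import random

def filter_traces(rows, model="DeepSeek-R1", min_pass_rate=0.95):
    """Keep rows generated by `model` whose pass rate exceeds `min_pass_rate`.

    `pass_rate_72b_tir` is parsed with float(); rows where it is missing or
    non-numeric (e.g. "n/a") are dropped. The model label is an assumption.
    """
    kept = []
    for row in rows:
        if row.get("generation_model") != model:
            continue
        try:
            rate = float(row.get("pass_rate_72b_tir"))
        except (TypeError, ValueError):
            continue
        if rate > min_pass_rate:
            kept.append(row)
    return kept

def sample_fraction(rows, fraction=0.10, seed=42):
    """Randomly sample `fraction` of the rows (at least one if any exist)."""
    if not rows:
        return []
    k = max(1, int(len(rows) * fraction))
    return random.Random(seed).sample(rows, k)

rows = [
    {"generation_model": "DeepSeek-R1", "pass_rate_72b_tir": "0.97"},
    {"generation_model": "DeepSeek-R1", "pass_rate_72b_tir": "0.50"},
    {"generation_model": "DeepSeek-R1", "pass_rate_72b_tir": "n/a"},
]
print(len(filter_traces(rows)))  # 1
```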


Let's see the structure of our dataset:


Dataset({
    features: ['expected_answer', 'problem_type', 'problem_source', 'generation_model', 'pass_rate_72b_tir', 'problem', 'generated_solution', 'inference_mode'],
    num_rows: 19252
})

We now convert the reasoning dataset into conversational format:
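A minimal sketch of that conversion, assuming each row's problem becomes the user turn and its generated_solution becomes the assistant turn. The function name is hypothetical, and the system prompt visible in the rendered row below appears to come from the tokenizer's chat template rather than being stored per row.

```python
def generate_conversation(examples):
    """Turn a batch of rows into chat messages.

    `examples` is a dict of column -> list of values, the shape that
    `datasets.Dataset.map` passes to the function when called with batched=True.
    """
    conversations = []
    for problem, solution in zip(examples["problem"], examples["generated_solution"]):
        conversations.append([
            {"role": "user", "content": problem},
            {"role": "assistant", "content": solution},
        ])
    return {"conversations": conversations}

batch = {"problem": ["What is 2 + 2?"], "generated_solution": ["<think>2 + 2 = 4</think>4"]}
print(generate_conversation(batch)["conversations"][0][1]["role"])  # assistant
```

With Hugging Face datasets this would presumably be applied as `dataset.map(generate_conversation, batched = True)`, after which `tokenizer.apply_chat_template(conversations, tokenize = False)` renders each conversation into a single training string.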


Let's see the first transformed row:


"<s>[SYSTEM_PROMPT]A user will ask you to solve a task. You should first draft your thinking process (inner monologue) until you have derived the final answer. Afterwards, write a self-contained summary of your thoughts (i.e. your summary should be succinct but contain all the critical steps you needed to reach the conclusion). You should use Markdown and Latex to format your response. Write both your thoughts and summary in the same language as the task posed by the user.\n\nYour thinking process must follow the template below:\n<think>\nYour thoughts or/and draft, like working through an exercise on scratch paper. Be as casual and as long as you want until you are confident to generate a correct answer.\n</think>\n\nHere, provide a concise summary that reflects your reasoning and presents a clear final answer to the user.\n\nProblem:[/SYSTEM_PROMPT][INST]Given $\\sqrt{x^2+165}-\\sqrt{x^2-52}=7$ and $x$ is positive, find all possible values of $x$.[/INST]<think>\nOkay, let's see. I need to solve the equation √(x² + 165) - √(x² - 52) = 7, and find all positive values of x. Hmm, radicals can be tricky, but maybe if I can eliminate the square roots by squaring both sides. Let me try that.\n\nFirst, let me write down the equation again to make sure I have it right:\n\n√(x² + 165) - √(x² - 52) = 7.\n\nOkay, so the idea is to isolate one of the radicals and then square both sides. Let me try moving the second radical to the other side:\n\n√(x² + 165) = 7 + √(x² - 52).\n\nNow, if I square both sides, maybe I can get rid of the square roots. Let's do that:\n\n(√(x² + 165))² = (7 + √(x² - 52))².\n\nSimplifying the left side:\n\nx² + 165 = 49 + 14√(x² - 52) + (√(x² - 52))².\n\nThe right side is expanded using the formula (a + b)² = a² + 2ab + b². So the right side becomes 7² + 2*7*√(x² - 52) + (√(x² - 52))², which is 49 + 14√(x² - 52) + (x² - 52).\n\nSo putting it all together:\n\nx² + 165 = 49 + 14√(x² - 52) + x² - 52.\n\nHmm, let's simplify the right side. 
The x² terms will cancel out, right? Let's subtract x² from both sides:\n\n165 = 49 + 14√(x² - 52) - 52.\n\nSimplify the constants on the right:\n\n49 - 52 is -3, so:\n\n165 = -3 + 14√(x² - 52).\n\nNow, add 3 to both sides to isolate the radical term:\n\n165 + 3 = 14√(x² - 52).\n\nSo 168 = 14√(x² - 52).\n\nDivide both sides by 14:\n\n168 / 14 = √(x² - 52).\n\n12 = √(x² - 52).\n\nNow, square both sides again to eliminate the square root:\n\n12² = x² - 52.\n\n144 = x² - 52.\n\nAdd 52 to both sides:\n\n144 + 52 = x².\n\n196 = x².\n\nSo x = √196 = 14.\n\nBut wait, since the problem states that x is positive, we only take the positive root. So x = 14.\n\nBut hold on, when dealing with squaring equations, sometimes extraneous solutions can come up. I should check if this solution actually satisfies the original equation.\n\nLet's plug x = 14 back into the original equation:\n\n√(14² + 165) - √(14² - 52) = ?\n\nCalculate each term:\n\n14² is 196.\n\nSo first radical: √(196 + 165) = √361 = 19.\n\nSecond radical: √(196 - 52) = √144 = 12.\n\nSo 19 - 12 = 7, which is exactly the right-hand side. So yes, it checks out.\n\nTherefore, the only solution is x = 14. Since the problem says x is positive, we don't have to consider negative roots. So I think that's the answer.\n</think>To solve the equation \\(\\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\\) for positive \\(x\\), we proceed as follows:\n\n1. Start with the given equation:\n   \\[\n   \\sqrt{x^2 + 165} - \\sqrt{x^2 - 52} = 7\n   \\]\n\n2. Isolate one of the square roots by moving \\(\\sqrt{x^2 - 52}\\) to the right side:\n   \\[\n   \\sqrt{x^2 + 165} = 7 + \\sqrt{x^2 - 52}\n   \\]\n\n3. Square both sides to eliminate the square root on the left:\n   \\[\n   (\\sqrt{x^2 + 165})^2 = (7 + \\sqrt{x^2 - 52})^2\n   \\]\n   Simplifying both sides, we get:\n   \\[\n   x^2 + 165 = 49 + 14\\sqrt{x^2 - 52} + (x^2 - 52)\n   \\]\n\n4. 
Combine like terms on the right side:\n   \\[\n   x^2 + 165 = x^2 - 52 + 49 + 14\\sqrt{x^2 - 52}\n   \\]\n   Simplifying further:\n   \\[\n   x^2 + 165 = x^2 - 3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n5. Subtract \\(x^2\\) from both sides:\n   \\[\n   165 = -3 + 14\\sqrt{x^2 - 52}\n   \\]\n\n6. Add 3 to both sides to isolate the term with the square root:\n   \\[\n   168 = 14\\sqrt{x^2 - 52}\n   \\]\n\n7. Divide both sides by 14:\n   \\[\n   12 = \\sqrt{x^2 - 52}\n   \\]\n\n8. Square both sides again to eliminate the square root:\n   \\[\n   12^2 = x^2 - 52\n   \\]\n   Simplifying:\n   \\[\n   144 = x^2 - 52\n   \\]\n\n9. Add 52 to both sides to solve for \\(x^2\\):\n   \\[\n   196 = x^2\n   \\]\n\n10. Take the positive square root (since \\(x\\) is positive):\n    \\[\n    x = \\sqrt{196} = 14\n    \\]\n\n11. Verify the solution by substituting \\(x = 14\\) back into the original equation:\n    \\[\n    \\sqrt{14^2 + 165} - \\sqrt{14^2 - 52} = \\sqrt{196 + 165} - \\sqrt{196 - 52} = \\sqrt{361} - \\sqrt{144} = 19 - 12 = 7\n    \\]\n    The solution checks out.\n\nThus, the only positive solution is:\n\\[\n\\boxed{14}\n\\]</s>"
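Each rendered example places the reasoning between <think> and </think> and ends the summary with a \boxed{...} answer. Small parsers like the hypothetical helpers below can separate the two parts, for instance to inspect training data or to grade generations; they are plain string utilities, not part of Unsloth.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>(.*)", re.DOTALL)

def split_reasoning(text):
    """Return (thinking, summary) from a response containing <think>...</think>."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), match.group(2).strip()

def boxed_answer(text):
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    depth = 1
    out = []
    for ch in text[start + len("\\boxed{"):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(out)
        out.append(ch)
    return None  # unbalanced braces

sample = r"[INST]Solve[/INST]<think>square both sides</think>Thus: \boxed{14}</s>"
print(split_reasoning(sample)[0])  # square both sides
print(boxed_answer(sample))        # 14
```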

Now let's see how long the dataset is:
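One way to see how long the examples are is to tokenize every rendered text and summarize the lengths. In this sketch, `tokenize` is any callable that returns a list of tokens: `str.split` stands in for the real tokenizer (with a Hugging Face tokenizer you might pass `lambda t: tokenizer(t)["input_ids"]`), and the max_seq_length value is hypothetical.

```python
def length_stats(texts, tokenize, max_seq_length):
    """Summarize token lengths; `tokenize` maps a string to a list of tokens."""
    lengths = [len(tokenize(t)) for t in texts]
    if not lengths:
        return {"count": 0, "max": 0, "mean": 0.0, "over_limit": 0}
    return {
        "count": len(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
        # examples that would be truncated at max_seq_length
        "over_limit": sum(1 for n in lengths if n > max_seq_length),
    }

# str.split is a crude stand-in for the model's tokenizer
stats = length_stats(["a b c", "d e", "f g h i j"], str.split, max_seq_length=4)
print(stats["max"], stats["over_limit"])  # 5 1
```

Counting the examples over the limit shows how much of the dataset would be truncated at the chosen sequence length.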


Train the model

Now let's train our model. We run only a few steps (max_steps) to speed things up; for a full run, set num_train_epochs = 1 and max_steps = None.
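The step counts follow from simple arithmetic on the batch settings reported in the training log (batch size 2 per device, gradient accumulation 2, one GPU, 19,252 examples). The sketch below is an estimate under those values; Hugging Face's Trainer derives the epoch length from the dataloader length, which can differ by a step when the sizes do not divide evenly.

```python
import math

def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Examples consumed per optimizer step."""
    return per_device * grad_accum * num_gpus

def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Optimizer steps needed to see every example once (a last partial batch counts)."""
    return math.ceil(num_examples / batch_size)

batch = effective_batch_size(per_device=2, grad_accum=2, num_gpus=1)
print(batch)                           # 4
print(steps_per_epoch(19_252, batch))  # 4813 steps for num_train_epochs = 1
print(15 * batch)                      # 60 examples seen in a 15-step run
```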


GPU = NVIDIA L4. Max memory = 22.161 GB.
14.594 GB of memory reserved.

Let's train the model! To resume a training run, call trainer.train(resume_from_checkpoint = True).
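With resume_from_checkpoint = True, Hugging Face's Trainer resumes from the most recent checkpoint-<step> folder inside the training output_dir (transformers provides transformers.trainer_utils.get_last_checkpoint for this lookup). The plain-Python sketch below mimics that lookup so you can see which folder would be picked; it only finds something if checkpoints were actually saved during training, and the "outputs" directory name in the usage line is an assumption.

```python
import re
from pathlib import Path

CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")

def latest_checkpoint(output_dir):
    """Return the checkpoint-<step> subdirectory with the highest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for path in root.iterdir():
        match = CHECKPOINT_RE.match(path.name)
        if match and path.is_dir():
            step = int(match.group(1))  # compare numerically, not as strings
            if step > best_step:
                best_step, best_path = step, path
    return best_path

print(latest_checkpoint("outputs"))  # None if no checkpoints have been written yet
```

Comparing the step numerically matters: sorted as strings, checkpoint-5 would wrongly come after checkpoint-15.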


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 19,252 | Num Epochs = 1 | Total steps = 15
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 2
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 2 x 1) = 4
 "-____-"     Trainable parameters = 184,811,520/12,893,967,360 (1.43% trained)


417.0752 seconds used for training.
6.95 minutes used for training.
Peak reserved memory = 18.139 GB.
Peak reserved memory for training = 3.545 GB.
Peak reserved memory % of max memory = 81.851 %.
Peak reserved memory for training % of max memory = 15.997 %.
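The summary figures above follow from simple arithmetic on the reserved-memory readings. The notebook presumably reads the raw numbers from PyTorch (for example torch.cuda.max_memory_reserved(), which returns bytes); that is an assumption here, so the sketch hard-codes the logged values and recomputes the derived lines.

```python
def memory_report(start_reserved_gb, peak_reserved_gb, max_memory_gb, train_seconds):
    """Recompute the training summary from raw readings (memory values in GB)."""
    training_gb = round(peak_reserved_gb - start_reserved_gb, 3)
    return {
        "minutes": round(train_seconds / 60, 2),
        "training_gb": training_gb,
        "peak_pct": round(peak_reserved_gb / max_memory_gb * 100, 3),
        "training_pct": round(training_gb / max_memory_gb * 100, 3),
    }

# Values copied from the logs in this notebook
report = memory_report(start_reserved_gb=14.594, peak_reserved_gb=18.139,
                       max_memory_gb=22.161, train_seconds=417.0752)
for key, value in report.items():
    print(key, value)
```

The "for training" figures isolate the extra memory used by activations, gradients and optimizer state, since the 14.594 GB reserved before training already covers the loaded model.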

Inference

Let's run the model using Unsloth's native inference! According to the Mistral team, the recommended sampling settings for reasoning are temperature = 0.7 and top_p = 0.95.
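To show what these two settings do: the logits are divided by the temperature before the softmax (values below 1 sharpen the distribution), and top_p keeps only the smallest set of most-likely tokens whose probabilities sum to at least top_p, then samples from that set. The pure-Python sketch below illustrates this nucleus sampling on a toy logit vector; it is not Unsloth's or Transformers' implementation.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus(probs, top_p):
    """Indices of the smallest most-likely set whose probability mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return kept

def sample_next_token(logits, temperature=0.7, top_p=0.95, rng=None):
    """Sample a token index with temperature scaling plus top-p filtering."""
    if rng is None:
        rng = random.Random()
    probs = softmax(logits, temperature)
    kept = nucleus(probs, top_p)
    weights = [probs[i] for i in kept]  # random.choices renormalizes the weights
    return rng.choices(kept, weights=weights, k=1)[0]

# Token 0 holds about 99.4% of the mass at temperature 0.7, so top_p = 0.95 keeps only it.
print(sample_next_token([5.0, 1.0, 0.5, 0.0], rng=random.Random(0)))  # 0
```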


<think>
Okay, so I need to solve the equation (x + 2)^2 = 0. Hmm, let me think. The equation is saying that when I add x and 2 and then square the result, it equals zero. So, maybe I need to find x such that when I square something, it's zero. But squaring numbers usually gives a positive result unless the number itself is zero. So, if (x + 2)^2 = 0, then x + 2 must be zero, right? Because zero squared is zero. So, x + 2 = 0. Then, subtracting 2 from both sides, x = -2. That seems like the solution. Wait, but is that the only solution? Let me check. Since squaring a number can give the same result for a negative number as its positive counterpart, but in this case, the equation is set to zero, so the only number that squares to zero is zero itself. So, x + 2 must be zero. So, x = -2. That's it. So, the solution is x = -2. But just to be sure, maybe I should plug it back in. Let's see: if x = -2, then (x + 2)^2 = (-2 + 2)^2 = 0^2 = 0. Yes, that works. So, x = -2 is the solution. So, the answer is x = -2. But maybe the user wants the solution in a set notation? Like { -2 }? Or just the number? I think usually, equations like this are solved by giving the solution. So, x = -2. Okay, so I'll write that as the answer. But wait, the problem says "Solve (x + 2)^2 = 0". So, the solution is x = -2. So, I can write that. But maybe the user wants the solution in a box or something? I think just the answer is fine. So, the answer is x = -2. Wait, but maybe I should make sure I didn't miss anything. Let me double-check. The equation is (x + 2)^2 = 0. So, (x + 2)^2 = 0 implies x + 2 = 0. Then, x = -2. That's correct. So, the solution is x = -2. So, I can put that as the answer. But maybe the user wants me to show my steps? Well, in the thinking process, I already did that. So, the summary should reflect that. So, my summary would be that I took the equation (x + 2)^2 = 0, then took the square root of both sides, which gives x + 2 = 0, and then solving for x gives x = -2. That's it. 
So, the summary is just that. So, here's how I'll write it:

Summary: The equation (x + 2)^2 = 0 is solved by taking the square root of both sides, which gives x + 2 = 0. Solving for x yields x = -2. Therefore, the solution is x = -2.
</think>To solve the equation \((x + 2)^2 = 0\), we follow these steps:

1. Start with the equation:
   \[
   (x + 2)^2 = 0
   \]

2. Since squaring a number gives a non-negative result, the only way \((x + 2)^2\) can equal zero is if the expression inside the square is zero. Therefore, set the expression equal to zero:
   \[
   x + 2 = 0
   \]

3. Solve for \(x\) by subtracting 2 from both sides:
   \[
   x = -2
   \]

4. Verify the solution by substituting \(x = -2\) back into the original equation:
   \[
   (-2 + 2)^2 = 0^2 = 0
   \]
   This confirms that the solution is correct.

Therefore, the solution to the equation \((x + 2)^2 = 0\) is:
\[
x = \boxed{-2}
\]</s>

Saving, loading finetuned models

To save the final model as LoRA adapters, either use Hugging Face's push_to_hub for an online save or save_pretrained for a local save.

[NOTE] This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!


('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/chat_template.jinja',
 'lora_model/tokenizer.json')

To load the LoRA adapters we just saved for inference, change False to True:


Saving to float16 for vLLM

We also support saving to float16 directly. Select merged_16bit for float16 or merged_4bit for int4. We also allow LoRA adapters as a fallback. Use push_to_hub_merged to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See our docs for more deployment options.
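A rough way to compare the merged_16bit and merged_4bit options is size in bytes ≈ parameters × bits per weight / 8. The sketch uses the nominal 24B parameter count from the model name; real files differ because of metadata, quantization scales and layers kept in higher precision.

```python
def approx_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Rough on-disk size in decimal GB, ignoring metadata and unquantized layers."""
    return num_params * bits_per_weight / 8 / 1e9

PARAMS = 24e9  # nominal "24B"; the exact count differs slightly
print(approx_size_gb(PARAMS, 16))  # merged_16bit: 48.0
print(approx_size_gb(PARAMS, 4))   # merged_4bit: 12.0
```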


GGUF / llama.cpp Conversion

To save to GGUF / llama.cpp, we now support it natively! We clone llama.cpp and save to q8_0 by default, but all methods, such as q4_k_m, are supported. Use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF.

Some supported quant methods (full list on our docs page):

q8_0 - Fast conversion. High resource use, but generally acceptable.
q4_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
q5_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
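To make the q8_0 entry concrete: llama.cpp's Q8_0 format stores weights in blocks of 32 signed 8-bit integers that share one 16-bit scale, which works out to 8.5 bits per weight. The sketch below imitates that block-wise symmetric scheme in plain Python for illustration only; it is not llama.cpp's code, it does not cover the more elaborate k-quants (q4_k_m, q5_k_m), and Python's round() breaks ties to even where C's roundf rounds away from zero.

```python
BLOCK = 32  # Q8_0 groups weights into blocks of 32

def quantize_q8_0(values):
    """Symmetric 8-bit block quantization in the style of Q8_0.

    Returns one (scale, int8 codes) pair per block of up to 32 values.
    """
    blocks = []
    for start in range(0, len(values), BLOCK):
        chunk = values[start:start + BLOCK]
        amax = max(abs(v) for v in chunk)
        scale = amax / 127 if amax else 0.0  # largest magnitude maps to +/-127
        inv = 1 / scale if scale else 0.0
        codes = [int(round(v * inv)) for v in chunk]
        blocks.append((scale, codes))
    return blocks

def dequantize(blocks):
    """Reconstruct approximate values as scale * code."""
    return [scale * q for scale, codes in blocks for q in codes]

def bits_per_weight(block=BLOCK, code_bits=8, scale_bits=16):
    """Storage per weight: the codes plus one fp16 scale shared by the block."""
    return (block * code_bits + scale_bits) / block

weights = [0.1 * i - 1.6 for i in range(40)]  # 40 values -> blocks of 32 and 8
restored = dequantize(quantize_q8_0(weights))
print(max(abs(a - b) for a, b in zip(weights, restored)))  # small reconstruction error
print(bits_per_weight())  # 8.5
```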

[NEW] To finetune and auto export to Ollama, try our Ollama notebook


Now, use the magistral_finetune.Q8_0.gguf file or magistral_finetune.Q4_K_M.gguf file in llama.cpp.

And we're done! If you have any questions about Unsloth, we have a Discord channel! If you find any bugs, want to keep up with the latest LLM news, need help, or want to join projects, feel free to join our Discord!

Some other resources:

Train your own reasoning model - Llama GRPO notebook Free Colab
Saving finetunes to Ollama. Free notebook
Llama 3.2 Vision finetuning - Radiography use case. Free Colab
See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!


This notebook and all Unsloth notebooks are licensed LGPL-3.0.