Unsloth Gpt Oss (20B) Fine Tuning

Gpt Oss (20B) Fine Tuning

unsloth-notebooksunslothoriginal_template

alph-notebooks/unsloth-notebooks / gpt-oss-(20B)-Fine-tuning.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

News

Placeholder

Installation

[ ]

Unsloth

We're about to demonstrate the power of the new OpenAI GPT-OSS 20B model through a finetuning example. To use our MXFP4 inference example, use this notebook instead.

[2]

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.5: Fast Gpt_Oss patching. Transformers: 4.56.0.dev0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gpt_oss won't work! Using float32.

model.safetensors.index.json: 0.00B [00:00, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.00G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/4.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/3.37G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.16G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/165 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/27.9M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/446 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

We now add LoRA adapters for parameter efficient finetuning - this allows us to only efficiently train 1% of all parameters.

[3]

Unsloth: Making `model.base_model.model.model` require gradients

Reasoning Effort

The gpt-oss models from OpenAI include a feature that allows users to adjust the model's "reasoning effort." This gives you control over the trade-off between the model's performance and its response speed (latency) which by the amount of token the model will use to think.

The gpt-oss models offer three distinct levels of reasoning effort you can choose from:

Low: Optimized for tasks that need very fast responses and don't require complex, multi-step reasoning.
Medium: A balance between performance and speed.
High: Provides the strongest reasoning performance for tasks that require it, though this results in higher latency.

[4]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: low

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>Equation: x^5 + 3x^4 - 10 = 3. So x^5 + 3x^4 - 13 =0. Solve for real roots? maybe numeric. Let's try approximate.

We can test integer roots: try x=1 => 1+3

Changing the reasoning_effort to medium will make the model think longer. We have to increase the max_new_tokens to occupy the amount of the generated tokens but it will give better and more correct answer

[5]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>The user: "Solve x^5 + 3x^4 - 10 = 3." Wait maybe it's an equation: x^5 + 3x^4 - 10 = 3. The variable x unknown. Solve for x. We need to solve the equation:

x^

Lastly we will test it using reasoning_effort to high

[6]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve the equation: x^5 + 3x^4 - 10 = 3. Or maybe it's x^5 + 3x^4 - 10 = 3? That seems like a polynomial equation: x^5 + 3x^4 - 10

Data Prep

The HuggingFaceH4/Multilingual-Thinking dataset will be utilized as our example. This dataset, available on Hugging Face, contains reasoning chain-of-thought examples derived from user questions that have been translated from English into four other languages. It is also the same dataset referenced in OpenAI's cookbook for fine-tuning. The purpose of using this dataset is to enable the model to learn and develop reasoning capabilities in these four distinct languages.

[7]

README.md: 0.00B [00:00, ?B/s]

data/train-00000-of-00001.parquet:   0%|          | 0.00/5.29M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000 [00:00<?, ? examples/s]

Dataset({
,    features: ['reasoning_language', 'developer', 'user', 'analysis', 'final', 'messages'],
,    num_rows: 1000
,})

To format our dataset, we will apply our version of the GPT OSS prompt

[8]

Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

Let's take a look at the dataset, and check what the 1st example shows

[9]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

reasoning language: French

You are an AI chatbot with a lively and energetic personality.<|end|><|start|>user<|message|>Can you show me the latest trends on Twitter right now?<|end|><|start|>assistant<|channel|>analysis<|message|>D'accord, l'utilisateur demande les tendances Twitter les plus récentes. Tout d'abord, je dois vérifier si j'ai accès à des données en temps réel. Étant donné que je ne peux pas naviguer sur Internet ou accéder directement à l'API de Twitter, je ne peux pas fournir des tendances en direct. Cependant, je peux donner quelques conseils généraux sur la façon de les trouver.

Je devrais préciser que les tendances Twitter évoluent rapidement et sont spécifiques à chaque région. Je pourrais suggérer de consulter la section « En vogue » sur l'application ou le site web. Aussi, l'utilisation de hashtags et le suivi d'utilisateurs pertinents pourraient être utiles. Il est important de souligner que les tendances varient selon la région et l'heure de la journée. Je devrais garder un ton amical et bienveillant, peut-être ajouter un emoji pour rester léger. Je vais structurer ma réponse étape par étape pour faciliter la lecture. Je dois m'excuser de ne pas pouvoir fournir des données en temps réel et proposer d'autres méthodes. Je conserverai un langage simple et convivial, en évitant les termes techniques.<|end|><|start|>assistant<|channel|>final<|message|>Hey there!  While I can't check Twitter (X) in real-time or access live data, I can share some tips to help you spot the latest trends:

1. **Open the "Trending" tab** on the Twitter app or website – it updates constantly!  
2. **Search for hashtags** like #Trending or #Viral to see what’s blowing up.  
3. **Follow accounts** that curate trends (e.g., @TrendingNow, @ViralThreads).  
4. **Check regional trends** – they often differ by location!  

Remember, trends are *super fast-moving* and often tied to pop culture, memes, or breaking news. For example, recent trends have included viral challenges (like the "Distracted Boyfriend" meme revival), celebrity drama, or unexpected events (hello, weather disasters!).  

Want me to brainstorm *what* might trend next? I’ve got ideas!<|return|>

What is unique about GPT-OSS is that it uses OpenAI Harmony format which support conversation structures, reasoning output, and tool calling.

Train the model

Now let's train our model. We do 60 steps to speed things up, but you can set num_train_epochs=1 for a full run, and turn off max_steps=None.

[10]

Unsloth: Switching to float32 training since model cannot work with float16

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/1000 [00:00<?, ? examples/s]

We also use Unsloth's train_on_completions method to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes and lower loss as well!

[ ]

Let's verify masking the instruction part is done! Let's print the 100th row again.

[ ]

Now let's print the masked out example - you should see only the answer is present:

[ ]

GPU = Tesla T4. Max memory = 14.741 GB.
12.811 GB of memory reserved.

Let's train the model! To resume a training run, set trainer.train(resume_from_checkpoint = True)

[12]

Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
The tokenizer has new special tokens that are also defined in the model configs. The model configs were aligned accordingly. Updated tokens: {'bos_token_id': 199998, 'pad_token_id': 200017}
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 1,000 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \    Batch size per device = 1 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
 "-____-"     Trainable parameters = 3,981,312 of 20,918,738,496 (0.02% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.

Unsloth: Will smartly offload gradients to save VRAM!

[13]

645.6936 seconds used for training.
10.76 minutes used for training.
Peak reserved memory = 12.975 GB.
Peak reserved memory for training = 0.164 GB.
Peak reserved memory % of max memory = 88.02 %.
Peak reserved memory for training % of max memory = 1.113 %.

Inference

Let's run the model! You can change the instruction and input - leave the output blank!

[14]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

reasoning language: French

You are a helpful assistant that can solve mathematical problems.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>The equation is \(x^5 + 3x^4 - 10 = 3\), or \(x^5 + 3x^4 - 13 = 0\). So we need to find the roots of \(x^5 + 3x^4 - 13

Saving, loading finetuned models

To save the final model as LoRA adapters, either use Huggingface's push_to_hub for an online save or save_pretrained for a local save.

[NOTE] Currently finetunes can only be loaded via Unsloth in the meantime - we're working on vLLM and GGUF exporting!

[16]

To run the finetuned model, you can do the below after setting if False to if True in a new instance.

[17]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions

reasoning language: French

You are a helpful assistant that can solve mathematical problems.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve the equation for x. The equation: x^5 + 3x^4 - 10 = 3. So bring 3 to left side: x^5 + 3x^4 -10 -3 = 0 → x^5 + 3x^

Saving to float16 for VLLM or mxfp4

We also support saving to float16 or mxfp4 directly. Select merged_16bit for float16. Use push_to_hub_merged to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

[ ]

And we're done! If you have any questions on Unsloth, we have a Discord channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other resources:

Train your own reasoning model - Llama GRPO notebook Free Colab
Saving finetunes to Ollama. Free notebook
Llama 3.2 Vision finetuning - Radiography use case. Free Colab
See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!

Join Discord if you need help + ⭐️ Star us on Github ⭐️