GPT-OSS (20B) Fine-Tuning
Installation
Unsloth
This notebook walks through a finetuning example for OpenAI's new GPT-OSS 20B model. For our MXFP4 inference example, use this notebook instead.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2025.8.5: Fast Gpt_Oss patching. Transformers: 4.56.0.dev0.
\\ /| Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.8.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.4.0
\ / Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gpt_oss won't work! Using float32.
We now add LoRA adapters for parameter-efficient finetuning. This lets us train only a small fraction of the model's parameters (about 0.02% in this run, per the training log) instead of all of them.
Unsloth: Making `model.base_model.model.model` require gradients
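To see where a trainable-parameter count of this size can come from, here is a rough sketch. A LoRA adapter on a linear layer adds two small matrices, A (rank x in_features) and B (out_features x rank). The projection shapes, layer count, and rank below are assumptions, not values stated in this notebook; they were picked because they reproduce the count reported in the training log.

```python
# Assumed (in_features, out_features) for the attention projections.
# These shapes, the layer count, and the rank are assumptions for illustration.
ATTENTION_PROJECTIONS = {
    "q_proj": (2880, 4096),
    "k_proj": (2880, 512),
    "v_proj": (2880, 512),
    "o_proj": (4096, 2880),
}
NUM_LAYERS = 24  # assumed number of transformer layers
RANK = 8         # assumed LoRA rank


def lora_params(in_features, out_features, rank):
    """Parameters added by one adapter: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in ATTENTION_PROJECTIONS.values())
trainable = per_layer * NUM_LAYERS
total = 20_918_738_496  # total parameter count from the training log

percent = 100 * trainable / total
print(f"{trainable:,} trainable parameters ({percent:.2f}% of {total:,})")
```

Under these assumptions the script prints 3,981,312 trainable parameters, the same figure the trainer reports.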
Reasoning Effort
The gpt-oss models from OpenAI include a feature that allows users to adjust the model's "reasoning effort." This gives you control over the trade-off between the model's performance and its response speed (latency) by changing the number of tokens the model uses to think.
The gpt-oss models offer three distinct levels of reasoning effort you can choose from:
- Low: Optimized for tasks that need very fast responses and don't require complex, multi-step reasoning.
- Medium: A balance between performance and speed.
- High: Provides the strongest reasoning performance for tasks that require it, though this results in higher latency.
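In the rendered prompts in this notebook, the chosen level shows up as a "Reasoning:" line in the system message. The helper below is a hypothetical sketch of that idea, not the chat template the tokenizer actually uses; the function name and the line-break layout are assumptions.

```python
from datetime import date

VALID_EFFORTS = ("low", "medium", "high")


def build_system_message(effort="medium", today=None):
    """Hypothetical helper: build the head of a system message for an effort level."""
    if effort not in VALID_EFFORTS:
        raise ValueError(f"effort must be one of {VALID_EFFORTS}, got {effort!r}")
    today = today or date.today()
    lines = [
        "You are ChatGPT, a large language model trained by OpenAI.",
        "Knowledge cutoff: 2024-06",
        f"Current date: {today.isoformat()}",
        f"Reasoning: {effort}",
    ]
    return "\n".join(lines)


print(build_system_message("low", date(2025, 8, 13)))
```

Validating the level up front means a typo such as "hgih" fails loudly instead of silently producing a prompt with an unrecognized level.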
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06 Current date: 2025-08-13 Reasoning: low # Valid channels: analysis, commentary, final. Channel must be included for every message. Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>Equation: x^5 + 3x^4 - 10 = 3. So x^5 + 3x^4 - 13 =0. Solve for real roots? maybe numeric. Let's try approximate. We can test integer roots: try x=1 => 1+3
Changing the reasoning_effort to medium will make the model think longer. We have to increase max_new_tokens to accommodate the larger number of generated tokens, but the model should give a better and more accurate answer.
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06 Current date: 2025-08-13 Reasoning: medium # Valid channels: analysis, commentary, final. Channel must be included for every message. Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>The user: "Solve x^5 + 3x^4 - 10 = 3." Wait maybe it's an equation: x^5 + 3x^4 - 10 = 3. The variable x unknown. Solve for x. We need to solve the equation: x^
Lastly, we will test it with reasoning_effort set to high.
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06 Current date: 2025-08-13 Reasoning: high # Valid channels: analysis, commentary, final. Channel must be included for every message. Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve the equation: x^5 + 3x^4 - 10 = 3. Or maybe it's x^5 + 3x^4 - 10 = 3? That seems like a polynomial equation: x^5 + 3x^4 - 10
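The generations above are cut off before they reach an answer, so it helps to have an independent reference for the test prompt. The sketch below finds the real roots of x^5 + 3x^4 - 13 = 0 by bisection. The brackets were chosen by hand from sign changes of the polynomial and are part of this illustration, not of the notebook.

```python
def f(x):
    """Left side of x^5 + 3x^4 - 10 = 3 after moving everything to one side."""
    return x**5 + 3 * x**4 - 13


def bisect(func, lo, hi, tol=1e-12):
    """Bisection: repeatedly halve a bracket whose endpoints have opposite signs."""
    if func(lo) * func(hi) > 0:
        raise ValueError("bracket does not contain a sign change")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Hand-picked brackets: f changes sign on each of these intervals.
brackets = [(-3.0, -2.4), (-2.0, -1.0), (1.0, 2.0)]
roots = [bisect(f, lo, hi) for lo, hi in brackets]
for r in roots:
    print(f"x = {r:.6f}, f(x) = {f(r):.2e}")
```

The derivative x^3(5x + 12) vanishes only at x = -2.4 and x = 0, so the polynomial has at most three real roots and these three brackets cover them.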
We use the HuggingFaceH4/Multilingual-Thinking dataset as our example. This dataset, available on Hugging Face, contains chain-of-thought reasoning examples derived from user questions that have been translated from English into four other languages. It is the same dataset referenced in OpenAI's cookbook for fine-tuning. Training on it teaches the model to reason in these four languages.
Dataset({
    features: ['reasoning_language', 'developer', 'user', 'analysis', 'final', 'messages'],
    num_rows: 1000
})
To format our dataset, we will apply our version of the GPT OSS prompt.
Let's take a look at the dataset and check what the first example shows:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06 Current date: 2025-08-13 Reasoning: medium # Valid channels: analysis, commentary, final. Channel must be included for every message. Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions reasoning language: French You are an AI chatbot with a lively and energetic personality.<|end|><|start|>user<|message|>Can you show me the latest trends on Twitter right now?<|end|><|start|>assistant<|channel|>analysis<|message|>D'accord, l'utilisateur demande les tendances Twitter les plus récentes. Tout d'abord, je dois vérifier si j'ai accès à des données en temps réel. Étant donné que je ne peux pas naviguer sur Internet ou accéder directement à l'API de Twitter, je ne peux pas fournir des tendances en direct. Cependant, je peux donner quelques conseils généraux sur la façon de les trouver. Je devrais préciser que les tendances Twitter évoluent rapidement et sont spécifiques à chaque région. Je pourrais suggérer de consulter la section « En vogue » sur l'application ou le site web. Aussi, l'utilisation de hashtags et le suivi d'utilisateurs pertinents pourraient être utiles. Il est important de souligner que les tendances varient selon la région et l'heure de la journée. Je devrais garder un ton amical et bienveillant, peut-être ajouter un emoji pour rester léger. Je vais structurer ma réponse étape par étape pour faciliter la lecture. Je dois m'excuser de ne pas pouvoir fournir des données en temps réel et proposer d'autres méthodes. Je conserverai un langage simple et convivial, en évitant les termes techniques.<|end|><|start|>assistant<|channel|>final<|message|>Hey there! While I can't check Twitter (X) in real-time or access live data, I can share some tips to help you spot the latest trends: 1. **Open the "Trending" tab** on the Twitter app or website – it updates constantly! 2. 
**Search for hashtags** like #Trending or #Viral to see what’s blowing up. 3. **Follow accounts** that curate trends (e.g., @TrendingNow, @ViralThreads). 4. **Check regional trends** – they often differ by location! Remember, trends are *super fast-moving* and often tied to pop culture, memes, or breaking news. For example, recent trends have included viral challenges (like the "Distracted Boyfriend" meme revival), celebrity drama, or unexpected events (hello, weather disasters!). Want me to brainstorm *what* might trend next? I’ve got ideas!<|return|>
What is unique about GPT-OSS is that it uses the OpenAI Harmony format, which supports conversation structures, reasoning output, and tool calling.
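The structure of the formatted example can be sketched in a few lines. Each turn is wrapped in special tokens, assistant turns carry a channel (analysis for the reasoning, final for the answer), and the last assistant turn closes with a return token. The functions below are hypothetical and simplified: they skip the system turn and tool calls, and they are not the tokenizer's real chat template.

```python
def render_turn(role, content, channel=None, final=False):
    """Wrap one message in Harmony-style special tokens (simplified sketch)."""
    header = f"<|start|>{role}"
    if channel is not None:
        header += f"<|channel|>{channel}"
    closer = "<|return|>" if final else "<|end|>"
    return f"{header}<|message|>{content}{closer}"


def render_example(row):
    """Render one dataset row, using the column names shown in the dataset summary."""
    return "".join([
        render_turn("developer", row["developer"]),
        render_turn("user", row["user"]),
        render_turn("assistant", row["analysis"], channel="analysis"),
        render_turn("assistant", row["final"], channel="final", final=True),
    ])


# Toy row for illustration only.
row = {
    "developer": "You are a helpful assistant.",
    "user": "What is 2 + 2?",
    "analysis": "Simple addition: 2 + 2 = 4.",
    "final": "2 + 2 = 4.",
}
print(render_example(row))
```

Keeping the reasoning and the answer in separate channels is what lets a client show only the final channel to the user.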
Unsloth: Switching to float32 training since model cannot work with float16
We also use Unsloth's train_on_completions method to train only on the assistant outputs and ignore the loss on the user's inputs. This helps increase the accuracy of finetunes and lowers the loss as well!
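The idea behind completion-only training can be illustrated with a toy example. Labels for every token outside an assistant turn are set to -100, the index that PyTorch's cross-entropy loss ignores by default, so only assistant tokens contribute to the loss. This is a simplified sketch on string tokens, not Unsloth's implementation; the function name and the exact masking boundaries are assumptions.

```python
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default


def mask_prompt_tokens(tokens, token_ids):
    """Return labels that keep assistant-turn tokens and ignore everything else."""
    labels = []
    in_assistant = False
    for i, tok in enumerate(tokens):
        if tok == "<|start|>":
            # A new turn begins; it is trainable only if the role is "assistant".
            in_assistant = i + 1 < len(tokens) and tokens[i + 1] == "assistant"
            labels.append(IGNORE_INDEX)
            continue
        if in_assistant and tokens[i - 1] == "<|start|>":
            labels.append(IGNORE_INDEX)  # the role token itself is part of the header
            continue
        labels.append(token_ids[i] if in_assistant else IGNORE_INDEX)
        if tok in ("<|end|>", "<|return|>"):
            in_assistant = False
    return labels


tokens = [
    "<|start|>", "user", "<|message|>", "Hi", "<|end|>",
    "<|start|>", "assistant", "<|channel|>", "final", "<|message|>",
    "Hello", "!", "<|return|>",
]
token_ids = list(range(len(tokens)))  # stand-in ids for illustration
print(mask_prompt_tokens(tokens, token_ids))
```

In this toy run the first seven labels are masked and the remaining six keep their ids, so the loss is computed only on the assistant's reply.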
Let's verify that the instruction part has been masked. Let's print the 100th row again.
Now let's print the masked out example - you should see only the answer is present:
GPU = Tesla T4. Max memory = 14.741 GB. 12.811 GB of memory reserved.
Let's train the model! To resume a training run, call trainer.train(resume_from_checkpoint = True).
Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.
The tokenizer has new special tokens that are also defined in the model configs. The model configs were aligned accordingly. Updated tokens: {'bos_token_id': 199998, 'pad_token_id': 200017}
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 1
\\ /| Num examples = 1,000 | Num Epochs = 1 | Total steps = 30
O^O/ \_/ \ Batch size per device = 1 | Gradient accumulation steps = 4
\ / Data Parallel GPUs = 1 | Total batch size (1 x 4 x 1) = 4
"-____-" Trainable parameters = 3,981,312 of 20,918,738,496 (0.02% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Unsloth: Will smartly offload gradients to save VRAM!
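The numbers in the training banner fit together with a little arithmetic. The sketch below recomputes the total batch size and how much of the dataset 30 steps cover; the variable names are illustrative, and the values are copied from the log above.

```python
# Values copied from the training banner.
per_device_batch = 1
gradient_accumulation_steps = 4
num_gpus = 1
total_steps = 30
num_examples = 1000

# Each optimizer step consumes this many examples.
effective_batch = per_device_batch * gradient_accumulation_steps * num_gpus
examples_seen = effective_batch * total_steps
fraction_of_dataset = examples_seen / num_examples

print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} ({fraction_of_dataset:.0%} of the dataset)")
```

With these settings the run sees 120 of the 1,000 examples, so it is a short demonstration rather than a full pass over the data.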
645.6936 seconds used for training.
10.76 minutes used for training.
Peak reserved memory = 12.975 GB.
Peak reserved memory for training = 0.164 GB.
Peak reserved memory % of max memory = 88.02 %.
Peak reserved memory for training % of max memory = 1.113 %.
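These statistics follow directly from the memory readings taken before and after training. The sketch below recomputes them from the logged values; the variable names are illustrative.

```python
# Values copied from the logs in this notebook.
start_reserved_gb = 12.811  # reserved before training
peak_reserved_gb = 12.975   # peak reserved after training
max_memory_gb = 14.741      # total GPU memory on the Tesla T4
train_seconds = 645.6936

# Memory attributable to training is the peak minus the pre-training baseline.
training_gb = round(peak_reserved_gb - start_reserved_gb, 3)
peak_pct = round(peak_reserved_gb / max_memory_gb * 100, 3)
training_pct = round(training_gb / max_memory_gb * 100, 3)
minutes = round(train_seconds / 60, 2)

print(f"{minutes} minutes used for training.")
print(f"Peak reserved memory for training = {training_gb} GB.")
print(f"Peak reserved memory % of max memory = {peak_pct} %.")
print(f"Peak reserved memory for training % of max memory = {training_pct} %.")
```

Most of the reserved memory is the model itself, already loaded before training starts, which is why training adds only 0.164 GB on top.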
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06 Current date: 2025-08-13 Reasoning: medium # Valid channels: analysis, commentary, final. Channel must be included for every message. Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions reasoning language: French You are a helpful assistant that can solve mathematical problems.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>The equation is \(x^5 + 3x^4 - 10 = 3\), or \(x^5 + 3x^4 - 13 = 0\). So we need to find the roots of \(x^5 + 3x^4 - 13
To run the finetuned model in a new instance, change if False to if True in the cell below and run it.
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI. Knowledge cutoff: 2024-06 Current date: 2025-08-13 Reasoning: high # Valid channels: analysis, commentary, final. Channel must be included for every message. Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>developer<|message|># Instructions reasoning language: French You are a helpful assistant that can solve mathematical problems.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve the equation for x. The equation: x^5 + 3x^4 - 10 = 3. So bring 3 to left side: x^5 + 3x^4 -10 -3 = 0 → x^5 + 3x^
Saving to float16 for vLLM or MXFP4
We also support saving to float16 or mxfp4 directly. Select merged_16bit for float16. Use push_to_hub_merged to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.
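A merged export folds the adapter back into the base weights so the result can be loaded without any LoRA code. In the standard LoRA formulation the merged weight is W + (alpha / r) * B A. The sketch below shows that update on tiny matrices in plain Python; it illustrates the math only and is not Unsloth's saving code, and the function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the standard LoRA merge."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # (out x rank) @ (rank x in) -> (out x in)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 layer with a rank-1 adapter.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]    # rank x in_features
lora_b = [[1.0], [0.0]]  # out_features x rank
merged = merge_lora(weight, lora_a, lora_b, alpha=2, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is what makes the merged checkpoint usable by engines that do not apply adapters themselves.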
And we're done! If you have any questions about Unsloth, find a bug, need help, or want to keep up with the latest LLM news and join projects, feel free to join our Discord!
Some other resources:
- Train your own reasoning model - Llama GRPO notebook Free Colab
- Saving finetunes to Ollama. Free notebook
- Llama 3.2 Vision finetuning - Radiography use case. Free Colab
- See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!


