Unsloth Kaggle GPT OSS MXFP4 (20B) Inference

Kaggle GPT OSS MXFP4 (20B) Inference

unsloth-notebooksunslothnb

alph-notebooks/unsloth-notebooks / Kaggle-GPT_OSS_MXFP4_(20B)-Inference.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

To run this, press "Runtime" and press "Run all" on a free Tesla T4 Google Colab instance!

Join Discord if you need help + ⭐ Star us on Github ⭐

To install Unsloth on your own computer, follow the installation instructions on our Github page here.

You will learn how to do data prep, how to train, how to run the model, & how to save it

News

Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. Blog

You can now train embedding models 1.8-3.3x faster with 20% less VRAM. Blog

Ultra Long-Context Reinforcement Learning is here with 7x more context windows! Blog

3x faster LLM training with 30% less VRAM and 500K context. 3x faster • 500K Context

New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL

Visit our docs for all our model uploads and notebooks.

Installation

[1]

Unsloth

We're about to demonstrate the power of the new OpenAI GPT-OSS 20B model through an inference example. For our bnb-4bit version, use this notebook instead.

We're using OpenAI's MXFP4 Triton kernels combined with Unsloth's kernels!

[2]

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.8.5: Fast Gpt_Oss patching. Transformers: 4.56.0.dev0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu128. CUDA: 7.5. CUDA Toolkit: 12.8. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gpt_oss won't work! Using float32.
Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.

model.safetensors.index.json: 0.00B [00:00, ?B/s]

model-00000-of-00002.safetensors:   0%|          | 0.00/4.79G [00:00<?, ?B/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/4.80G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/4.17G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/165 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

tokenizer.json:   0%|          | 0.00/27.9M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/446 [00:00<?, ?B/s]

chat_template.jinja: 0.00B [00:00, ?B/s]

Reasoning Effort

The gpt-oss models from OpenAI include a feature that allows users to adjust the model's "reasoning effort." This gives you control over the trade-off between the model's performance and its response speed (latency) which by the amount of token the model will use to think.

The gpt-oss models offer three distinct levels of reasoning effort you can choose from:

Low: Optimized for tasks that need very fast responses and don't require complex, multi-step reasoning.
Medium: A balance between performance and speed.
High: Provides the strongest reasoning performance for tasks that require it, though this results in higher latency.

[3]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: low

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>Equation: x^5 +3x^4 -10 =3 => x^5 +3x^4 -13 =0. Need roots. Could factor numeric? Try integer roots: ±1, ±13. f(1)=1+3-13=-9. f( -1)= -1+3-13=-1? wait: f(-1)=(-1)+3-13=-12. f(13)=371293+3*28561-13 huge positive. f(-13)= -371293+3*28561-13 = -371293+85683-13= -285724. Maybe rational ±13? none. So only real root? Let's find approximate. Use numerical. Let g(x)=x^5+3x^4-13. At x=1: -9. At x=2:32+ 3*16=48 =>80-13=67 positive. So one root between1 and2. Try1.2: 1.2^5= (1.2^2=1.44, *1.2^3? 1.2^3=1.728, *1.2^5 =1.728*1.44= asym? Let's compute: 1.2^4=2.0736, times 1.2 =2.494. So x^5=2.49. 3x^4=3*2.0736=6.2208 sum=8.71-13=-4.29. So still negative. Try1.4: 1.4^2=1.96, ^3=2.752, ^4=384? let's compute: 1.4^4= 1.96*1.96=768? Wait 1.96*1.96=767? compute: 1.96*1.96= 768? Actually 1.96*1.96= 1.926? let me do precisely: 2*2=4 minus small. ok quick approximate: 1.4^4= (1.96)^2=768? 1.96^2= 767.? 1.96^2= 1.96*1.96= 610? I'm messing. Let's use calculator mentally: 1.96*1.96 = (2-0.04)^2=4-0

Changing the reasoning_effort to medium will make the model think longer. We have to increase the max_new_tokens to occupy the amount of the generated tokens but it will give better and more correct answer

[4]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: medium

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>We need to solve equation: x^5 + 3x^4 - 10 = 3. So rewrite: x^5 + 3x^4 - 10 = 3 => x^5 + 3x^4 - 13 = 0. So solve polynomial equation x^5 + 3x^4 - 13 = 0. We likely seek real solutions. It's quintic, so analytic roots may be messy. But maybe there's an integer root? We can test integer divisors of 13: ±1,±13. Plug x=1: 1+3-13=-9 not zero. x= -1: -1+3-13=-10. x=13 huge. x = -13: -371293 + 121663 -13 huge negative. So no integer root. Maybe rational? ±13 maybe not. So probably we can factor something? Let’s try substitution x^4(x+3)=13? Actually equation: x^5 + 3x^4 = 13 so x^4(x+3)=13 => x+3 = 13/x^4 => x = 13/x^4 - 3. Hard.

Maybe we find approximate numeric solution? Let's analyze function f(x)=x^5+3x^4-13. Evaluate some values.

f(0)=-13. f(1)= -9. f(2)=32 + 48 -13 = 40. So there's root between 1 and 2. f(1.2): 1.2^5= (1.2^2=1.44, 1.2^3=1.728, 1.2^4=2.0736, 1.2^5=2.494). 3*1.2^4=3*2.0736=6.2208. Sum=8.7148-13=-4.285. f(1.3): 1.3^5: 1.3^2=1.69, 1.3^3=2.1957, 1.3^4=2.85681, 1.3^5=587... let's compute 2.85681*1.3= anf: 2.876? Wait; 2.85681*1.3=738? Let's compute precisely: 2.85681*1.3 = 185? Actually 2.85681*1.3 = 39? Wait 2.85681*1=2.85681, times 0.3=0.857043, sum=3.713853. So 1.3^5=3.713853. 3*1.3^4 =3*2.85681=8.57043. Sum=12.38428-13=-0.6157. f(1.35): compute 1.35^2=1.8225, ^3=1.310..., 1.8225*1.35=779? No: 1.8225*1.35=552? Let's calculate: 1.8225*1.35 = 1.8225*1 + 1.8225*0.35 = 1.8225 + 717. So 1.8225*0.35=851? Actually 1.8225*0.35=758? Let's do precisely: 1.8225*35=758.4; divided by 100 =7.584. Wait mistake. Let's use high precision: 1.8225 * 0.35 = 1.602375. Add 1.8225 gives 3.424875. So 1.35^3=3.424875. ^4 multiply by 1.35: 3.424875*1.35 = 3.424875 + 1.01799375 =4.44286875. ^5 multiply by 1.35: 4.44286875*1.35 = 4.44286875 +1.3335860625 =5.7764 approx. So 1.35^5 =5.7764. 3*1.35^4=3*4.442869=13. anon: 13. 3*4.442869=13. pre? 4.442869*3=13. S? Actually 4.442869*3 = 13. ? Let's compute: 4.442869 *3 = 13.? 4.442869*2=8.885738, +4.442869=13.328607. So f(1.35)=5.7764+13.3286-13=+? Sum=5.7764+

Lastly we will test it using reasoning_effort to high

[5]

<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-08-13

Reasoning: high

# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>Solve x^5 + 3x^4 - 10 = 3.<|end|><|start|>assistant<|channel|>analysis<|message|>The problem: "Solve x^5 + 3x^4 - 10 = 3." It is an equation: x^5 + 3x^4 - 10 = 3. So bring everything: x^5 + 3x^4 - 13 = 0. We need to solve for x? Likely real solutions? Or approximate? Could be factorable? Let's explore algebraically.

Given x^5 + 3x^4 - 13 = 0.

We can attempt to factor by grouping? Not obviously. Maybe try rational root theorem: possible rational roots are factors of 13: ±1, ±13. Let's test:

x=1: 1 + 3 -13 = -9 ≠ 0
x=-1: -1 + 3 -13 = -12 ≠ 0
x=13: huge > 0.
x=-13: negative large. So no integer roots. Maybe something like x=2? Wait rational roots must divide constant 13, but polynomial is degree 5 with leading coefficient 1. But coefficients: 1,3,0,0,0,-13. So constant term -13, rational roots ±1, ±13. None work. So no simple rational root.

Thus the equation x^5 + 3x^4 - 13 = 0 likely has one real root only? Let's analyze derivative: 5x^4 +12x^3 = x^3 (5x + 12). So derivative zero at x=0 (triply), and at x = -12/5 ≈ -2.4. So function f(x)=x^5+3x^4-13.

We can evaluate f(-3) = (-243) + 81 -13 = -175? Wait compute: (-3)^5 = -243; 3*(-3)^4 = 3*81=243; So f(-3) = -243 + 243 -13 = -13. Actually 243-243=0, so f(-3) = -13. f(-4) = (-1024)+ 192 -13 = -, -1024+192=-, -1024+192 = - (1024-192)= - (1112?) Actually 1024-192=1024-200+8=744+8=752? Wait correct: 1024-192= (1000-192)+(24) is 1108? Let's compute precisely: 1024-192 = 1024-200+8 = 784+8 = 792? No, 1024-200=724, plus 8 = 732. Wait I think I'm messing. Let's do: 192*5=960; 1024-960= 108; plus 192/5=38.4? No. Let's compute more carefully: 1024-192: Subtract 192 from 1024: 1024-100=924; 924-92=832; So 1024-192= 1024-200+8 = 124? Wait no, let's do direct:

192*5= 200*5 - 8*5? I'm not computing. Let's compute 1024-192 by decomposition: 1024-200= 722; add 8 back to get 730. Yes, 1024-192= 1024-200+8 = 722+8 = 730. So -1024 + 192 = - (1024-192) = -(730) = -730. Then minus 13 gives -743. So f(-4) = -743, far negative.

Now f(-2): (-32) + 48 -13 = -245? Actually -32+48=16; 16-13=3. So f(-2) = 3.

f(-1): -1 + 3 -13 = -12. So at x=-2, f(-2)=3, at x=-1, f(-1)=-12. Therefore function crosses zero in between. Since f(-3)=-13, f(-2)=3, so one root between -3 and -2? Wait f(-3) = -13, f(-2)=3. So yes, one root around about -2. something. Also check f(0) = -13, negative. f(1) = 1+3-13=-9 negative. f(2) = 32+48-13=39? Actually 32+48=80; 80-13=67 > 0. Wait but f(1) was -9? Let's compute: x=1: 1^5 =1; 3*1^4 =3; 1+3-13=-9. So yes negative. f(2) = 32+48-13 = 39? 32+48=80, minus 13 =67? Wait 80-13=67. Yes. So sign changes between 1 and 2. But earlier f(0)=-13. f(1) negative. So one root between 1 and 2. But we found f(-2)=3 > 0, f(-1) = -12 negative. That implies a root between -2 and -1. But we also saw f(-3) = -13 negative, f(-2)=3 positive. So another root between -3 and -2. Wait f(-3) negative. So from negative at -3, passes positive at -2, then negative at -1, then negative at 0. So two sign changes in the negative region, meaning potentially two real roots there. But careful: The function is a quintic, so could have up to 5 real roots. But derivative: zeros at x=0 (multiplicity 3) and x=-12/5 ≈ -2.4. So there is a local max at x=-12/5? Let's analyze f' sign:

- For x<-12/5 ≈ -2.4: factor x^3(5x+12). For x ~ -3: x^3 is negative (-27), 5x+12 at x=-3: 5(-3)+12 = -15+12 = -3, negative. So product negative*negative = positive. So f' > 0 for x < -12/5? Wait: Let's compute: x^3 negative, (5x+12) negative for x < -12/5? x < -2.4, then 5x < -12, +12 negative. So negative. So negative*negative positive. So f' > 0 for x < -12/5, i.e., function increasing until x = -12/5. At x = -12/5, f' = 0. For -12/5 < x < 0: For x say -1: x^3 negative (-1), (5x+12) = -5+12=7 positive. So negative*positive negative => f' < 0, function decreasing. So at x=0, f' = 0 again (multiplicity 3). For x>0, f' > 0 (x^3 positive, 5x+12 positive). So function has a local maximum at x=-12/5, and a stationary plateau at 0 (inflection? Actually triple root of derivative). So global behavior: For large negative x, f(x) → -∞ (since leading term x^5 dominates negative). At x = -∞, negative. It increases to some value at x = -12/5, then decreasing until x=0, then increasing thereafter. So there can be at most 3 real roots? Let's check sign at f(-∞) negative. At f(-12/5), maybe positive? Let's compute f at x = -12/5 exactly: x = -12/5 = -2.4. Compute f(-2.4) = (-2.4)^5 + 3*(-2.4)^4 - 13.

Let's compute: (-2.4)^2 = 5.76. (-2.4)^3 = -13.824. (-2.4)^4 = 33.1584. (-2.4)^5 = -79.6269? Actually 33.1584 * -2.4 = -100.600... Wait compute carefully: 33.1584*2.4=79.6269. With negative: -79.6269. So f(-2.4) = (-79.6269) + 3*33.1584 -13. 3*33.1584 = 99.4752. Sum: -79.6269 + 99.4752 = 19.8483. minus 13 = 6.8483. So f(-2.4) ≈ 6.85. So f positive at critical point. At x=0: f(0) = -13 negative. So function crosses zero between -2.4 and 0 at least once (it does: we found root between -2 and -1). But there may be two crossings: positive to negative at -12/5 to 0. Let's note that f is increasing until -12/5 where positive. Then decreasing to a minimum at x=0 where negative. So there should be exactly one root in (-2.4, 0). However earlier we found also that f(-3) negative, f(-2) positive, then f(-1) negative. That would imply two roots between -3 and -2 and -2 and -1. But

And we're done! If you have any questions on Unsloth, we have a Discord channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other resources:

Train your own reasoning model - Llama GRPO notebook Free Colab
Saving finetunes to Ollama. Free notebook
Llama 3.2 Vision finetuning - Radiography use case. Free Colab
See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!

Join Discord if you need help + ⭐️ Star us on Github ⭐️

This notebook and all Unsloth notebooks are licensed LGPL-3.0.