Unsloth Granite4.0 350M

Granite4.0 350M

unsloth-notebooksunslothoriginal_template

alph-notebooks/unsloth-notebooks / Granite4.0_350M.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

News

Placeholder

Installation

[ ]

Unsloth

[3]

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.10.11: Fast Granitemoehybrid patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.8.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.4.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.32.post2. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.

model.safetensors:   0%|          | 0.00/370M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/170 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/24.2k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.61M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/917k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/7.15M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/579 [00:00<?, ?B/s]

chat_template.jinja:   0%|          | 0.00/6.42k [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update a small amount of parameters!

[4]

Unsloth: Making `model.base_model.model.model` require gradients

Data Prep

📄 Using Google Sheets as Training Data

Our goal is to create a customer support bot that proactively helps and solves issues.

We’re storing examples in a Google Sheet with two columns:

Snippet: A short customer support interaction
Recommendation: A suggestion for how the agent should respond

This keeps things simple and collaborative. Anyone can edit the sheet, no database setup required.

🔍 Why This Format?

This setup works well for tasks like:

Input snippet → Suggested reply
Prompt → Rewrite
Bug report → Diagnosis
Text → Label or Category

Just collect examples in a spreadsheet, and you’ve got usable training data.

✅ What You'll Learn

We’ll show how to:

Load the Google Sheet into your notebook
Format it into a dataset
Use it to train or prompt an LLM

The chat template for granite-4 look like this:

<|start_of_role|>system<|end_of_role|>Knowledge Cutoff Date: April 2024.
Today's Date: June 24, 2025.
You are Granite, developed by IBM. You are a helpful AI assistant.<|end_of_text|>

<|start_of_role|>user<|end_of_role|>How do astronomers determine the original wavelength of light emitted by a celestial body at rest, which is necessary for measuring its speed using the Doppler effect?<|end_of_text|>

<|start_of_role|>assistant<|end_of_role|>Astronomers make use of the unique spectral fingerprints of elements found in stars...<|end_of_text|>

[5]

Downloading data:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

Generating train split: 0 examples [00:00, ? examples/s]

We've just loaded the Google Sheet as a csv style Dataset, but we still need to format it into conversational style like below and then apply the chat template.

{"role": "system", "content": "You are an assistant"}
{"role": "user", "content": "What is 2+2?"}
{"role": "assistant", "content": "It's 4."}

We'll use a helper function formatting_prompts_func to do both!

[6]

Map:   0%|          | 0/504 [00:00<?, ? examples/s]

We now look at the raw input data before formatting.

[7]

'User: I\'m getting an error when trying to log in. \nAgent: What error message are you seeing? \nUser: It says "Invalid credentials" even though I\'m sure my password is correct. \nAgent: Have you tried clearing your browser cache? \nUser: Yes, I cleared it already. \nAgent: Let me check your account status. \nUser: I\'ve been using this account for months without issues. \nAgent: I found no issues with your account. \nUser: Maybe there\'s a problem with the login server? \nAgent: Let\'s try resetting your password. \nUser: I just did that, and it\'s not working either. \nAgent: I\'ll need to escalate this to our engineering team. \nUser: Okay, what should I do in the meantime? \nAgent: Try using a different browser or device. \nUser: I\'ll try Chrome on my laptop. \nAgent: Let me know if that resolves the issue.'

[8]

'#### Analysis\nThe user is experiencing persistent login issues ("Invalid credentials", password reset failure) despite clearing cache and confirming correct credentials. No account anomalies were detected by the agent. The root cause remains unresolved and potentially related to server-side authentication or user-specific credential handling.\n\n#### Recommendation\n- Step 1: Confirm if using Chrome on the laptop resolved the login issue. (User action)\n- Step 2: If Step 1 was successful, no further immediate action needed. If not, proceed to escalate based on user feedback.\n- *Next best action for the agent*: Report back to the user whether using Chrome on their laptop confirmed or failed to resolve the issue.'

And we see how the chat template transformed these conversations.

[9]

'<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>User: I\'m getting an error when trying to log in. \nAgent: What error message are you seeing? \nUser: It says "Invalid credentials" even though I\'m sure my password is correct. \nAgent: Have you tried clearing your browser cache? \nUser: Yes, I cleared it already. \nAgent: Let me check your account status. \nUser: I\'ve been using this account for months without issues. \nAgent: I found no issues with your account. \nUser: Maybe there\'s a problem with the login server? \nAgent: Let\'s try resetting your password. \nUser: I just did that, and it\'s not working either. \nAgent: I\'ll need to escalate this to our engineering team. \nUser: Okay, what should I do in the meantime? \nAgent: Try using a different browser or device. \nUser: I\'ll try Chrome on my laptop. \nAgent: Let me know if that resolves the issue.<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>#### Analysis\nThe user is experiencing persistent login issues ("Invalid credentials", password reset failure) despite clearing cache and confirming correct credentials. No account anomalies were detected by the agent. The root cause remains unresolved and potentially related to server-side authentication or user-specific credential handling.\n\n#### Recommendation\n- Step 1: Confirm if using Chrome on the laptop resolved the login issue. (User action)\n- Step 2: If Step 1 was successful, no further immediate action needed. If not, proceed to escalate based on user feedback.\n- *Next best action for the agent*: Report back to the user whether using Chrome on their laptop confirmed or failed to resolve the issue.<|end_of_text|>\n'

Train the model

Now let's train our model. We do 60 steps to speed things up, but you can set num_train_epochs=1 for a full run, and turn off max_steps=None.

[10]

Unsloth: We found double BOS tokens - we shall remove one automatically.

Unsloth: Tokenizing ["text"] (num_proc=6):   0%|          | 0/504 [00:00<?, ? examples/s]

We also use Unsloth's train_on_completions method to only train on the assistant outputs and ignore the loss on the user's inputs. This helps increase accuracy of finetunes!

[11]

Map (num_proc=6):   0%|          | 0/504 [00:00<?, ? examples/s]

Let's verify masking the instruction part is done! Let's print the 100th row again.

[12]

'<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>\n<|start_of_role|>user<|end_of_role|>User: My account is locked. I tried to log in but got an error message saying "Too many failed attempts".\n\nAgent: Can you please try logging in again and enter the security code sent to your email? That should unlock your account temporarily.\n\nUser: I did that already. I received the code and entered it, but my account is still locked.\n\nAgent: I see. Have you tried resetting your password via the \'Forgot Password\' link?\n\nUser: Yes, I clicked on that. It sent me an email with a reset link, but when I tried to reset my password, I got an error message saying "Invalid request".\n\nAgent: Okay, I can\'t access your account to check directly. Could you please provide me with your account ID or email address associated with the account?\n\nUser: My email is user@example.com. Account ID is 123456789.\n\nAgent: Thank you. I\'m looking into this. It seems there might be an issue with the account lockout mechanism or the password reset process. I\'ll need to contact our security team to manually unlock your account and investigate further. I\'ll keep you updated on the progress.<|end_of_text|>\n<|start_of_role|>assistant<|end_of_role|>#### Analysis\nThe user is experiencing account lockout due to multiple failed login attempts, and standard troubleshooting steps like password reset and security code entry are failing, indicating a potential issue with the account lockout mechanism or password recovery system.\n\n#### Recommendation\n- Step 1: Attempt to reset the password using the \'Forgot Password\' link and provide the error details received.\n- Step 2: Contact support with the account ID/email and request manual account unlock and investigation.\n- *Next best action for the agent*: Instruct the user to contact support immediately, providing their account details for manual intervention and further investigation.<|end_of_text|>\n'

Now let's print the masked out example - you should see only the answer is present:

[13]

"                                                                                                                                                                                                                                                                       #### Analysis\nThe user is experiencing account lockout due to multiple failed login attempts, and standard troubleshooting steps like password reset and security code entry are failing, indicating a potential issue with the account lockout mechanism or password recovery system.\n\n#### Recommendation\n- Step 1: Attempt to reset the password using the 'Forgot Password' link and provide the error details received.\n- Step 2: Contact support with the account ID/email and request manual account unlock and investigation.\n- *Next best action for the agent*: Instruct the user to contact support immediately, providing their account details for manual intervention and further investigation.<|end_of_text|>\n"

[14]

GPU = Tesla T4. Max memory = 14.741 GB.
0.453 GB of memory reserved.

Let's train the model! To resume a training run, set trainer.train(resume_from_checkpoint = True)

Notice you might have to wait ~10 minutes for the Mamba kernels to compile! Please be patient!

[15]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 504 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 13,303,808 of 365,683,712 (3.64% trained)

Unsloth: Will smartly offload gradients to save VRAM!

[16]

212.4166 seconds used for training.
3.54 minutes used for training.
Peak reserved memory = 3.686 GB.
Peak reserved memory for training = 3.233 GB.
Peak reserved memory % of max memory = 25.005 %.
Peak reserved memory for training % of max memory = 21.932 %.

Inference

Let's run the model via Unsloth native inference! We'll use some example snippets not contained in our training data to get a sense of what was learned.

[17]

[18]

<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
User: Everyone in my meeting just sees a black screen when I share.
Agent: Sorry about that—are you sharing a window or your entire screen?
User: Entire screen on macOS Sonoma.
Agent: Thanks. Do you have “Enable hardware acceleration” toggled on in Settings → Video?
User: Yeah, that switch is on.
Agent: Could you try toggling it off and start a quick test share?
User: Did that—still black for attendees.
Agent: Understood. Are you on the desktop app v5.4.2 or the browser client?
User: Desktop v5.4.2—just updated this morning.
<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>#### Analysis
The user is experiencing a black screen despite sharing the entire screen on macOS Sonoma. The user has enabled hardware acceleration and has attempted to toggle it off and start a quick test share. The issue persists across different desktop apps and browsers, suggesting a potential problem with the specific application or environment.

#### Recommendation
- Step 1: Confirm the user is using macOS Sonoma.
- Step 2: Verify the user's desktop application is v5.4.2 and the browser client is installed.
- *Next best action for the agent*: Ask the user to open a new tab in the browser and try sharing again to determine if the issue is specific to the browser or application.<|end_of_text|>

[19]

<|start_of_role|>system<|end_of_role|>You are a helpful assistant. Please ensure responses are professional, accurate, and safe.<|end_of_text|>
<|start_of_role|>user<|end_of_role|>
User: I changed the batteries, but the lock app still says 5 % and won’t auto-lock.
Agent: Let’s check firmware. In the app, go to Settings → Device Info—what version shows?
User: 3.18.0-alpha.
Agent: Latest stable is 3.17.5. Did you enroll in the beta program?
User: I might have months ago.
Agent: Beta builds sometimes misreport battery. Remove one battery, wait ten seconds, reinsert, and watch the LED pattern.
User: LED blinks blue twice, then red once.
Agent: That blink code means “config mismatch.” Do you still have the old batteries handy?
User: Tossed them already.
<|end_of_text|>
<|start_of_role|>assistant<|end_of_role|>#### Analysis
The user reports a persistent 5% auto-lock issue despite changing batteries and checking firmware. The LED pattern indicates a potential config mismatch, suggesting the issue may be related to battery or firmware updates.

#### Recommendation
- Step 1: Confirm the user's battery status and confirm the latest stable version (3.17.5) is being used.
- Step 2: Ask the user to remove one battery and wait 10 seconds before reinserting.
- *Next best action for the agent*: If the issue persists, suggest the user try a different battery or check for any recent firmware updates.<|end_of_text|>

Saving, loading finetuned models

To save the final model as LoRA adapters, either use Huggingface's push_to_hub for an online save or save_pretrained for a local save.

[NOTE] This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

[20]

('lora_model/tokenizer_config.json',
, 'lora_model/special_tokens_map.json',
, 'lora_model/chat_template.jinja',
, 'lora_model/vocab.json',
, 'lora_model/merges.txt',
, 'lora_model/added_tokens.json',
, 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set False to True:

[21]

Saving to float16 for VLLM

We also support saving to float16 directly. Select merged_16bit for float16 or merged_4bit for int4. We also allow lora adapters as a fallback. Use push_to_hub_merged to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

[22]

GGUF / llama.cpp Conversion

To save to GGUF / llama.cpp, we support it natively now! We clone llama.cpp and we default save it to q8_0. We allow all methods like q4_k_m. Use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF.

Some supported quant methods (full list on our Wiki page):

q8_0 - Fast conversion. High resource use, but generally acceptable.
q4_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
q5_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[NEW] To finetune and auto export to Ollama, try our Ollama notebook

Likewise, if you want to instead push to GGUF to your Hugging Face account, set if False to if True and add your Hugging Face token and upload location!

[23]

Now, use the model-unsloth.gguf file or model-unsloth-Q4_K_M.gguf file in llama.cpp.

And we're done! If you have any questions on Unsloth, we have a Discord channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other resources:

Train your own reasoning model - Llama GRPO notebook Free Colab
Saving finetunes to Ollama. Free notebook
Llama 3.2 Vision finetuning - Radiography use case. Free Colab
See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!

Join Discord if you need help + ⭐️ Star us on Github ⭐️

This notebook and all Unsloth notebooks are licensed LGPL-3.0.