Qwen3 (14B) Alpaca
Installation
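The install cell itself was stripped from this export; a minimal sketch of what it typically looks like on a fresh Colab T4 runtime (the exact package pins are assumptions, not requirements):

```python
%%capture
# Assumed install cell for a fresh Colab runtime; adjust to your environment.
!pip install unsloth
# Nightly build, if you want the very latest patches:
# !pip install --upgrade --no-deps "unsloth @ git+https://github.com/unslothai/unsloth.git"
```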
Unsloth
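The loading cell that produces the log below was also stripped; a minimal sketch, assuming the unsloth/Qwen3-14B checkpoint and 4bit loading (both inferred from the notebook title and memory figures, so treat them as assumptions):

```python
from unsloth import FastLanguageModel

max_seq_length = 2048  # assumed value; Unsloth auto-supports RoPE scaling internally

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-14B",  # assumed checkpoint, per the notebook title
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # 4bit quantization so a 14B model fits on a Tesla T4
    dtype=None,         # auto-detect; the T4 has no bfloat16, so float16 is used
)
```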
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.4.4: Fast Qwen3 patching. Transformers: 4.51.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
(Download output trimmed: model.safetensors.index.json, the three checkpoint shards model-00001/2/3-of-00003.safetensors (4.97 GB, 4.59 GB, 1.56 GB), generation_config.json, tokenizer_config.json, vocab.json, merges.txt, added_tokens.json, special_tokens_map.json, tokenizer.json, and chat_template.jinja are fetched from the Hub, then the 3 checkpoint shards are loaded. huggingface_hub also warns that Xet Storage is enabled for the repo but the hf_xet package is not installed, so it falls back to regular HTTP download; install it with `pip install huggingface_hub[hf_xet]` or `pip install hf_xet` for better performance.)
We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
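A sketch of the adapter call that produces the patch log below, assuming the rank and target modules used in Unsloth's standard notebooks (the specific values are assumptions):

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank; 8, 16, 32, 64 are common choices (assumed value)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,  # 0 is optimized in Unsloth
    bias="none",     # "none" is optimized in Unsloth
    use_gradient_checkpointing="unsloth",  # offloads activations to save VRAM
    random_state=3407,
)
```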
Unsloth 2025.4.4 patched 40 layers with 40 QKV layers, 40 O layers and 40 MLP layers.
Data Prep
We now use the Alpaca dataset from yahma, a filtered version of the original 52K-example Alpaca dataset. You can replace this code section with your own data prep.
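A sketch of the prep cell, assuming the standard Alpaca prompt template from Unsloth's notebooks (the template text matches the prompts shown in the inference outputs later in this notebook). Remember to append EOS_TOKEN, or generation will run on forever:

```python
from datasets import load_dataset

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # must be appended, or generation never stops

def formatting_prompts_func(examples):
    texts = [
        alpaca_prompt.format(instruction, inp, output) + EOS_TOKEN
        for instruction, inp, output in zip(
            examples["instruction"], examples["input"], examples["output"]
        )
    ]
    return {"text": texts}

dataset = load_dataset("yahma/alpaca-cleaned", split="train")
dataset = dataset.map(formatting_prompts_func, batched=True)
```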
(Download output trimmed: README.md and alpaca_data_cleaned.json (44.3 MB) are fetched from the Hub, the 51,760-example train split is generated, and the formatting function is mapped over it. The same hf_xet fallback warning as above appears here.)
Train the model
Now let's train our model. We run only 60 steps to speed things up, but you can set num_train_epochs = 1 for a full run and set max_steps = None. We also support TRL's DPOTrainer!
The trainer includes our gradient accumulation bug fix. Read more about it here: Blog post
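A sketch of the trainer setup: the batch size of 2, gradient accumulation of 4, and 60 steps are read off the run stats printed below; the remaining hyperparameters are assumptions taken from Unsloth's standard notebooks:

```python
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,  # effective batch size = 2 x 4 = 8
        warmup_steps=5,
        max_steps=60,  # for a full run: num_train_epochs=1, max_steps=None
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),  # T4 has no bfloat16, so fp16 is used
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer_stats = trainer.train()
```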
GPU = Tesla T4. Max memory = 14.741 GB. 11.898 GB of memory reserved.
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 51,760 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 64,225,280/14,000,000,000 (0.46% trained)
Unsloth: Will smartly offload gradients to save VRAM!
822.5598 seconds used for training.
13.71 minutes used for training.
Peak reserved memory = 12.949 GB.
Peak reserved memory for training = 1.051 GB.
Peak reserved memory % of max memory = 87.843 %.
Peak reserved memory for training % of max memory = 7.13 %.
Inference
Let's run the model! You can change the instruction and input - leave the output blank!
We use min_p = 0.1 and temperature = 1.5. Read this Tweet for more information on why.
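A sketch of the generation call that produces the output below, reusing the alpaca_prompt template from the data-prep sketch above; the sampling values come from the text, everything else is an assumption:

```python
FastLanguageModel.for_inference(model)  # enables Unsloth's 2x faster inference

inputs = tokenizer(
    [alpaca_prompt.format(
        "Continue the fibonnaci sequence.",  # instruction (typo kept from the original run)
        "1, 1, 2, 3, 5, 8",                  # input
        "",                                   # output - leave blank for generation!
    )],
    return_tensors="pt",
).to("cuda")

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,   # sampling must be on for temperature/min_p to apply
    temperature=1.5,
    min_p=0.1,
)
tokenizer.batch_decode(outputs)
```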
['Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nContinue the fibonnaci sequence.\n\n### Input:\n1, 1, 2, 3, 5, 8\n\n### Response:\n13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6']
You can also use a TextStreamer for continuous inference - so you can see the generation token by token, instead of waiting the whole time!
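A sketch using transformers' TextStreamer, which prints tokens to stdout as they are generated (same prompt and sampling settings as above):

```python
from transformers import TextStreamer

inputs = tokenizer(
    [alpaca_prompt.format("Continue the fibonnaci sequence.", "1, 1, 2, 3, 5, 8", "")],
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128,
                   do_sample=True, temperature=1.5, min_p=0.1)
```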
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: Continue the fibonnaci sequence. ### Input: 1, 1, 2, 3, 5, 8 ### Response: 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811,
('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/vocab.json',
 'lora_model/merges.txt',
 'lora_model/added_tokens.json',
 'lora_model/tokenizer.json')
Now if you want to load the LoRA adapters we just saved for inference, set False to True:
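The file list above is the return value of tokenizer.save_pretrained. A sketch of the save-then-reload pattern, assuming the local directory name lora_model used above:

```python
# Save only the LoRA adapters (not the full 14B weights), plus the tokenizer.
model.save_pretrained("lora_model")
tokenizer.save_pretrained("lora_model")

if False:  # flip to True to reload the saved adapters for inference
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="lora_model",
        max_seq_length=max_seq_length,
        load_in_4bit=True,
    )
    FastLanguageModel.for_inference(model)
```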
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request. ### Instruction: What is a famous tall tower in Paris? ### Input: ### Response: One of the most famous tall towers in Paris is the Eiffel Tower. It is a wrought-iron lattice tower located on the Champ de Mars in the 7th arrondissement of Paris, France. It was constructed as the entrance arch to the 1889 World's Fair, and it has since become an enduring symbol of Paris and France. The Eiffel Tower is 330 meters (1,083 ft) tall and is the tallest structure in Paris.<|im_end|>
You can also use Hugging Face's AutoPeftModelForCausalLM (from the peft library). Only use this if you do not have Unsloth installed. It can be hopelessly slow, since downloading 4bit models is not supported and Unsloth's inference is 2x faster.
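A minimal sketch of the PEFT route, assuming the lora_model directory saved above:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "lora_model",       # directory containing the saved adapters
    load_in_4bit=True,  # requires bitsandbytes
)
tokenizer = AutoTokenizer.from_pretrained("lora_model")
```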
Saving to float16 for vLLM
We also support saving to float16 directly. Select merged_16bit for float16 or merged_4bit for int4. We also allow saving just the LoRA adapters as a fallback. Use push_to_hub_merged to upload to your Hugging Face account! You can get your personal tokens at https://huggingface.co/settings/tokens.
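A sketch of the Unsloth merged-save calls; the output directory, repo name, and token are placeholders:

```python
# Merge the LoRA adapters into the base model and save in 16bit (for vLLM) or 4bit.
model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")
# model.save_pretrained_merged("model", tokenizer, save_method="merged_4bit")
# model.save_pretrained_merged("model", tokenizer, save_method="lora")  # adapters only

# Or push straight to the Hub (replace "hf/model" and the token with your own):
# model.push_to_hub_merged("hf/model", tokenizer, save_method="merged_16bit", token="...")
```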
GGUF / llama.cpp Conversion
Saving to GGUF / llama.cpp is now supported natively! We clone llama.cpp and default to saving as q8_0. All quant methods such as q4_k_m are allowed. Use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF; a sketch follows the list below.
Some supported quant methods (full list on our Wiki page):
- q8_0 - Fast conversion. High resource use, but generally acceptable.
- q4_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
- q5_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
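A sketch of the GGUF export calls referenced above; quantization_method takes the quant names from the list, and the repo name and token are placeholders:

```python
# Save locally to GGUF (defaults to q8_0 if quantization_method is omitted).
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")

# Or push the GGUF file to the Hub (replace "hf/model" and the token with your own):
# model.push_to_hub_gguf("hf/model", tokenizer, quantization_method="q4_k_m", token="...")
```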
[NEW] To finetune and auto export to Ollama, try our Ollama notebook
Now, use the model-unsloth.gguf file or model-unsloth-Q4_K_M.gguf file in llama.cpp.
And we're done! If you have any questions about Unsloth, find any bugs, want to keep up with the latest LLM developments, need help, or want to join projects, feel free to join our Discord channel!
Some other resources:
- Train your own reasoning model - Llama GRPO notebook Free Colab
- Saving finetunes to Ollama. Free notebook
- Llama 3.2 Vision finetuning - Radiography use case. Free Colab
- See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!


