Installation
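The install cell isn't included in this export; a typical setup for this notebook (assuming the standard `unsloth` PyPI package plus `sentence-transformers` for the embedding model) would be:

```shell
pip install unsloth sentence-transformers
```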
Unsloth
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning. 🦥 Unsloth Zoo will now patch everything to make training faster!
Unsloth: Using fast encoder path for xlm-roberta (torch.compile + SDPA)
`torch_dtype` is deprecated! Use `dtype` instead!
We now add LoRA adapters so we only need to update a small number of parameters!
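LoRA freezes the base weights and learns a low-rank update, which is why so few parameters need training. A minimal numpy sketch of the idea (illustrative dimensions only; the notebook's actual adapter config lives inside Unsloth/PEFT):

```python
import numpy as np

d, r = 768, 16                      # hidden size, LoRA rank (illustrative values)
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))         # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01  # trainable low-rank factor
B = np.zeros((d, r))                # B starts at zero, so the update starts at zero

alpha = 32                          # LoRA scaling hyperparameter
W_eff = W + (alpha / r) * (B @ A)   # effective weight used during training

# Only A and B are trained: 2*d*r parameters instead of d*d.
trainable = A.size + B.size
print(f"trainable fraction: {trainable / W.size:.4f}")  # → 0.0417
```

Because `B` is initialized to zero, the model starts out exactly equal to the base model and the adapters only gradually shift its behavior.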
Unsloth: Enabled gradient checkpointing
Unsloth: torch.compile will be applied automatically if max_steps > 193
Generating train split: 106,628 examples
Let's take a look at the dataset structure:
Dataset examples:
{'anchor': '.308', 'positive': 'The .308 Winchester is a popular rifle cartridge used for hunting and target shooting.'}
{'anchor': '.308', 'positive': 'Many precision rifles are chambered in .308 for its excellent long-range accuracy.'}
{'anchor': '.308', 'positive': 'The sniper selected a .308 caliber round for the mission.'}
{'anchor': '.338 lapua', 'positive': 'The .338 Lapua Magnum is a high-powered cartridge designed for extreme long-range shooting.'}
{'anchor': '.338 lapua', 'positive': 'Military snipers often use .338 Lapua for engagements beyond 1000 meters.'}
{'anchor': '.338 lapua', 'positive': 'The rifle was chambered in .338 Lapua for maximum effective range.'}
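Each line of `dataset.jsonl` is one `{anchor, positive}` pair in JSON Lines format. A minimal stdlib sketch of reading that layout (the notebook itself loads it through the `datasets` library; the tiny file here is a made-up stand-in):

```python
import json
import os
import tempfile

# A tiny stand-in for dataset.jsonl, using the same schema as the examples above.
rows = [
    {"anchor": ".308", "positive": "The .308 Winchester is a popular rifle cartridge."},
    {"anchor": ".338 lapua", "positive": "Military snipers often use .338 Lapua."},
]

path = os.path.join(tempfile.mkdtemp(), "dataset.jsonl")
with open(path, "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# Read it back: one JSON object per line.
with open(path) as f:
    dataset = [json.loads(line) for line in f]

print(dataset[0]["anchor"])  # → .308
```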
Baseline Inference
Let's test the base model before fine-tuning to see how it performs on our specific domain.
--- Pre-Training Results for query: 'apexification' ---
0.4312 | apples are a tasty treat
0.4160 | induces root tip closure in non-vital teeth
0.3992 | a brick left by Yuki
0.3835 | a type of cancer treatment that uses drugs to boost the body's immune response
0.3643 | a plant hormone for regulating stress responses
0.3257 | the weed whacker uses an engine that runs on a mixture of gas and oil
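The numbers on the left are cosine similarities between the query embedding and each candidate's embedding. A numpy sketch of just the scoring-and-ranking step (the real vectors come from the model's `encode`; the 4-dim vectors here are made up):

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical low-dimensional embeddings standing in for real model outputs.
query = np.array([1.0, 0.2, 0.0, 0.5])
candidates = {
    "induces root tip closure in non-vital teeth": np.array([0.9, 0.3, 0.1, 0.4]),
    "apples are a tasty treat":                    np.array([0.1, 0.9, 0.8, 0.0]),
}

# Score every candidate against the query and sort best-first.
ranked = sorted(candidates.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
for text, vec in ranked:
    print(f"{cosine(query, vec):.4f} | {text}")
```

A well-tuned embedding model should push domain-relevant candidates toward the top of exactly this kind of ranking.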
GPU = Tesla T4. Max memory = 14.741 GB. 1.152 GB of memory reserved.
Let's train the model! To resume a training run, call `trainer.train(resume_from_checkpoint = True)`.
1220.381 seconds used for training.
20.34 minutes used for training.
Peak reserved memory = 2.852 GB.
Peak reserved memory for training = 1.7 GB.
Peak reserved memory % of max memory = 19.347 %.
Peak reserved memory for training % of max memory = 11.532 %.
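Training on `(anchor, positive)` pairs is typically done with an in-batch-negatives contrastive loss (e.g. `MultipleNegativesRankingLoss` in sentence-transformers): each anchor should score its own positive higher than every other positive in the batch. A numpy sketch of that loss, with made-up embeddings:

```python
import numpy as np

def in_batch_negatives_loss(anchors, positives, scale=20.0):
    """Cross-entropy over scaled cosine similarities; row i's target is column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    scores = scale * (a @ p.T)                      # (batch, batch) similarity matrix
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # matching pairs sit on the diagonal

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))
loss_random = in_batch_negatives_loss(anchors, rng.normal(size=(4, 8)))
loss_aligned = in_batch_negatives_loss(anchors, anchors)  # perfect anchor/positive match
print(loss_aligned, "<", loss_random)
```

The loss drops as anchor and positive embeddings move together, which is exactly the shift visible in the post-training scores below.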
--- Post-Training Results for query: 'apexification' ---
0.3403 | induces root tip closure in non-vital teeth
0.1799 | a type of cancer treatment that uses drugs to boost the body's immune response
0.0981 | a plant hormone for regulating stress responses
0.0915 | a brick left by Yuki
0.0769 | apples are a tasty treat
-0.0184 | the weed whacker uses an engine that runs on a mixture of gas and oil
Now, if you want to load the LoRA adapters we just saved for inference, change `False` to `True`:
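A hypothetical sketch of that reload cell (the directory name `lora_model` and the `FastModel` entry point are assumptions; actually loading requires unsloth and a GPU, so the guard is left `False` here just like in the notebook):

```python
# Hypothetical sketch: reload saved LoRA adapters for inference.
load_for_inference = False  # flip to True to actually load

if load_for_inference:
    # Assumed entry point; check the current unsloth docs for your model type.
    from unsloth import FastModel

    model, tokenizer = FastModel.from_pretrained("lora_model")
```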
Saving to float16 for vLLM
We also support saving to float16 directly. Select `merged_16bit` for float16 or `merged_4bit` for int4. Saving only the `lora` adapters is also available as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account! You can get your personal tokens at https://huggingface.co/settings/tokens.
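A hypothetical sketch of those save calls (`model` and `tokenizer` come from the fine-tuning steps above; the repo name and token are placeholders, and running it requires unsloth, so the guard stays `False`):

```python
# Hypothetical sketch of the merged-save calls described above.
push_to_hub = False  # flip to True once you have a trained model and an HF token

if push_to_hub:
    model.save_pretrained_merged("model", tokenizer, save_method="merged_16bit")
    model.push_to_hub_merged(
        "your_name/model",           # placeholder HF repo name
        tokenizer,
        save_method="merged_16bit",  # or "merged_4bit" / "lora"
        token="hf_...",              # from https://huggingface.co/settings/tokens
    )
```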
GGUF / llama.cpp Conversion
We now support saving to GGUF / llama.cpp natively! We clone llama.cpp and save to `q8_0` by default. All methods such as `q4_k_m` are allowed. Use `save_pretrained_gguf` for local saving and `push_to_hub_gguf` for uploading to HF.
Some supported quant methods (full list on our Wiki page):
- `q8_0` - Fast conversion. High resource use, but generally acceptable.
- `q4_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
- `q5_k_m` - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
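A hypothetical sketch of the GGUF export calls described above (needs unsloth, which clones llama.cpp on first use, plus a trained `model`; the repo name and token are placeholders, so the guard stays `False`):

```python
# Hypothetical sketch of the GGUF export calls described above.
export_gguf = False  # flip to True once you have a trained model

if export_gguf:
    # Local save; pick any method from the quantization list above.
    model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
    # Upload to Hugging Face; q8_0 is the default method.
    model.push_to_hub_gguf(
        "your_name/model-gguf",      # placeholder HF repo name
        tokenizer,
        quantization_method="q8_0",
        token="hf_...",
    )
```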