All MiniLM L6 V2
Installation
Unsloth
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
Unsloth: Using fast encoder path for bert (torch.compile + SDPA)
`torch_dtype` is deprecated! Use `dtype` instead!
We now add LoRA adapters so we only need to update a small number of parameters!
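To see why LoRA trains so few parameters, here is a minimal pure-Python sketch of the idea, not Unsloth's implementation. It assumes a single square projection with hidden size 384 (the hidden size of all-MiniLM-L6-v2) and a rank of 16 chosen purely for illustration. The helper names `lora_param_counts` and `lora_forward` are hypothetical.

```python
def lora_param_counts(d_in, d_out, r):
    """Return (frozen weight count, trainable LoRA count) for one linear layer."""
    full = d_in * d_out          # frozen base weight W, shape (d_out, d_in)
    lora = r * d_in + d_out * r  # trainable A (r, d_in) plus B (d_out, r)
    return full, lora


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def lora_forward(x, W, A, B, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


# Illustrative sizes only: hidden size 384, assumed rank 16.
full, lora = lora_param_counts(384, 384, 16)
print(f"frozen: {full}, trainable: {lora}, ratio: {lora / full:.2%}")

# Tiny example: with B initialised to zeros the adapter leaves the output unchanged.
W = [[1.0, 2.0], [3.0, 4.0]]
A = [[1.0, 1.0]]          # rank 1
B_zero = [[0.0], [0.0]]
print(lora_forward([1.0, 1.0], W, A, B_zero, alpha=2, r=1))
```

Because B typically starts at zero, training begins from the base model's behaviour and only the small A and B matrices receive gradient updates.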
Unsloth: torch.compile will be applied automatically if max_steps > 2616
Generating train split: 314315 examples
Generating dev split: 6808 examples
Generating test split: 6831 examples
Let's take a look at the dataset structure:
Dataset examples:
{'anchor': 'A person on a horse jumps over a broken down airplane.', 'positive': 'A person is outdoors, on a horse.'}
{'anchor': 'Children smiling and waving at camera', 'positive': 'There are children present'}
{'anchor': 'A boy is jumping on skateboard in the middle of a red bridge.', 'positive': 'The boy does a skateboarding trick.'}
{'anchor': 'Two blond women are hugging one another.', 'positive': 'There are women showing affection.'}
{'anchor': 'A few people in a restaurant setting, one of them is drinking orange juice.', 'positive': 'The diners are at a restaurant.'}
{'anchor': 'An older man is drinking orange juice at a restaurant.', 'positive': 'A man is drinking juice.'}
Baseline Inference
Let's test the base model before fine-tuning to see how it performs on our specific domain.
--- Pre-Training Results for query: 'A soccer player in a blue jersey is running across the field.' ---
0.4980 | The player is about to score a goal for his team.
0.4700 | An athlete is practicing a sport outdoors.
0.3125 | Jersey is a knit fabric used predominantly for clothing manufacture.
0.0739 | A person is sitting quietly on the grass.
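Scores like the ones above are typically cosine similarities between the query embedding and each candidate embedding. The sketch below assumes that is the metric used here and ranks candidates with toy 2-dimensional vectors. The vectors and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical stand-ins for real model embeddings.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return (score, text) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(query_vec, vec), text)
              for text, vec in candidates.items()]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy embeddings (assumed values, not produced by the model).
query = [1.0, 0.0]
candidates = {
    "same direction": [2.0, 0.0],
    "partly related": [1.0, 1.0],
    "orthogonal": [0.0, 3.0],
}

for score, text in rank_by_similarity(query, candidates):
    print(f"{score:.4f} | {text}")
```

With a real model, the vectors would come from encoding the query and the candidate sentences, and the printed table would take the same `score | sentence` shape as the results above.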
GPU = Tesla T4. Max memory = 14.741 GB. 0.078 GB of memory reserved.
Let's train the model! To resume a training run, call trainer.train(resume_from_checkpoint = True)
230.1043 seconds used for training.
3.84 minutes used for training.
Peak reserved memory = 3.287 GB.
Peak reserved memory for training = 3.209 GB.
Peak reserved memory % of max memory = 22.298 %.
Peak reserved memory for training % of max memory = 21.769 %.
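The derived figures in this report follow from three measured values: the memory reserved before training (0.078 GB), the peak reserved memory (3.287 GB), and the GPU's maximum memory (14.741 GB). The short sketch below redoes that arithmetic from the numbers printed above. The variable names are illustrative and are not taken from the notebook's code.

```python
# Values copied from the outputs above.
max_memory_gb = 14.741      # Tesla T4 max memory
start_reserved_gb = 0.078   # reserved before training started
peak_reserved_gb = 3.287    # peak reserved after training
train_seconds = 230.1043

# Memory attributable to training is the peak minus what was already reserved.
training_reserved_gb = round(peak_reserved_gb - start_reserved_gb, 3)

# Percentages are relative to the GPU's maximum memory.
peak_pct = round(peak_reserved_gb / max_memory_gb * 100, 3)
training_pct = round(training_reserved_gb / max_memory_gb * 100, 3)

train_minutes = round(train_seconds / 60, 2)

print(f"{train_minutes} minutes used for training.")
print(f"Peak reserved memory for training = {training_reserved_gb} GB.")
print(f"Peak reserved memory % of max memory = {peak_pct} %.")
print(f"Peak reserved memory for training % of max memory = {training_pct} %.")
```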
--- Post-Training Results for query: 'A soccer player in a blue jersey is running across the field.' ---
0.4459 | An athlete is practicing a sport outdoors.
0.4322 | The player is about to score a goal for his team.
0.3113 | Jersey is a knit fabric used predominantly for clothing manufacture.
0.0944 | A person is sitting quietly on the grass.
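One way to read the two result tables side by side is to compare each candidate's rank before and after fine-tuning. The sketch below does this with the scores printed in this notebook. The helper name `ranking` is hypothetical.

```python
# Scores copied from the pre- and post-training outputs in this notebook.
pre = {
    "The player is about to score a goal for his team.": 0.4980,
    "An athlete is practicing a sport outdoors.": 0.4700,
    "Jersey is a knit fabric used predominantly for clothing manufacture.": 0.3125,
    "A person is sitting quietly on the grass.": 0.0739,
}
post = {
    "An athlete is practicing a sport outdoors.": 0.4459,
    "The player is about to score a goal for his team.": 0.4322,
    "Jersey is a knit fabric used predominantly for clothing manufacture.": 0.3113,
    "A person is sitting quietly on the grass.": 0.0944,
}


def ranking(scores):
    """Return candidate texts ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


pre_rank = {text: i + 1 for i, text in enumerate(ranking(pre))}
post_rank = {text: i + 1 for i, text in enumerate(ranking(post))}

for text in ranking(post):
    delta = post[text] - pre[text]
    print(f"rank {pre_rank[text]} -> {post_rank[text]} | score change {delta:+.4f} | {text}")
```

The top two candidates swap places: the sentence that the query directly entails ("An athlete is practicing a sport outdoors.") moves to rank 1 after training.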
To load the LoRA adapters we just saved for inference, change False to True:
Saving to float16 for vLLM
We also support saving to float16 directly. Select merged_16bit for float16 or merged_4bit for int4. LoRA adapters can also be saved on their own as a fallback. Use push_to_hub_merged to upload to your Hugging Face account! You can get your personal tokens at https://huggingface.co/settings/tokens.
GGUF / llama.cpp Conversion
Saving to GGUF / llama.cpp is now supported natively! We clone llama.cpp and save to q8_0 by default. Other quantization methods, such as q4_k_m, are also allowed. Use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF.
Some supported quant methods (full list on our Wiki page):
q8_0 - Fast conversion. High resource use, but generally acceptable.
q4_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
q5_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.
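To give a feel for what the default q8_0 method does, here is a simplified pure-Python sketch of 8-bit block quantization in that style: weights are split into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. This is an illustration under stated assumptions, not llama.cpp's code. The real format stores the scale as float16, while this sketch keeps a Python float, and the function names are hypothetical.

```python
BLOCK_SIZE = 32  # q8_0 groups weights into blocks of 32


def quantize_block(values):
    """Quantize one block to (scale, list of ints in [-127, 127])."""
    amax = max(abs(v) for v in values)
    if amax == 0:
        return 0.0, [0] * len(values)
    scale = amax / 127  # largest magnitude maps to +/-127
    return scale, [round(v / scale) for v in values]


def dequantize_block(scale, quants):
    """Recover approximate float values from a quantized block."""
    return [scale * q for q in quants]


# Storage cost per weight: 32 int8 values plus one 16-bit scale per block.
bits_per_weight = (BLOCK_SIZE * 8 + 16) / BLOCK_SIZE

# Example block of 32 made-up weights.
weights = [i / 10 for i in range(-16, 16)]
scale, quants = quantize_block(weights)
restored = dequantize_block(scale, quants)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print(f"bits per weight: {bits_per_weight}")
print(f"scale: {scale:.6f}, max round-trip error: {max_error:.6f}")
```

The round-trip error per weight is bounded by half the block's scale, which is why q8_0 stays close to the original weights at the cost of a larger file than the 4-bit and 5-bit methods.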