Bert Classification

Installation

[ ]

Unsloth

FastModel supports loading nearly any model now! This includes Vision and Text models!
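A minimal loading cell might look like the following sketch. The model name and full-finetuning mode match the run log below; the exact keyword arguments (in particular `auto_model`, `max_seq_length`, and `num_labels`) are assumptions, so verify them against the current Unsloth API:

```python
from transformers import AutoModelForSequenceClassification
from unsloth import FastModel

# Sketch: load ModernBERT-large with a 6-way classification head.
model, tokenizer = FastModel.from_pretrained(
    model_name = "answerdotai/ModernBERT-large",
    auto_model = AutoModelForSequenceClassification,  # assumption
    max_seq_length = 512,    # assumption: pick to fit your texts
    num_labels = 6,          # the Emotion dataset has 6 classes
    full_finetuning = True,  # matches the log output below
)
```

The warning about newly initialized `classifier.weight`/`classifier.bias` in the output below is expected: the classification head is fresh and will be trained on the downstream task.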

[1]
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
env: UNSLOTH_DISABLE_FAST_GENERATION=1
==((====))==  Unsloth 2025.8.7: Fast Modernbert patching. Transformers: 4.55.1.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Float16 full finetuning uses more memory since we upcast weights to float32.
Some weights of ModernBertForSequenceClassification were not initialized from the model checkpoint at answerdotai/ModernBERT-large and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.

We now add LoRA adapters so we only need to update a small number of parameters!
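Attaching adapters would be sketched as below; note that because this notebook runs with full finetuning enabled, the call is a no-op, as the output confirms. The rank, alpha, and target module names are illustrative assumptions, not verified values for ModernBERT:

```python
# Sketch: attach LoRA adapters (a no-op when full finetuning is on).
model = FastModel.get_peft_model(
    model,
    r = 16,             # assumption: LoRA rank
    lora_alpha = 16,    # assumption: scaling factor
    lora_dropout = 0,
    target_modules = ["Wqkv", "Wo"],  # assumption: ModernBERT linears
)
```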

[2]
Unsloth: Full finetuning is enabled, so .get_peft_model has no effect

Data Prep

We now use the Emotion dataset from dair-ai, which contains text labeled by emotion. In this example, we load the unsplit version and use only the first 30,000 samples.

We then split the dataset into training (80%) and validation (20%), and apply tokenization to prepare the text for training.

[3]
Map:   0%|          | 0/24000 [00:00<?, ? examples/s]
Map:   0%|          | 0/6000 [00:00<?, ? examples/s]

We compute class weights using scikit-learn’s compute_class_weight.
This is useful when training on datasets where certain classes are underrepresented, ensuring the model does not become biased towards majority labels.
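The weight computation can be reproduced as follows; the label array here is illustrative, while in the notebook it comes from the training split's `label` column:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Illustrative imbalanced labels (6 classes, skewed counts).
labels = np.array([0]*8 + [1]*9 + [2]*2 + [3]*4 + [4]*3 + [5]*1)

class_weights = compute_class_weight(
    class_weight = "balanced",
    classes = np.unique(labels),
    y = labels,
)
# "balanced" weights are n_samples / (n_classes * class_count),
# so rarer classes receive proportionally larger weights.
```

These weights can then be folded into the loss (e.g. a weighted cross-entropy) so the majority classes do not dominate training, which is exactly what the array in the output below shows for the Emotion labels.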

[4]
[5]
[6]
array([0.56116723, 0.49943813, 1.998002  , 1.23839009, 1.45932142,
       4.49438202])

We define a compute_metrics function to evaluate the model during training.
Here we use accuracy from scikit-learn, which compares predicted labels with the ground truth.

[NOTE] Accuracy is a good baseline, but for imbalanced datasets you may also want to track metrics like F1-score, precision, or recall.
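A minimal version of such a function, assuming the usual `(logits, labels)` layout the Trainer passes to `compute_metrics`:

```python
import numpy as np
from sklearn.metrics import accuracy_score

def compute_metrics(eval_pred):
    # The Trainer passes (logits, labels); argmax over the class
    # dimension turns logits into predicted label ids.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis = -1)
    return {"accuracy": accuracy_score(labels, predictions)}
```

To track the extra metrics mentioned in the note, `sklearn.metrics.f1_score`, `precision_score`, and `recall_score` (with `average="macro"` or `"weighted"`) can be added to the returned dict in the same way.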

[7]

Train the model

Now let's use the Hugging Face Trainer! More docs here: Transformers docs. We train for one full epoch (num_train_epochs=1) to get a meaningful result.
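A sketch of the setup, treating it as a configuration fragment: the batch size, gradient accumulation, and epoch count mirror the run log below, while the learning rate and logging/eval settings are assumptions:

```python
from transformers import Trainer, TrainingArguments

args = TrainingArguments(
    output_dir = "outputs",
    per_device_train_batch_size = 32,  # matches the log below
    gradient_accumulation_steps = 1,
    num_train_epochs = 1,
    learning_rate = 5e-5,              # assumption
    eval_strategy = "epoch",           # evaluate on the 20% split
    logging_steps = 10,
)

trainer = Trainer(
    model = model,                    # from the loading cell above
    args = args,
    train_dataset = train_dataset,    # tokenized 80% split
    eval_dataset = val_dataset,       # tokenized 20% split
    compute_metrics = compute_metrics,
)
```

To apply the class weights computed earlier, one common pattern is subclassing `Trainer` and overriding `compute_loss` with a weighted `torch.nn.CrossEntropyLoss`.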

[8]

Let's train the model! To resume a training run, call trainer.train(resume_from_checkpoint = True).

[9]
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 24,000 | Num Epochs = 1 | Total steps = 750
O^O/ \_/ \    Batch size per device = 32 | Gradient accumulation steps = 1
\        /    Data Parallel GPUs = 1 | Total batch size (32 x 1 x 1) = 32
 "-____-"     Trainable parameters = 395,837,446 of 395,837,446 (100.00% trained)
Unsloth: Will smartly offload gradients to save VRAM!

Inference

Let's run the model!
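One way to do this, assuming the standard Transformers `pipeline` API (the example input is illustrative):

```python
from transformers import pipeline

# Wrap the finetuned model in a text-classification pipeline; the
# model config's id2label mapping supplies the emotion label names.
classifier = pipeline(
    "text-classification",
    model = model,
    tokenizer = tokenizer,
)

classifier("I can't stop smiling today!")
```

The pipeline returns a list of `{'label': ..., 'score': ...}` dicts, as in the output below.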

[45]
Device set to use cuda:0
[{'label': 'joy', 'score': 0.7757943272590637}]

Saving finetuned models

To save the final model, either use Hugging Face's push_to_hub for an online save or save_pretrained for a local save.
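Both options in sketch form; the Hub repo id is a placeholder, and pushing assumes you are already authenticated (e.g. via `huggingface-cli login`):

```python
# Local save of the model and tokenizer; the tokenizer call
# produces the file tuple shown in the output below.
model.save_pretrained("model")
tokenizer.save_pretrained("model")

# Or push to the Hugging Face Hub (replace the repo id):
# model.push_to_hub("your-username/ModernBERT-emotion")
# tokenizer.push_to_hub("your-username/ModernBERT-emotion")
```

Saving the tokenizer alongside the model matters: a reloaded classifier needs both to reproduce the inference pipeline above.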

[15]
('model/tokenizer_config.json',
 'model/special_tokens_map.json',
 'model/tokenizer.json')

And we're done! If you have any questions about Unsloth, find a bug, want to keep up with the latest LLM news, need help, or would like to join community projects, feel free to join our Discord channel!

Some other resources:

  1. Train your own reasoning model - Llama GRPO notebook Free Colab
  2. Saving finetunes to Ollama. Free notebook
  3. Llama 3.2 Vision finetuning - Radiography use case. Free Colab
  4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!

Join Discord if you need help + ⭐️ Star us on Github ⭐️