Liquid LFM2 (1.2B) Conversational


Installation

[ ]
[ ]
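The install cells are empty in this export. As a minimal sketch, a typical Unsloth Colab setup looks like the following (the single pip package is an assumption; pinned versions are not recorded here):

```python
%%capture
# Install Unsloth; transformers, trl, peft and bitsandbytes are pulled
# in as dependencies.
!pip install unsloth
```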

Unsloth

[3]
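The code for this cell is missing from the export. A minimal sketch of the load step, assuming the standard FastLanguageModel API (the model id and max_seq_length are assumptions inferred from the title and logs):

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "LiquidAI/LFM2-1.2B",  # assumed id for Liquid LFM2 (1.2B)
    max_seq_length = 2048,              # assumed context length for this demo
    load_in_4bit = False,               # the log below shows 16bit LoRA was used
)
```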
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.7.4: Fast Lfm2 patching. Transformers: 4.54.0.dev0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: QLoRA and full finetuning all not selected. Switching to 16bit LoRA.
model.safetensors:   0%|          | 0.00/2.34G [00:00<?, ?B/s]
generation_config.json:   0%|          | 0.00/160 [00:00<?, ?B/s]
tokenizer_config.json: 0.00B [00:00, ?B/s]
tokenizer.json: 0.00B [00:00, ?B/s]
special_tokens_map.json:   0%|          | 0.00/434 [00:00<?, ?B/s]
chat_template.jinja:   0%|          | 0.00/209 [00:00<?, ?B/s]

We now add LoRA adapters so we only need to update a small number of parameters!

[4]
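A minimal sketch of the adapter cell, assuming typical Unsloth defaults (rank, alpha and target modules are assumptions, though r = 16 is consistent with the ~9.1M trainable parameters reported later):

```python
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,                      # LoRA rank
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # offloads activations to save VRAM
    random_state = 3407,
)
```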
Unsloth: Making `model.base_model.model.model` require gradients

Data Prep

We now use the LFM format for conversation-style finetunes. We use Maxime Labonne's FineTome-100k dataset in ShareGPT style. LFM renders multi-turn conversations like below:

<|startoftext|><|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
Hey there!<|im_end|>

[5]
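The cell presumably renders a two-turn conversation through the tokenizer's built-in chat template, producing the string below:

```python
messages = [
    {"role": "user",      "content": "Hello!"},
    {"role": "assistant", "content": "Hey there!"},
]
tokenizer.apply_chat_template(messages, tokenize = False)
```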
'<|startoftext|><|im_start|>user\nHello!<|im_end|>\n<|im_start|>assistant\nHey there!<|im_end|>\n'

We get the first 3000 rows of the dataset:

[6]
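A sketch of the loading cell, assuming the standard datasets API and the mlabonne/FineTome-100k repo named above:

```python
from datasets import load_dataset

dataset = load_dataset("mlabonne/FineTome-100k", split = "train[:3000]")
```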
README.md:   0%|          | 0.00/982 [00:00<?, ?B/s]
data/train-00000-of-00001.parquet:   0%|          | 0.00/117M [00:00<?, ?B/s]
Generating train split:   0%|          | 0/100000 [00:00<?, ? examples/s]

We now use standardize_data_formats to convert the dataset to the correct format for finetuning!

[7]
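A sketch of the conversion, assuming the helper lives in unsloth.chat_templates:

```python
from unsloth.chat_templates import standardize_data_formats

# Convert ShareGPT-style records ({"from": ..., "value": ...}) into the
# role/content format the chat template expects.
dataset = standardize_data_formats(dataset)
```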
Unsloth: Standardizing formats (num_proc=2):   0%|          | 0/3000 [00:00<?, ? examples/s]

Let's see what row 100 looks like!

[8]
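The inspection cell is presumably a one-liner:

```python
dataset[100]
```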
{'conversations': [{'content': 'What is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?',
   'role': 'user'},
  {'content': 'In programming, the modulus operator is represented by the \'%\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\n\n```python\n# Calculate the modulus\nModulus = a % b\n\nprint("Modulus of the given numbers is: ", Modulus)\n```\n\nIn this code snippet, the variables \'a\' and \'b\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \'%\', we calculate the remainder when \'a\' is divided by \'b\'. The result is then stored in the variable \'Modulus\'. Finally, the modulus value is printed using the \'print\' statement.\n\nFor example, if \'a\' is 10 and \'b\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\n\n```\nModulus of the given numbers is: 2\n```\n\nThis means that the modulus of 10 and 4 is 2.',
   'role': 'assistant'}],
 'source': 'infini-instruct-top-500k',
 'score': 4.774171352386475}

We now apply the LFM chat template to the conversations and save the result to a "text" column. We also strip the BOS token, otherwise we'd get double BOS tokens!

[9]
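A sketch of the formatting step; the helper name is an assumption, but stripping the leading <|startoftext|> matches the BOS note above and the rendered text below:

```python
def formatting_prompts_func(examples):
    convos = examples["conversations"]
    # Render each conversation, then drop the template's BOS so the
    # tokenizer's own BOS is not duplicated during tokenization.
    texts = [
        tokenizer.apply_chat_template(convo, tokenize = False)
                 .removeprefix("<|startoftext|>")
        for convo in convos
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched = True)
```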
Map:   0%|          | 0/3000 [00:00<?, ? examples/s]
[10]
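Followed by printing the rendered "text" field of one row:

```python
dataset[100]["text"]  # index is an assumption; any row will do
```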
'<|im_start|>user\nExplain what boolean operators are, what they do, and provide examples of how they can be used in programming. Additionally, describe the concept of operator precedence and provide examples of how it affects the evaluation of boolean expressions. Discuss the difference between short-circuit evaluation and normal evaluation in boolean expressions and demonstrate their usage in code. \n\nFurthermore, add the requirement that the code must be written in a language that does not support short-circuit evaluation natively, forcing the test taker to implement their own logic for short-circuit evaluation.\n\nFinally, delve into the concept of truthiness and falsiness in programming languages, explaining how it affects the evaluation of boolean expressions. Add the constraint that the test taker must write code that handles cases where truthiness and falsiness are implemented differently across different programming languages.<|im_end|>\n<|im_start|>assistant\nBoolean operators are logical operators used in programming to manipulate boolean values. They operate on one or more boolean operands and return a boolean result. The three main boolean operators are "AND" (&&), "OR" (||), and "NOT" (!).\n\nThe "AND" operator returns true if both of its operands are true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) and (y < 20)  # This expression evaluates to True\n```\n\nThe "OR" operator returns true if at least one of its operands is true, and false otherwise. For example:\n\n```python\nx = 5\ny = 10\nresult = (x > 0) or (y < 20)  # This expression evaluates to True\n```\n\nThe "NOT" operator negates the boolean value of its operand. It returns true if the operand is false, and false if the operand is true. For example:\n\n```python\nx = 5\nresult = not (x > 10)  # This expression evaluates to True\n```\n\nOperator precedence refers to the order in which operators are evaluated in an expression. It ensures that expressions are evaluated correctly. In most programming languages, logical AND has higher precedence than logical OR. For example:\n\n```python\nresult = True or False and False  # This expression is evaluated as (True or (False and False)), which is True\n```\n\nShort-circuit evaluation is a behavior where the second operand of a logical operator is not evaluated if the result can be determined based on the value of the first operand. In short-circuit evaluation, if the first operand of an "AND" operator is false, the second operand is not evaluated because the result will always be false. Similarly, if the first operand of an "OR" operator is true, the second operand is not evaluated because the result will always be true.\n\nIn programming languages that support short-circuit evaluation natively, you can use it to improve performance or avoid errors. For example:\n\n```python\nif x != 0 and (y / x) > 10:\n    # Perform some operation\n```\n\nIn languages without native short-circuit evaluation, you can implement your own logic to achieve the same behavior. Here\'s an example in pseudocode:\n\n```\nif x != 0 {\n    if (y / x) > 10 {\n        // Perform some operation\n    }\n}\n```\n\nTruthiness and falsiness refer to how non-boolean values are evaluated in boolean contexts. In many programming languages, non-zero numbers and non-empty strings are considered truthy, while zero, empty strings, and null/None values are considered falsy.\n\nWhen evaluating boolean expressions, truthiness and falsiness come into play. 
For example:\n\n```python\nx = 5\nresult = x  # The value of x is truthy, so result is also truthy\n```\n\nTo handle cases where truthiness and falsiness are implemented differently across programming languages, you can explicitly check the desired condition. For example:\n\n```python\nx = 5\nresult = bool(x)  # Explicitly converting x to a boolean value\n```\n\nThis ensures that the result is always a boolean value, regardless of the language\'s truthiness and falsiness rules.<|im_end|>\n'

Train the model

Now let's train our model. We do 60 steps to speed things up, but you can set num_train_epochs=1 and max_steps=None for a full run.

[11]
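A sketch of the trainer setup: the batch size, gradient accumulation and 60 steps mirror the run banner printed below; the remaining hyperparameters are common Unsloth defaults, not confirmed by this export:

```python
from trl import SFTTrainer, SFTConfig

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    args = SFTConfig(
        dataset_text_field = "text",
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 4,   # effective batch size = 8
        warmup_steps = 5,
        max_steps = 60,                    # set num_train_epochs = 1 and
                                           # max_steps = None for a full run
        learning_rate = 2e-4,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        report_to = "none",
    ),
)
```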
Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/3000 [00:00<?, ? examples/s]

We also use Unsloth's train_on_responses_only method to train only on the assistant outputs and ignore the loss on the user's inputs. This helps increase the accuracy of finetunes!

[12]
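A sketch of the masking step; the instruction/response markers follow the LFM chat format shown earlier:

```python
from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part = "<|im_start|>user\n",
    response_part    = "<|im_start|>assistant\n",
)
```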
Map (num_proc=2):   0%|          | 0/3000 [00:00<?, ? examples/s]

Let's verify that the instruction part is masked correctly! Let's print the 100th row again.

[13]
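Presumably by decoding the tokenized row:

```python
tokenizer.decode(trainer.train_dataset[100]["input_ids"])
```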
'<|startoftext|><|im_start|>user\nWhat is the modulus operator in programming and how can I use it to calculate the modulus of two given numbers?<|im_end|>\n<|im_start|>assistant\nIn programming, the modulus operator is represented by the \'%\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\n\n```python\n# Calculate the modulus\nModulus = a % b\n\nprint("Modulus of the given numbers is: ", Modulus)\n```\n\nIn this code snippet, the variables \'a\' and \'b\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \'%\', we calculate the remainder when \'a\' is divided by \'b\'. The result is then stored in the variable \'Modulus\'. Finally, the modulus value is printed using the \'print\' statement.\n\nFor example, if \'a\' is 10 and \'b\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\n\n```\nModulus of the given numbers is: 2\n```\n\nThis means that the modulus of 10 and 4 is 2.<|im_end|>\n'

Now let's print the masked out example - you should see only the answer is present:

[14]
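A sketch of the check: positions masked with -100 are swapped for spaces before decoding, which is why the instruction renders as a run of blanks below:

```python
space = tokenizer(" ", add_special_tokens = False).input_ids[0]
tokenizer.decode([space if x == -100 else x
                  for x in trainer.train_dataset[100]["labels"]])
```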
'                                 In programming, the modulus operator is represented by the \'%\' symbol. It calculates the remainder when one number is divided by another. To calculate the modulus of two given numbers, you can use the modulus operator in the following way:\n\n```python\n# Calculate the modulus\nModulus = a % b\n\nprint("Modulus of the given numbers is: ", Modulus)\n```\n\nIn this code snippet, the variables \'a\' and \'b\' represent the two given numbers for which you want to calculate the modulus. By using the modulus operator \'%\', we calculate the remainder when \'a\' is divided by \'b\'. The result is then stored in the variable \'Modulus\'. Finally, the modulus value is printed using the \'print\' statement.\n\nFor example, if \'a\' is 10 and \'b\' is 4, the modulus calculation would be 10 % 4, which equals 2. Therefore, the output of the above code would be:\n\n```\nModulus of the given numbers is: 2\n```\n\nThis means that the modulus of 10 and 4 is 2.<|im_end|>\n'
[15]
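A sketch of the memory snapshot that produces the printout below:

```python
import torch

gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024**3, 3)
max_memory = round(gpu_stats.total_memory / 1024**3, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")
```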
GPU = Tesla T4. Max memory = 14.741 GB.
2.268 GB of memory reserved.
[16]
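The training call itself is presumably just:

```python
trainer_stats = trainer.train()
```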
==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 3,000 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 9,142,272 of 1,179,482,880 (0.78% trained)
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
Unsloth: Will smartly offload gradients to save VRAM!
[17]
220.7693 seconds used for training.
3.68 minutes used for training.
Peak reserved memory = 2.85 GB.
Peak reserved memory for training = 0.582 GB.
Peak reserved memory % of max memory = 19.334 %.
Peak reserved memory for training % of max memory = 3.948 %.

Inference

Let's run the model! You can change the user message below to anything you like.

[18]
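A minimal sketch of the generation cell, assuming the standard Transformers streaming API (the question is taken from the output below):

```python
from transformers import TextStreamer

messages = [{"role": "user", "content": "Why is the sky blue?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt = True,  # append the assistant header
    return_tensors = "pt",
).to(model.device)

_ = model.generate(
    inputs,
    streamer = TextStreamer(tokenizer),  # print tokens as they arrive
    max_new_tokens = 128,
)
```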
The sky<|startoftext|><|im_start|>user
Why is the sky blue?<|im_end|>
<|im_start|>assistant
The sky appears blue because of a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it collides with gas molecules and small particles in the air. Blue light has a shorter wavelength than other colors, so it scatters more easily in all directions. This scattered blue light is what we see when we look up at the sky.
<|im_end|>
[19]
In the jungle, where the trees sway and sway,
Lives a sloth, slow and lazy.
With a heart that beats like a clock,
He moves so slowly, it's hard to stop.

His fur is as soft as a cloud,
And his eyes are like two bright blue moons.
He spends most of his time,
Resting in the sun, or eating leaves.

But don't you worry, dear friend,
For he's not lazy, just slow and keen.
He's a master of his own pace,
And will never rush, no matter the race.

So come and see him, if you dare

Saving, loading finetuned models

To save the final model as LoRA adapters, either use Hugging Face's push_to_hub for an online save or save_pretrained for a local save.

[NOTE] This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

[20]
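A sketch of the save cell (the lora_model path matches the file list below; the Hub repo name is a placeholder):

```python
model.save_pretrained("lora_model")        # local save of the adapters
tokenizer.save_pretrained("lora_model")
# model.push_to_hub("your_name/lora_model", token = "...")      # online save
# tokenizer.push_to_hub("your_name/lora_model", token = "...")
```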
('lora_model/tokenizer_config.json',
 'lora_model/special_tokens_map.json',
 'lora_model/chat_template.jinja',
 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set False to True in the cell below:

[21]
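A sketch of the reload cell, guarded by the False flag mentioned above:

```python
if False:  # flip to True to reload the adapters for inference
    from unsloth import FastLanguageModel
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "lora_model",   # the adapters saved above
        max_seq_length = 2048,       # assumed, as in the load cell
    )
```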
To code up a transformer, you can follow these steps:

1. Import the necessary libraries:
```python
import torch
from transformers import TFAutoModelForSequenceClassification, TFAutoModelForTokenClassification
```

2. Load the pre-trained model and tokenizer:
```python
model_name = "bert-base-uncased"
model = TFAutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = TFAutoModelForTokenClassification.from_pretrained(model_name)
```

3. Define the input and output classes:
```python
input_class

You can also use Hugging Face's AutoPeftModelForCausalLM. Only use this if you do not have unsloth installed. It can be hopelessly slow, since 4bit model downloading is not supported, and Unsloth's inference is 2x faster.

[22]
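A sketch of the plain PEFT loading path (no Unsloth speedups), assuming the adapters saved above:

```python
if False:
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer
    model = AutoPeftModelForCausalLM.from_pretrained("lora_model")
    tokenizer = AutoTokenizer.from_pretrained("lora_model")
```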

Saving to float16 for vLLM

We also support saving to float16 directly. Select merged_16bit for float16 or merged_4bit for int4. We also allow lora adapters as a fallback. Use push_to_hub_merged to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens.

[23]
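A sketch of the merged save, assuming Unsloth's save_pretrained_merged / push_to_hub_merged helpers (the output path and repo name are placeholders):

```python
if False:  # merge LoRA into the base weights and save as float16
    model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit")
    # Or upload directly to the Hub:
    # model.push_to_hub_merged("your_name/model", tokenizer,
    #                          save_method = "merged_16bit", token = "...")
```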

GGUF / llama.cpp Conversion

To save to GGUF / llama.cpp, we support it natively now! We clone llama.cpp and default to saving as q8_0. All methods like q4_k_m are allowed. Use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF.

Some supported quant methods (full list on our Wiki page):

  • q8_0 - Fast conversion. High resource use, but generally acceptable.
  • q4_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
  • q5_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[NEW] To finetune and auto export to Ollama, try our Ollama notebook

[24]
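A sketch of the GGUF export, assuming Unsloth's save_pretrained_gguf / push_to_hub_gguf helpers (paths and repo names are placeholders):

```python
if False:  # export a q4_k_m GGUF for llama.cpp
    model.save_pretrained_gguf("model", tokenizer, quantization_method = "q4_k_m")
    # Or upload directly to the Hub:
    # model.push_to_hub_gguf("your_name/model", tokenizer,
    #                        quantization_method = "q4_k_m", token = "...")
```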

Now, use the model-unsloth.gguf file or model-unsloth-Q4_K_M.gguf file in llama.cpp.

And we're done! If you have any questions about Unsloth, find any bugs, want to keep up with the latest LLM news, or need help joining projects, feel free to join our Discord channel!

Some other resources:

  1. Train your own reasoning model - Llama GRPO notebook Free Colab
  2. Saving finetunes to Ollama. Free notebook
  3. Llama 3.2 Vision finetuning - Radiography use case. Free Colab
  4. See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!

Join Discord if you need help + ⭐️ Star us on Github ⭐️