Fine-Tune Llama
Fine-tune Open-Source LLMs on Nebius Token Factory
Learn how to fine-tune & deploy open models like Llama 3.1 directly from your dataset using Nebius Token Factory, an all-in-one platform for working with large language models (LLMs).
Before you begin, get your API key from the Dashboard.
You can run this notebook on Google Colab (no Python environment setup required!) or locally.
Step 1: Fine-Tuning Configuration
Pick the model to fine-tune
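The configuration can be kept in a small dict like the sketch below. All names and values here are illustrative assumptions (model choice, suffix, and hyperparameters are up to you), not the notebook's exact settings.

```python
# Hypothetical configuration values -- adjust to your account and dataset.
config = {
    "base_model": "meta-llama/Meta-Llama-3.1-8B-Instruct",  # model to fine-tune
    "suffix": "my-finetuned-model",   # appended to the deployed model name
    "lora": True,                     # LoRA adapter training, not full fine-tuning
    "n_epochs": 3,                    # passes over the training data
    "learning_rate": 1e-4,            # a typical starting point for LoRA
}
```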
Step 2: Installation & Setup
NOT running on Colab
Step 3: Load Configuration
✅ NEBIUS_API_KEY found
Step 4: Initialize Client
Before running, store your key in Colab Variables as NEBIUS_API_KEY or export it as an environment variable.
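A small helper makes the missing-key case fail loudly instead of producing a confusing authentication error later. This is a sketch; the base URL in the comment is an assumption about the OpenAI-compatible endpoint, so check the Dashboard for the exact value.

```python
import os

def get_nebius_key(env=os.environ):
    """Return the Nebius API key, failing loudly if it is missing."""
    key = env.get("NEBIUS_API_KEY")
    if not key:
        raise RuntimeError(
            "NEBIUS_API_KEY is not set. Create a key in the Token Factory "
            "Dashboard and export it, e.g. `export NEBIUS_API_KEY=...`."
        )
    return key

# With the key in hand, an OpenAI-compatible client is created roughly like:
#   from openai import OpenAI
#   client = OpenAI(base_url="<Nebius API base URL>", api_key=get_nebius_key())
```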
Dataset({
    features: ['messages'],
    num_rows: 1000
})

5.2 - Examine data
[
{
"role": "system",
"content": "Star is a friendly chatbot companion ready to engage in everyday role-playing\u00a0conversations."
},
{
"role": "user",
"content": "Happy birthday, Lillian. Here's a little present I bought for you."
},
{
"role": "assistant",
"content": "Thank you, Paul."
}
]
---
[
{
"role": "system",
"content": "Star is a friendly chatbot companion ready to engage in everyday role-playing\u00a0conversations."
},
{
"role": "user",
"content": "Hi Isabel! You wanted to see me?"
},
{
"role": "assistant",
"content": "Yes Anthony, come on in. Have a seat."
},
{
"role": "user",
"content": "Is everything okay? You seem a bit preoccupied."
},
{
"role": "assistant",
"content": "Well, Anthony, this is not going to be an easy conversation.The company has made some decisions which will have a direct impact on your employment."
},
{
"role": "user",
"content": "Okay..."
},
{
"role": "assistant",
"content": "I don't want to beat around the bush so I'm just gonna come out with it. We are going to have to let you go."
},
{
"role": "user",
"content": "What? Why? I mean... just like that? I'm fired?"
},
{
"role": "assistant",
"content": "I'm sorry but, to be honest, you are a terrible employee!"
},
{
"role": "user",
"content": "What! I resent that!"
},
{
"role": "assistant",
"content": "Anthony, you were caught making international calls from the office phone, you called in sick in eight times this month and you smell like alcohol!"
}
]
---
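The printouts above come from dumping each sample's `messages` list as indented JSON. A minimal sketch (the inline sample is illustrative, standing in for a record from the loaded dataset):

```python
import json

def show_sample(sample):
    """Render one chat sample as indented JSON, like the printouts above."""
    return json.dumps(sample["messages"], indent=2, ensure_ascii=False)

sample = {"messages": [
    {"role": "system", "content": "Star is a friendly chatbot companion."},
    {"role": "user", "content": "Happy birthday, Lillian."},
    {"role": "assistant", "content": "Thank you, Paul."},
]}
print(show_sample(sample))
```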
5.3 - Create train / validate datasets
DatasetDict({
train: Dataset({
features: ['messages'],
num_rows: 800
})
test: Dataset({
features: ['messages'],
num_rows: 200
})
})
Saved 800 samples to data/training_data.jsonl
Saved 200 samples to data/validation_data.jsonl
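The save step above boils down to writing one JSON object per line. A sketch using only the standard library; the `train_test_split` call in the comment assumes the Hugging Face `datasets` API with an illustrative 80/20 split and seed:

```python
import json
from pathlib import Path

def save_jsonl(samples, path):
    """Write one JSON object per line -- the format the fine-tuning API expects."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for sample in samples:
            f.write(json.dumps(sample, ensure_ascii=False) + "\n")
    return len(samples)

# The split itself is typically done with datasets' train_test_split:
#   splits = dataset.train_test_split(test_size=0.2, seed=42)
#   save_jsonl(splits["train"], "data/training_data.jsonl")
#   save_jsonl(splits["test"], "data/validation_data.jsonl")
```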
Step 6: Upload your dataset to Token Factory
Next, we’ll upload the dataset so Nebius can access it for training.
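Before uploading, it is worth sanity-checking the file locally: a single malformed line will fail the whole job. A small validator sketch; the upload call in the comment assumes the OpenAI-compatible files endpoint with `purpose="fine-tune"`:

```python
import json
from pathlib import Path

def validate_jsonl(path):
    """Check that every line is valid JSON with a 'messages' list before uploading."""
    n = 0
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        record = json.loads(line)  # raises on malformed JSON
        assert isinstance(record.get("messages"), list), "missing 'messages' list"
        n += 1
    return n

# Upload then uses the OpenAI-compatible files endpoint, roughly:
#   training_file = client.files.create(
#       file=open("data/training_data.jsonl", "rb"), purpose="fine-tune")
```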
FileObject(id='file-019b0be3-b432-7392-8f07-9d714bc23224', bytes=657543, created_at=1765431030, filename='training_data.jsonl', object='file', purpose='fine-tune', status=None, expires_at=None, status_details=None)
FileObject(id='file-019b0be3-bfc8-7a03-9796-fba7337b9301', bytes=184319, created_at=1765431033, filename='validation_data.jsonl', object='file', purpose='fine-tune', status=None, expires_at=None, status_details=None)
Step 7: Create and start your fine-tuning job
We’ll fine-tune Llama 3.2 1B Instruct using LoRA, which is efficient and much faster than full fine-tuning.
Job created: ftjob-49b4625687fe42d1a981c9f9bbdfd569 | status: running
Step 8: Monitor job progress
When you create a fine-tune job, its initial status will usually be running. The script below polls the status every 15 seconds to check for updates.
If it fails, Nebius will return an error message explaining what went wrong and how to fix it. If you get a 500 error, just resubmit the job.
The training is complete when you see either Dataset processed successfully or Training completed successfully in the event logs.
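The polling loop can be sketched as below. `get_status` stands in for fetching the job (e.g. `client.fine_tuning.jobs.retrieve(job_id).status` with an OpenAI-compatible client); the terminal states listed are assumptions:

```python
import time

def wait_for_job(get_status, poll_s=15, timeout_s=3600):
    """Poll get_status() until the job reaches a terminal state."""
    start = time.time()
    while time.time() - start < timeout_s:
        status = get_status()
        elapsed = int(time.time() - start)
        print(f"Elapsed: {elapsed}s : current status: {status}")
        if status in ("succeeded", "failed", "cancelled"):
            return status
        time.sleep(poll_s)
    raise TimeoutError("fine-tuning job did not finish in time")
```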
Fine tuning job ftjob-49b4625687fe42d1a981c9f9bbdfd569 created, waiting for completion...
Elapsed: 30s (0.5 min) : current status: running
Elapsed: 61s (1.0 min) : current status: running
Elapsed: 92s (1.5 min) : current status: running
Elapsed: 122s (2.0 min) : current status: running
Elapsed: 153s (2.6 min) : current status: running
Elapsed: 183s (3.1 min) : current status: running
Elapsed: 213s (3.6 min) : current status: running
Elapsed: 244s (4.1 min) : current status: running
Elapsed: 274s (4.6 min) : current status: running
Elapsed: 305s (5.1 min) : current status: succeeded
Final status: succeeded
CPU times: user 99.6 ms, sys: 38.1 ms, total: 138 ms
Wall time: 5min 5s
Check job events:
[1765431035] info: Job is submitted
[1765431063] info: Dataset 'validation' processed successfully
[1765431065] info: Dataset 'training' processed successfully
[1765431268] info: Training completed successfully
Step 9: Inspect the training job
Examine loss over epochs
Step 10: Download your checkpoints
After every epoch, Nebius saves a checkpoint: a snapshot of the model at that stage. You'll get all of them; for the final model, just grab the last checkpoint.
The code below creates a folder for each checkpoint and saves all the files there.
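The folder-per-checkpoint logic can be sketched as follows. Here `files` stands in for the filename-to-bytes content fetched via the fine-tuning API (the fetch itself is omitted), and the directory layout is an assumption:

```python
from pathlib import Path

def save_checkpoint_files(checkpoint_id, files, out_dir="checkpoints"):
    """Save each checkpoint's files into its own folder.

    files: mapping of filename -> bytes, as fetched from the API.
    """
    ckpt_dir = Path(out_dir) / checkpoint_id
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    for name, blob in files.items():
        (ckpt_dir / name).write_bytes(blob)
    return ckpt_dir
```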
Step 11: How much did our training cost?
The price for fine-tuning a model under 20B parameters is $0.4/1M tokens. Let's calculate the total fine-tuning price.
Check the pricing guide
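The arithmetic is just tokens times the per-million rate. A sketch; the 1.25M trained-token count below is an illustrative figure consistent with the printed price, not a value from the job:

```python
def fine_tuning_cost(trained_tokens, price_per_million=0.4):
    """Cost in dollars at $0.4 per 1M tokens (models under 20B parameters)."""
    return trained_tokens * price_per_million / 1_000_000

# e.g. ~1.25M trained tokens at $0.4/1M comes to about $0.5
cost = fine_tuning_cost(1_250_000)
print(f"Fine-tuning price: ${round(cost, 2)}")
```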
Fine-tuning price: $0.5
Step 12: Deploy Your LoRA Adapter
Now that your fine-tune is complete, you can deploy the LoRA adapter directly on Nebius Token Factory for inference.
This lets you use your fine-tuned model as a hosted endpoint, ready for API calls, experiments, or integration into your own applications.
{'name': 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:my-finetuned-model-JuNN',
 'base_model': 'meta-llama/Meta-Llama-3.1-8B-Instruct',
 'source': 'ftjob-49b4625687fe42d1a981c9f9bbdfd569:ftckpt_98a07a09-9fb7-4009-8e18-0f72beade81c',
 'description': 'Fine tuned model',
 'created_at': 1765431401,
 'status': 'validating'}

deployed model name: meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:my-finetuned-model-JuNN
Waiting for validation of LoRA model 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:my-finetuned-model-JuNN'...
Current status for 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:my-finetuned-model-JuNN': unknown
Current status for 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:my-finetuned-model-JuNN': active
{'type': 'text2text', 'name': 'meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:my-finetuned-model-JuNN', 'status': 'active', 'status_reason': None, 'checkpoint_id': 'ftckpt_98a07a09-9fb7-4009-8e18-0f72beade81c', 'job_id': 'ftjob-49b4625687fe42d1a981c9f9bbdfd569', 'file_id': None, 'url': None, 'created_at': 1765431401, 'description': 'Fine tuned model', 'vendor': 'meta', 'tags': ['128K context', 'small', 'JSON mode', 'lora'], 'use_cases': ['lora'], 'quality': 73, 'context_window_k': 128, 'size_b': 8.03}
Step 13: Test the newly deployed model
Once the model status becomes active, you can send chat completions just like any OpenAI-compatible model.
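Building the request is standard OpenAI-style chat completion. A sketch; the helper name, system prompt, and `max_tokens` value are illustrative, and `client` in the comment stands for an OpenAI-compatible client pointed at Nebius:

```python
def build_chat_request(model, user_message,
                       system_prompt="You are a helpful assistant."):
    """Assemble an OpenAI-style chat.completions payload for the deployed model."""
    return {
        "model": model,  # e.g. "meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:<suffix>"
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 200,
    }

# With an OpenAI-compatible client:
#   resp = client.chat.completions.create(**build_chat_request(deployed_name, "Hello!"))
#   print(resp.choices[0].message.content)
```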
{
"id": "chatcmpl-649d3439e2094d8384ab2e941c408aff",
"choices": [
{
"finish_reason": "stop",
"index": 0,
"logprobs": null,
"message": {
"content": "Hello",
"refusal": null,
"role": "assistant",
"annotations": null,
"audio": null,
"function_call": null,
"tool_calls": [],
"reasoning_content": null
},
"stop_reason": null
}
],
"created": 1765431413,
"model": "meta-llama/Meta-Llama-3.1-8B-Instruct-LoRa:my-finetuned-model-JuNN",
"object": "chat.completion",
"service_tier": null,
"system_fingerprint": null,
"usage": {
"completion_tokens": 2,
"prompt_tokens": 41,
"total_tokens": 43,
"completion_tokens_details": null,
"prompt_tokens_details": null
},
"prompt_logprobs": null
}
DONE!
And that’s it!
You’ve just fine-tuned, deployed, and run inference with your own LoRA model, all using Nebius Token Factory.
If you want to go further, here are a few next steps worth exploring:
- Track Fine-Tuning Jobs: Monitor progress, view logs, and check model checkpoints
- Deploy Your Custom Model: Set up inference endpoints and integrate your fine-tuned model into applications
- Fine-Tuning Docs: Learn about hyperparameters, LoRA configurations, and advanced options
- Nebius Token Factory Dashboard: Manage models, datasets, and deployments visually
Start tracking and deploying your fine-tuned models today at Nebius Token Factory.