Unsloth Kaggle CodeForces Cot Finetune For Reasoning On CodeForces

Kaggle CodeForces Cot Finetune For Reasoning On CodeForces

unsloth-notebooksunslothnb

alph-notebooks/unsloth-notebooks / Kaggle-CodeForces-cot-Finetune_for_Reasoning_on_CodeForces.ipynb

Export

Run Notebooks

Contents

No cells yet

Add cells to see them here

To run this, press "Runtime" and press "Run all" on a free Tesla T4 Google Colab instance!

Join Discord if you need help + ⭐ Star us on Github ⭐

To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.

You will learn how to do data prep, how to train, how to run the model, & how to save it

News

Introducing Unsloth Studio — a new open source, no-code web UI to train and run LLMs. Blog

_{Train models — no code needed}

_{Run GGUF models on Mac, Windows & Linux}

Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. Blog

Ultra Long-Context Reinforcement Learning is here with 7x more context windows! Blog

New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL

Visit our docs for all our model uploads and notebooks.

Installation

[ ]

Unsloth

[ ]

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.15: Fast Qwen2 patching. Transformers: 4.48.3.
   \\   /|    NVIDIA A100-SXM4-40GB. Num GPUs = 1. Max memory: 39.557 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 8.0. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!

model.safetensors:   0%|          | 0.00/5.55G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/265 [00:00<?, ?B/s]

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

added_tokens.json:   0%|          | 0.00/632 [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/613 [00:00<?, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

We now add LoRA adapters so we only need to update 1 to 10% of all parameters!

[ ]

Unsloth 2025.3.15 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.

Dataset

We are using the open-r1/codeforces-cots by the Hugging Face Open R1 project. This is a dataset of problems for a competitive coding challenge.

The Open R1 team have found that models trained to reason on competitive coding tasks like this outperform other models on coding benchmarks.

[6]

README.md: 0.00B [00:00, ?B/s]

train-00000-of-00014.parquet:   0%|          | 0.00/171M [00:00<?, ?B/s]

train-00001-of-00014.parquet:   0%|          | 0.00/140M [00:00<?, ?B/s]

train-00002-of-00014.parquet:   0%|          | 0.00/135M [00:00<?, ?B/s]

train-00003-of-00014.parquet:   0%|          | 0.00/128M [00:00<?, ?B/s]

train-00004-of-00014.parquet:   0%|          | 0.00/129M [00:00<?, ?B/s]

train-00005-of-00014.parquet:   0%|          | 0.00/148M [00:00<?, ?B/s]

train-00006-of-00014.parquet:   0%|          | 0.00/135M [00:00<?, ?B/s]

train-00007-of-00014.parquet:   0%|          | 0.00/164M [00:00<?, ?B/s]

train-00008-of-00014.parquet:   0%|          | 0.00/150M [00:00<?, ?B/s]

train-00009-of-00014.parquet:   0%|          | 0.00/151M [00:00<?, ?B/s]

train-00010-of-00014.parquet:   0%|          | 0.00/120M [00:00<?, ?B/s]

train-00011-of-00014.parquet:   0%|          | 0.00/153M [00:00<?, ?B/s]

train-00012-of-00014.parquet:   0%|          | 0.00/127M [00:00<?, ?B/s]

train-00013-of-00014.parquet:   0%|          | 0.00/171M [00:00<?, ?B/s]

Generating train split:   0%|          | 0/40665 [00:00<?, ? examples/s]

[7]

"<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n<|im_start|>user\nYou will be given a competitive programming problem. Please reason step by step about the solution, then provide a complete implementation in C++17.\n\nYour solution must read input from standard input (cin), write output to standard output (cout).\nDo not include any debug prints or additional output.\n\nPut your final solution within a single code block:\n```cpp\n<your code here>\n```\n\n# Problem\n\nYou are given an array $$$a$$$ of $$$n$$$ integers, where $$$n$$$ is odd.\n\nIn one operation, you will remove two adjacent elements from the array $$$a$$$, and then concatenate the remaining parts of the array. For example, given the array $$$[4,7,4,2,9]$$$, we can obtain the arrays $$$[4,2,9]$$$ and $$$[4,7,9]$$$ by the operations $$$[\\underline{4,7}, 4,2,9] \\to [4,2,9]$$$ and $$$[4,7,\\underline{4,2},9] \\to [4,7,9]$$$ respectively. However, we cannot obtain the array $$$[7,2,9]$$$ as it requires deleting non-adjacent elements $$$[\\underline{4},7,\\underline{4},2,9]$$$.\n\nYou will repeatedly perform this operation until exactly one element remains in $$$a$$$.\n\nFind the maximum possible value of the remaining element in $$$a$$$.\n\nExecution time limit: 1.0 seconds\nMemory limit: 256.0 MB\n\n## Input Format\nEach test contains multiple test cases. The first line contains a single integer $$$t$$$ ($$$1 \\le t \\le 1000$$$)\xa0— the number of test cases. The description of test cases follows.\n\nThe first line of each test case contains a single integer $$$n$$$ ($$$1 \\le n \\le 99$$$; $$$n$$$ is odd)\xa0— the length of the array $$$a$$$.\n\nThe second line of each test case contains $$$n$$$ integers $$$a_1, a_2, \\ldots, a_n$$$ ($$$1 \\le a_i \\le 100$$$)\xa0— the elements of the array $$$a$$$.\n\nNote that there is no bound on the sum of $$$n$$$ over all test cases.\n\n## Output Format\nFor each test case, output a single integer\xa0— the maximum possible value of the remaining element in $$$a$$$.\n\n## Examples\n```input\n4\n1\n6\n3\n1 3 2\n5\n4 7 4 2 9\n7\n3 1 4 1 5 9 2\n```\n```output\n6\n2\n9\n5\n```\n\n## Note\nIn the first test case, the array $$$a$$$ is $$$[6]$$$. Since there is only one element, no operations are needed. The maximum possible value of the remaining element is $$$6$$$.\n\nIn the second test case, the array $$$a$$$ is $$$[1, 3, 2]$$$. We can remove the first two elements $$$[\\underline{1, 3}, 2] \\to [2]$$$, or remove the last two elements $$$[1, \\underline{3, 2}] \\to [1]$$$. Therefore, the maximum possible value of the remaining element is $$$2$$$.\n\nIn the third test case, the array $$$a$$$ is $$$[4, 7, 4, 2, 9]$$$. One way to maximize the remaining element is $$$[4, \\underline{7, 4}, 2, 9] \\to [\\underline{4, 2}, 9] \\to [9]$$$. Therefore, the maximum possible value of the remaining element is $$$9$$$.\n\nIn the fourth test case, the array $$$a$$$ is $$$[3, 1, 4, 1, 5, 9, 2]$$$. It can be shown that the maximum possible value of the remaining element is $$$5$$$.\n<|im_end|>\n<|im_start|>assistant\n<think>\nOkay, let's try to figure out how to solve this problem. The problem is that we have an array of odd length, and we can repeatedly remove pairs of adjacent elements until there's one left. We need to find the maximum possible remaining element after all possible operations.\n\nHmm. Let's think about the process. Each operation removes two adjacent elements. Since the array length is odd, after each operation the length decreases by 1 (because 2 are removed, so the new length is n-2 + ... wait, no. Wait, when you remove two adjacent elements, the array's length reduces by 2 each time. Wait, the original array length is n, which is odd. Then after each operation, the length becomes n-2, which is odd minus 2, which is even? Wait no, wait. Wait, n is odd. For example, if n is 3, after removing two elements, the length is 1. Which is odd. Wait, so each operation reduces the length by 2. So the parity of the length remains odd. Because subtracting 2 from an odd number gives another odd number. Wait, no. Wait 3-2=1 (odd), 1-2=-1, which is not possible. So the process stops when the length is 1. So for the starting n which is odd, after each operation the length is n-2, which is again odd, until it reaches 1.\n\nSo the process will take (n-1)/2 steps. Because each step removes two elements, so (n-1)/2 steps to get from n to 1.\n\nBut how to model which elements can be kept. The key is to find a selection of elements where each step removes two adjacent elements, and the remaining elements form a possible path of removal steps leading to that element being the last one.\n\nWait, perhaps there's a pattern here. Let's look at the examples.\n\nIn the second test case: 3 elements. The options are to remove the first two, leaving the third, or remove the last two, leaving the first. So the maximum is the maximum of the first or third element. But in the example given, the array is [1,3,2]. The options are 2 or 1, so maximum is 2. But the third element is 2. Wait, so that's the case. So for n=3, the maximum is the maximum of the elements at positions 0, 2 (assuming 0-based indexing). Because after removing the first two, you get the third. Or removing the last two, you get the first.\n\nWait, that's interesting. So for n=3, the last remaining element can be either the first or the third. So the maximum is the maximum of those two.\n\nWhat about n=5. Let's take the third example: [4,7,4,2,9]. The solution was 9, which is the fifth element. How did that happen? Let's see the steps. They first removed 7 and 4 (the second and third elements), leading to [4,2,9]. Then in that array, they removed 4 and 2, leading to 9. So the final element is the fifth element of the original array.\n\nAlternatively, for a 5-element array, maybe the possible elements that can remain are those at positions 0, 2, 4. Because when you start with 5 elements, after each removal step, you have 3, then 1. So maybe the positions that are even indices in the original array can be the ones that are kept.\n\nWait, perhaps the answer is that the maximum element among the elements at the even indices (0, 2, 4, etc.) is the answer. Because when you have an array of odd length, each time you remove two adjacent elements, the remaining elements' positions depend on which pair was removed.\n\nWait, for example, in the 5-element array. Let's say the array is a0, a1, a2, a3, a4. If you remove a1 and a2, then the remaining array is a0, a3, a4. Then, in the next step, you can remove a0 and a3 (leaving a4), or a3 and a4 (leaving a0). So in this path, the final element is a4 or a0.\n\nAlternatively, if you first remove a3 and a4, then the remaining array is a0, a1, a2. Then next step removes a0 and a1, leaving a2. So the final element is a2.\n\nAnother possibility: first remove a0 and a1, leaving a2, a3, a4. Then next step removes a3 and a4, leaving a2. So the final element is a2.\n\nSo for a 5-element array, the possible final elements are a0, a2, a4. So the maximum is the maximum of these three.\n\nIf that's the case, then for the third example in the input, the elements at even indices (0,2,4) are 4,4,9. The maximum is 9, which matches the sample output.\n\nSimilarly, in the fourth example, the input array is [3,1,4,1,5,9,2] (n=7). The even indices are 0,2,4,6. The elements are 3,4,5,2. The maximum is 5, which matches the sample output.\n\nAnother sample is the second test case (n=3): even indices 0 and 2. Elements are 1 and 2. Max is 2.\n\nSo this seems to fit all the examples.\n\nSo the solution is: for the array of odd length n, the maximum possible remaining element is the maximum among all elements at even indices (0-based). Because those are the positions that could end up as the last element after removing pairs in a certain way.\n\nWait, why is that the case? Let's think recursively. For an array of length 1, the only element is the answer. For an array of length 3, you can remove the first two elements (leaving the third) or the last two (leaving the first). So the answer is max of first and third. For length 5, the possible elements that can be left are the first, third, fifth elements, because each step of removing pairs can leave the even positions.\n\nBut how to see this pattern holds for any odd n?\n\nHmm. Let's think for n=5. The possible elements that can be left after all operations are the elements at positions 0, 2, 4.\n\nSimilarly, for n=7, the possible elements are 0, 2,4,6.\n\nSo the pattern is that the positions that can be the last element are those with even indices (0, 2, 4, ..., n-1). Because when you remove pairs, the even indices can be preserved.\n\nWait, but how exactly? For example, in the 5-element array, how can a2 be preserved?\n\nIn the example where you first remove a3 and a4, then the array is a0, a1, a2. Then you can remove a0 and a1, leaving a2. So in that case, a2 is the last element.\n\nSo the key insight is that when you remove pairs in such a way that certain elements are skipped, you can end up with any even index element as the last remaining.\n\nTherefore, the answer is simply the maximum value among the elements at even positions (0, 2, 4, ...) in the original array.\n\nSo the solution is to iterate over all elements in even indices and find the maximum.\n\nThat's a very simple solution. So the code would be, for each test case, read the array, iterate through the elements at even indices (0, 2, 4, etc.), and take the maximum.\n\nLet me check the samples again.\n\nFirst sample: n=1. The only element is at index 0, which is even. So answer is 6. Correct.\n\nSecond sample: n=3. Indices 0 and 2. Max of 1 and 2 is 2. Correct.\n\nThird sample: n=5. Indices 0, 2,4. Values 4,4,9. Max 9. Correct.\n\nFourth sample: n=7. Indices 0,2,4,6. Values 3,4,5,2. Max 5. Correct.\n\nYes, all the samples are covered by this approach.\n\nSo the problem reduces to selecting the maximum element from the even indices of the array.\n\nTherefore, the code is straightforward. Read t test cases. For each, read n, then the array. Then loop through the array, check the elements at even indices (i=0,2,4,...,n-1), and take the maximum.\n\nThat's it. So this solution should work.\n\nSo the code would be:\n\nFor each test case:\n\nRead n.\n\nRead the array a.\n\nInitialize max_val to a[0].\n\nThen for i from 2 to n-1, stepping by 2:\n\nif a[i] > max_val, update max_val.\n\nThen output max_val.\n\nYes.\n\nNow let's write the code.\n\nIn C++, the steps would be:\n\nRead t.\n\nLoop t times:\n\nRead n.\n\nRead n integers into a vector.\n\nInitialize the maximum as a[0].\n\nLoop i from 2 to n-1, step 2 (i += 2):\n\nCompare and update the maximum.\n\nPrint the maximum.\n\nWait, for n=1, the loop runs from 2 to 0, which is not executed. So the maximum remains a[0], which is correct.\n\nYes.\n\nSo the code would look like:\n\n#include <bits/stdc++.h>\nusing namespace std;\n\nint main() {\n    int t;\n    cin >> t;\n    while (t--) {\n        int n;\n        cin >> n;\n        vector<int> a(n);\n        for (int i=0; i<n; i++) {\n            cin >> a[i];\n        }\n        int max_val = a[0];\n        for (int i=2; i <n; i +=2) {\n            if (a[i] > max_val) {\n                max_val = a[i];\n            }\n        }\n        cout << max_val << endl;\n    }\n    return 0;\n}\n\nTesting this code against the samples.\n\nSample 1:\n\nInput:\n\n4\n\n1\n\n6\n\nOutput 6.\n\nSample 2:\n\n3\n\n1 3 2. The even indices are 0 and 2. Values 1 and 2. Max is 2.\n\nSample3:\n\n5\n\n4 7 4 2 9. Even indices 0,2,4. Values 4,4,9. Max is9.\n\nSample4:\n\n7\n\n3 1 4 1 5 9 2. Even indices 0 (3), 2 (4), 4 (5), 6 (2). Max is 5.\n\nAll correct.\n\nSo this code should pass all test cases.\n\nThus, the solution is to take the maximum among elements at even indices.\n</think>\n\n```cpp\n#include <bits/stdc++.h>\nusing namespace std;\n\nint main() {\n    int t;\n    cin >> t;\n    while (t--) {\n        int n;\n        cin >> n;\n        vector<int> a(n);\n        for (int i = 0; i < n; ++i) {\n            cin >> a[i];\n        }\n        int max_val = a[0];\n        for (int i = 2; i < n; i += 2) {\n            if (a[i] > max_val) {\n                max_val = a[i];\n            }\n        }\n        cout << max_val << '\\n';\n    }\n    return 0;\n}\n```<|im_end|>\n"

[8]

Train the model

Now let's train our model. We do 60 steps to speed things up, but you can set num_train_epochs=1 for a full run, and turn off max_steps=None. We also support DPOTrainer and GRPOTrainer for reinforcement learning!!

[9]

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/40665 [00:00<?, ? examples/s]

[10]

GPU = NVIDIA A100-SXM4-40GB. Max memory = 39.557 GB.
5.393 GB of memory reserved.

[11]

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 40,665 | Num Epochs = 1 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 40,370,176/7,000,000,000 (0.58% trained)

Unsloth: Will smartly offload gradients to save VRAM!

[ ]

462.3942 seconds used for training.
7.71 minutes used for training.
Peak reserved memory = 7.893 GB.
Peak reserved memory for training = 2.129 GB.
Peak reserved memory % of max memory = 53.519 %.
Peak reserved memory for training % of max memory = 14.436 %.

Inference

Let's run the model! You can change the instruction and input - leave the output blank!

[12]

['A tech company, "DataGoSlow," produces high-performance memory modules. They sell these modules in \'n\' different retail outlets. Each outlet sells the modules at a unique price, represented by \'xi\' dollars.\n\nA large data center plans to purchase these memory modules over \'q\' consecutive procurement cycles. For each cycle, the data center has a budget constraint \'mi\' dollars. They want to know how many outlets sell the memory modules at a price *strictly less than* their budget.\n\n**Constraints:**\n\n* Time limit per test: 2.0 seconds\n* Memory limit per test: 256.0 megabytes\n\n**Input Format:**\n\n* The first line of the input contains a single integer \'n\' (1 ≤ n ≤ 100 000) — the number of retail outlets.\n* The second line contains \'n\' integers \'xi\' (1 ≤ xi ≤ 100 000) — prices of the memory modules in each outlet.\n* The third line contains a single integer \'q\' (1 ≤ q ≤ 100 000) — the number of procurement cycles.\n* Then follow \'q\' lines, each containing one integer \'mi\' (1 ≤ mi ≤ 109) — the budget for the i-th procurement cycle.\n\n**Output Format:**\n\n* Print \'q\' integers. The i-th of them should be equal to the number of outlets where the memory modules can be purchased at a price *strictly less than* the budget for the i-th cycle.\n\n**Example:**\n\n```input\n5\n3 10 8 6 11\n4\n1\n10\n3\n11\n```\n\n```output\n0\n2\n3\n0\n```\n\n**Note:**\n\nIn the first and fourth cases, there are no such outlets.\n\nIn the second case, the first and third outlets have prices less than 10.\n\nIn the third case, all three outlets have prices less than ']

You can also use a TextStreamer for continuous inference - so you can see the generation token by token, instead of waiting the whole time!

[ ]

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Continue the fibonnaci sequence.

### Input:
1, 1, 2, 3, 5, 8

### Response:
13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025, 121393, 196418, 317811,

Saving, loading finetuned models

To save the final model as LoRA adapters, either use Hugging Face's push_to_hub for an online save or save_pretrained for a local save.

[NOTE] This ONLY saves the LoRA adapters, and not the full model. To save to 16bit or GGUF, scroll down!

[ ]

('lora_model/tokenizer_config.json',
, 'lora_model/special_tokens_map.json',
, 'lora_model/vocab.json',
, 'lora_model/merges.txt',
, 'lora_model/added_tokens.json',
, 'lora_model/tokenizer.json')

Now if you want to load the LoRA adapters we just saved for inference, set False to True:

[ ]

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
What is a famous tall tower in Paris?

### Input:


### Response:
One of the most famous tall towers in Paris is the Eiffel Tower. It was built by Gustave Eiffel in 1889 and stands at a height of 324 meters (1,063 feet). It is a symbol of Paris and one of the most recognizable structures in the world.<|endoftext|>

You can also use Hugging Face's AutoPeftModelForCausalLM. Only use this if you do not have unsloth installed. It can be hopelessly slow, since 4bit model downloading is not supported, and Unsloth's inference is 2x faster.

[ ]

Saving to float16 for VLLM

We also support saving to float16 directly. Select merged_16bit for float16 or merged_4bit for int4. We also allow lora adapters as a fallback. Use push_to_hub_merged to upload to your Hugging Face account! You can go to https://huggingface.co/settings/tokens for your personal tokens. See our docs for more deployment options.

[ ]

GGUF / llama.cpp Conversion

To save to GGUF / llama.cpp, we support it natively now! We clone llama.cpp and we default save it to q8_0. We allow all methods like q4_k_m. Use save_pretrained_gguf for local saving and push_to_hub_gguf for uploading to HF.

Some supported quant methods (full list on our docs page):

q8_0 - Fast conversion. High resource use, but generally acceptable.
q4_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q4_K.
q5_k_m - Recommended. Uses Q6_K for half of the attention.wv and feed_forward.w2 tensors, else Q5_K.

[NEW] To finetune and auto export to Ollama, try our Ollama notebook

[ ]

Now, use the codeforces_cot_finetune.Q8_0.gguf file or codeforces_cot_finetune.Q4_K_M.gguf file in llama.cpp.

And we're done! If you have any questions on Unsloth, we have a Discord channel! If you find any bugs or want to keep updated with the latest LLM stuff, or need help, join projects etc, feel free to join our Discord!

Some other resources:

Train your own reasoning model - Llama GRPO notebook Free Colab
Saving finetunes to Ollama. Free notebook
Llama 3.2 Vision finetuning - Radiography use case. Free Colab
See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!

Join Discord if you need help + ⭐️ Star us on Github ⭐️

This notebook and all Unsloth notebooks are licensed LGPL-3.0.