HuggingFace Course: gpt-oss (20B) GRPO
To run this, press "Runtime" and then "Run all" on a free Tesla T4 Google Colab instance!
To install Unsloth on your local device, follow our guide. This notebook is licensed LGPL-3.0.
You will learn how to do data prep, how to train, how to run the model, and how to save it.
News
Train MoEs - DeepSeek, GLM, Qwen and gpt-oss 12x faster with 35% less VRAM. Blog
You can now train embedding models 1.8-3.3x faster with 20% less VRAM. Blog
Ultra Long-Context Reinforcement Learning is here with 7x more context windows! Blog
3x faster LLM training with 30% less VRAM and 500K context. Blog
New in Reinforcement Learning: FP8 RL • Vision RL • Standby • gpt-oss RL
Visit our docs for all our model uploads and notebooks.
Installation
Unsloth
Goal: Make faster kernels with Reinforcement Learning
Our goal is to make a faster matrix multiplication kernel by doing RL on GPT-OSS 20B with Unsloth.
You will learn how to:
- Counteract reward hacking such as cheating, caching, and laziness.
- Check kernel timing and correctness, and enforce time limits.
- Design good reward functions.
- Seriously do RL to produce optimized CUDA kernels.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.11.3: Fast Gpt_Oss patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Unsloth: Using float16 precision for gpt_oss won't work! Using float32.
Unsloth: Offloading embeddings to RAM to save 1.08 GB.
We now add a small set of LoRA weights to gpt-oss so we only need to train those, instead of training the full model.
Unsloth: Making `model.base_model.model.model` require gradients
Optimized matrix multiplication
NumPy has optimized matrix multiplication kernels for CPUs via BLAS. For GPUs, one can use CUDA-accelerated cuBLAS kernels, which PyTorch calls under the hood.
To generate some random matrices for matrix multiplication, we can do the below.
We shall generate a small matrix and inspect the multiplied output:
[[-2.8313286   4.54613909 -7.95265309  6.53459836  2.87235103]
 [ 7.0739631   3.76278879  9.31565599 -8.52884711  9.96832952]
 [ 8.41214082  6.51136046 -3.79347975 -2.46773693 -2.32292989]
 [ 3.91302932  4.98335304 -5.33855089  5.71057634 -2.79871647]]

[[ 0.39218774 -9.6181377  -3.49736707]
 [-0.33354865 -1.05626139  3.87231208]
 [ 0.49494174  5.91863954 -6.83183693]
 [ 5.1465162  -7.51648113  1.00445384]
 [ 9.63213377 -4.92327556  3.323014  ]]

[[  54.73441488  -87.89725072   97.94605887]
 [  58.25238906   -1.8467447   -49.25453031]
 [ -35.82528794  -80.25394462   11.51225408]
 [  -0.33785799 -103.64132345   38.51974367]]
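The cell that produced the matrices above can be sketched as follows (shapes match the printed output; the random values will differ on each run):

```python
import numpy as np

# Draw two random matrices with compatible shapes: (4, 5) @ (5, 3) -> (4, 3).
A = np.random.uniform(-10, 10, size=(4, 5))
B = np.random.uniform(-10, 10, size=(5, 3))
C = A @ B  # dispatches to BLAS on CPU (cuBLAS for CUDA tensors in PyTorch)
print(A)
print(B)
print(C)
```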
We can call an LLM to generate a simple matrix multiply kernel in pure Python, and then calculate the difference between the actual result and the kernel's result.
We see the error below is very small, so that's good!
(7.105427357601002e-15, 4.6783406255758477e-29)
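The comparison can be sketched like this: a naive pure-Python kernel checked against NumPy, reporting the maximum absolute error and the mean squared error (the exact values will differ run to run):

```python
import numpy as np

def matmul(A, B):
    # Naive pure-Python triple loop, roughly what an LLM might emit.
    m, n, p = len(A), len(A[0]), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for k in range(n):
            aik = A[i][k]
            for j in range(p):
                C[i][j] += aik * B[k][j]
    return C

A = np.random.uniform(-10, 10, size=(4, 5))
B = np.random.uniform(-10, 10, size=(5, 3))
pred = np.array(matmul(A.tolist(), B.tolist()))
true = A @ B
max_abs_err = float(np.abs(pred - true).max())
mse = float(((pred - true) ** 2).mean())
print(max_abs_err, mse)  # both should sit near float64 machine epsilon
```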
Countering Reward Hacking
The ultimate goal of RL is to maximize some reward (say speed, revenue, some metric).
But RL can cheat! When the RL algorithm learns a trick or exploits something to increase the reward without actually doing the task at hand, this is called "reward hacking".
Some good examples are in https://en.wikipedia.org/wiki/Reward_hacking
For matrix multiplication kernels, we might see the following issues:
- Laziness: RL learns to use NumPy, Torch, or other libraries, which call optimized CUDA kernels.
- Caching: RL learns to cache the output.
- Cheating: RL learns to find the actual output by inspecting Python global variables.
- Timing hacks: RL learns to edit the timing function so it reports zero elapsed time.
And possibly more. We shall try to address each!
Countering Reward Hacking 1: Stop laziness
We can stop the RL algorithm from calling optimized code by inspecting whether the generated code imports non-standard Python libraries. We used GPT-5 to help generate this check, check_only_stdlib_imports:
For example, let's call check_only_stdlib_imports on a random piece of matrix multiplication code generated by GPT-5:
Only stdlib imports? False
{'stdlib': [], 'non_stdlib': ['numpy', 'torch'], 'relative_imports': 0}
Countering Reward Hacking 2: Stop cheating
We can stop the RL algorithm from using global or cached variables by restricting its locals and globals.
We are also going to use exec to create the function, so we have to save the output to an empty dict.
We also disallow global variable access.
<function matmul(A, B)>
We also disallow global variable access via types.FunctionType(f.__code__, {})
Success name 'np' is not defined
Countering Reward Hacking 3: Stop caching
We can stop the RL algorithm from using cached data by wiping the cache with a large fake matrix. We also have to benchmark carefully with multiple loops and trials.
We also add a timeout so the algorithm cannot run in an endless loop.
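A benchmarking helper along these lines can be sketched as below (the function name, trial count, and cache-wiping matrix size are illustrative, not the notebook's exact code):

```python
import statistics
import time

def benchmark(fn, A, B, trials=3, timeout_s=10.0):
    """Time fn(A, B) over several trials, wiping caches between runs and
    bailing out once the total time budget is exceeded."""
    big = [[1.0] * 64 for _ in range(64)]     # large fake matrices to evict caches
    times, exceptions, timeouts = [], [], 0
    deadline = time.monotonic() + timeout_s
    for _ in range(trials):
        if time.monotonic() > deadline:
            timeouts += 1
            break
        try:
            fn(big, big)                      # cache-wiping warm-up call
            t0 = time.perf_counter_ns()
            fn(A, B)
            times.append(time.perf_counter_ns() - t0)
        except Exception as e:
            exceptions.append(repr(e))
    return {
        "median_ns": int(statistics.median(times)) if times else None,
        "mean_ns": int(statistics.mean(times)) if times else None,
        "stdev_ns": int(statistics.stdev(times)) if len(times) > 1 else 0,
        "exceptions": exceptions,
        "timeouts": timeouts,
    }

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

print(benchmark(matmul, [[1.0, 2.0]], [[3.0], [4.0]]))
```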
For example, we benchmark our earlier matmul kernel with a 10-second time limit:
{'median_ns': 64112906,
 'mean_ns': 64112906,
 'stdev_ns': 0,
 'exceptions': [],
 'timeouts': 0}

Data & RL task setup
We now have to create a prompt to the model for which it will do some task. For our matrix multiply example, we use the below:
Create a new fast matrix multiplication function using only native Python code.
You are given a list of list of numbers.
Output your new function in backticks using the format below:
```python
def matmul(A, B):
return ...
```
First, let's prompt GPT-OSS without RL and see how it goes:
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-11-21
Reasoning: low
# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Create a new fast matrix multiplication function using only native Python code.
You are given a list of list of numbers.
Output your new function in backticks using the format below:
```python
def matmul(A, B):
return ...
```<|end|><|start|>assistant<|channel|>analysis<|message|>We need to provide a quick naive matrix multiplication? "New fast" but only native Python. Provide code. Use list comprehension or loops. Optimize maybe by transposing B. Provide code.<|end|><|start|>assistant<|channel|>final<|message|>Here is a simple, fast implementation of matrix multiplication that uses only native Python constructs and avoids any external dependencies. It works with any two 2‑D lists of numbers (ints, floats, etc.), assuming their dimensions are compatible:
```python
def matmul(A, B):
"""
Multiplies two matrices A and B.
Parameters
----------
A : list of lists
First matrix. Must have shape (m, n).
B : list of lists
Second matrix. Must have shape (n, p).
Returns
-------
list of lists
The resulting matrix of shape (m, p).
"""
# Check basic compatibility
if not A or not B or len(A[0]) != len(B):
raise ValueError("Incompatible dimensions for matrix multiplication.")
# Transpose B once to speed up access
B_t = [list(col) for col in zip(*B)] # B transpose, O(n*p)
m, n = len(A), len(A[0])
p = len(B_t)
result = [[0.0] * p for _ in range(m)]
# Standard triple-loop but with B transposed for cache friendliness
for i in range(m):
a_row = A[i]
res_row = result[i]
for j in range(p):
res_row[j] = sum(a_row[k] * B_t[j][k] for k in range(n))
return result
```
**How it works**
1. **Input validation**: It checks that the number of columns in `A` matches the number of rows in `B`.
2. **Transposition of `B`**: By transposing `B` (`B_t`), we turn repeated index lookups into simple list accesses, which is much faster in pure Python than accessing nested lists repeatedly.
3. **Main loop**: For each row `i` of `A` and each row `j` of `B_t` (i.e., each column of `B`), the inner generator expression computes the dot product.
4. **Result**: The function returns a list of lists representing the product matrix.
This implementation is concise,
Reward functions
We now design the extract_function function which simply extracts the function wrapped in 3 backticks.
And 4 reward functions:
- function_works, which rewards the model if the strategy is a valid Python function.
- no_cheating, which checks if the function imported other modules, and if it did, we penalize it.
- correctness_check, which checks if the kernel was correct or wrong - it shouldn't generate gibberish!
- speed_check, which checks the performance relative to NumPy matmul directly.
def matmul(A, B):
return ...
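extract_function can be sketched with a regular expression over the triple-backtick block (a sketch; the notebook's version may differ):

```python
import re

def extract_function(text):
    """Extract the first triple-backtick code block from a completion, if any."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", text, re.DOTALL)
    return match.group(1).rstrip() if match else None

completion = "Here you go:\n```python\ndef matmul(A, B):\n    return ...\n```"
print(extract_function(completion))
```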
Below is our function_works reward function which uses Python's exec but guarded by not allowing leakage of local and global variables. We can also use check_only_stdlib_imports first to check if there are errors before even executing the function:
(False,
 {'error': "SyntaxError: expected '(' (<unknown>, line 1)",
  'stdlib': [],
  'non_stdlib': [],
  'relative_imports': 0})

no_cheating checks if the function cheated, since it might have imported NumPy- or Torch-optimized code.
Next, correctness_check checks if the kernel was correct. We want to penalize if the absolute error is larger than 1, and if the mean squared error is somewhat bigger than machine epsilon.
We have to execute the code now!
np.float64(2.220446049250313e-16)
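A sketch of correctness_check under those thresholds (the penalty values and the 100x-epsilon tolerance are illustrative choices, not the notebook's exact numbers):

```python
import numpy as np

def correctness_check(kernel, A, B):
    """Hard fail if any entry is off by more than 1; soft fail if the MSE
    is far above float64 machine epsilon."""
    pred = np.asarray(kernel(A.tolist(), B.tolist()), dtype=np.float64)
    true = A @ B
    if np.abs(pred - true).max() > 1.0:
        return -2.0
    mse = float(((pred - true) ** 2).mean())
    eps = float(np.finfo(np.float64).eps)   # ~2.220446049250313e-16
    return 1.0 if mse <= 100.0 * eps else -1.0

def good(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def bad(A, B):
    return [[0.0 for _ in B[0]] for _ in A]  # gibberish: all zeros

rng = np.random.default_rng(0)
A = rng.uniform(-10, 10, size=(4, 5))
B = rng.uniform(-10, 10, size=(5, 3))
print(correctness_check(good, A, B))  # 1.0
print(correctness_check(bad, A, B))   # -2.0
```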
Finally our benchmarking function for speed_check! We shall limit the timer to 10 seconds and do 3 trials.
{'median_ns': 195725,
 'mean_ns': 211578,
 'stdev_ns': 30687,
 'exceptions': [],
 'timeouts': 0}

{'median_ns': 70811,
 'mean_ns': 69910,
 'stdev_ns': 2926,
 'exceptions': [],
 'timeouts': 0}

We can take the difference and apply a negative sign for slower ones. If the ratio is less than 1 (i.e. faster), we shall invert it!
0.02764047958650492
3.333333333333333
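One way to sketch that speed reward (the exact scaling in the notebook may differ; this only illustrates the sign flip for slow kernels and the inversion for fast ones):

```python
def speed_reward(kernel_ns, baseline_ns):
    """Reward relative speed vs the NumPy baseline timing."""
    ratio = kernel_ns / baseline_ns
    if ratio >= 1.0:
        return -ratio        # slower than the baseline: negative reward
    return 1.0 / ratio       # faster: invert, so 0.3x the time -> reward ~3.33

print(speed_reward(195725, 70811))    # slower kernel -> about -2.76
print(speed_reward(30_000, 100_000))  # faster kernel -> about 3.33
```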
We create the dataset, which includes replicas of our prompt. Remember to set the reasoning effort to low!
49
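A minimal sketch of this dataset cell, assuming a plain list of dicts (Hugging Face's datasets.Dataset.from_list accepts the same shape):

```python
prompt = (
    "Create a new fast matrix multiplication function using only native Python code.\n"
    "You are given a list of list of numbers.\n"
    "Output your new function in backticks using the format below:\n"
    "```python\ndef matmul(A, B):\n    return ...\n```"
)
# Replicate the same prompt: GRPO samples several completions per prompt and
# compares their rewards, so the dataset is just the one task, many times.
dataset = [
    {
        "prompt": [{"role": "user", "content": prompt}],
        "answer": 0,                 # unused placeholder field
        "reasoning_effort": "low",   # keep gpt-oss reasoning short
    }
    for _ in range(49)
]
print(len(dataset))  # 49
```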
{'prompt': [{'content': 'Create a new fast matrix multiplication function using only native Python code.\nYou are given a list of list of numbers.\nOutput your new function in backticks using the format below:\n```python\ndef matmul(A, B):\n    return ...\n```',
   'role': 'user'}],
 'answer': 0,
 'reasoning_effort': 'low'}

Train the model
Now set up the GRPO Trainer and all configurations! We also support GSPO, GAPO, Dr GRPO and more! Go to our docs https://unsloth.ai/docs/ for more info!
Unsloth: We now expect `per_device_train_batch_size` * `gradient_accumulation_steps` * `world_size` to be a multiple of `num_generations`. We will change the batch size of 1 to the `num_generations` of 2
And let's run the trainer! If you scroll up, you'll see a table of rewards. The goal is to see the reward column increase!
You might have to wait 150 to 200 steps for any action. You'll probably get 0 reward for the first 100 steps. Please be patient!
| Step | Training Loss | reward | reward_std | completion_length | kl |
|---|---|---|---|---|---|
| 1 | 0.000000 | 0.125000 | 0.000000 | 200.000000 | 0.000000 |
| 2 | 0.000000 | 0.072375 | 0.248112 | 200.000000 | 0.000000 |
| 3 | 0.000000 | -0.079000 | 0.163776 | 182.500000 | 0.000005 |
Unsloth: Switching to float32 training since model cannot work with float16
And let's train the model!
NOTE: A free T4 GPU might sadly take 5 minutes for one generation since it's an old GPU - an A100 or H100 will be much faster!
==((====))== Unsloth - 2x faster free finetuning | Num GPUs used = 2
\\ /| Num examples = 1,000 | Num Epochs = 1 | Total steps = 100
O^O/ \_/ \ Batch size per device = 2 | Gradient accumulation steps = 1
\ / Data Parallel GPUs = 1 | Total batch size (2 x 1 x 1) = 2
"-____-" Trainable parameters = 1,990,656 of 20,916,747,840 (0.01% trained)
`generation_config` default values have been modified to match model-specific defaults: {'max_length': 131072}. If this is not desired, please set these values explicitly.
def matmul(A, B):
n=len(A); m=len(B[0]); p=len(B)
res=[[0]*m for _ in range(n)]
for i in range(n):
Ai=A[i]
for k in range(p):
aik=Ai[k]
if aik:
Bk=B[k]
for j in range(m):
res[i][j] += aik*Bk[j]
return res
def matmul(A, B):
...
def matmul(A, B):
m = len(A)
k = len(A[0])
n = len(B[0])
# adjust check
if len(B) != k: raise ValueError
# initialize result
result = [[0]*n for _ in range(m)]
for i in range(m):
for j in range(n):
sum_val = 0
for p in range(k):
sum_val += A[i][p]*B[p][j]
result[i][j] = sum_val
def matmul(A, B):
...
Unsloth: Will smartly offload gradients to save VRAM!
def matmul(A, B):
# A: m x n, B: n x p
m, n = len(A), len(A[0]) # error if A empty
# verify shape of B
if len(B) != n: raise ValueError("Incompatible dimensions")
p = len(B[0])
# compute result matrix
result = [[0]*p for _ in range(m)]
for i in range(m):
for k in range(n):
aik = A[i][k]
if aik:
for j in range(p):
result[i][j] += aik*B[k][j]
return result
def matmul(A, B):
"""
Multiply two matrices A and B, where A and B are lists of lists.
The function performs a standard matrix multiplication using plain
Python loops and integer/float arithmetic without any external libraries.
Parameters:
A (list[list[Union[int, float]]]): Left‑hand operand.
B (list[list[Union[int, float]]]): Right‑hand operand.
Returns:
list[list[Union[int, float]]]: Product matrix C = A * B.
Raises:
ValueError: If matrix dimensions are incompatible.
"""
# Validate inputs
if not A or not B:
raise ValueError("Input matrices must not be empty.")
if any(len(row) == 0 for row in A) or any(len(row) == 0 for row in B):
raise ValueError("Matrix rows must be non‑empty.")
if len(A[0]) != len(B):
raise ValueError(
f"Incompatible dimensions: A is {len(A)}x{len(A[0])} "
f"but B is {len(B)}x{len(B[0])}."
)
m = len(A) # Rows in A
n = len(B[0]) # Columns in B
p = len(B) # Columns in A = rows in B
# Allocate result matrix
C = [[0.0 for _ in range(n)] for _ in range(m)]
# Compute matrix product
for i in range(m):
for j in range(n):
sum_val = 0.0
for k in range(p):
sum_val += A[i][k] * B[k][j]
C[i][j] = sum_val
return C
def matmul(A, B):
return ...
def matmul(A, B):
if not A or not B:
return []
n = len(A)
m = len(B[0])
p = len(B)
result = [[0]*m for _ in range(n)]
for i in range(n):
for k in range(p):
aik = A[i][k]
if aik:
for j in range(m):
result[i][j] += aik*B[k][j]
return result
def matmul(A, B):
# A: m x n, B: n x p
m = len(A)
n = len(A[0]) # also len(B)
p = len(B[0])
# initialize result matrix with zeros
result = [[0]*p for _ in range(m)]
for i in range(m):
rowA = A[i]
res_row = result[i]
for k in range(n):
aik = rowA[k]
if aik:
# then we add aik * B[k][j] to each column j
rowBk = B[k]
for j in range(p):
res_row[j] += aik * rowBk[j]
return result
def matmul(A, B):
m = len(A)
k = len(A[0])
n = len(B[0])
C = [[0]*n for _ in range(m)]
for i in range(m):
a_row = A[i]
Ci = C[i]
for j in range(n):
s = 0
for t in range(k):
s += a_row[t] * B[t][j]
Ci[j] = s
return C
def matmul(A, B):
n = len(B[0]) # columns in B
result = []
for i in range(len(A)):
row = []
for j in range(n):
val = 0
for k in range(len(A[0])):
val += A[i][k] * B[k][j]
row.append(val)
result.append(row)
return result
def matmul(A, B):
# Implementation (Strassen) ...
def matmul(A, B):
assert len(A[0]) == len(B)
m = len(A)
n = len(A[0])
p = len(B[0])
result = [[0]*p for _ in range(m)]
for i in range(m):
Ai = A[i]
for k in range(n):
aik = Ai[k]
if aik:
Bk = B[k]
for j in range(p):
result[i][j] += aik * Bk[j]
return result
def matmul(A, B):
...
def matmul(A, B):
# selects optimal tiling size
...
# uses bit manipulation to accelerate computations
def matmul(A, B):
n = len(A)
m = len(B[0])
p = len(B)
assert p == len(A[0])
# build transposed B for faster row dot
Bt = [list(col) for col in zip(*B)]
def matmul(A, B):
return ...
def matmul(A, B):
...
def matmul(A, B):
"""
Multiply two matrices A and B, where A and B are lists of lists.
A should be of shape (m, n) and B of shape (n, p); the result is an
(m, p) matrix.
Example
-------
A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
>>> matmul(A, B)
[[19, 22], [43, 50]]
"""
# Ensure input is rectangular
if not A or not B or not A[0] or not B[0]:
return []
n_rows_A = len(A)
n_cols_A = len(A[0]) # number of columns in A
n_rows_B = len(B)
n_cols_B = len(B[0]) # number of columns in B
if n_cols_A != n_rows_B:
raise ValueError("Number of columns of A must equal number of rows of B")
# Pre‑allocate the result matrix
result = [[0] * n_cols_B for _ in range(n_rows_A)]
# For better cache performance we iterate over columns of B only once
for i in range(n_rows_A):
ai = A[i] # local reference
for k in range(n_cols_A):
aik = ai[k]
if aik == 0:
continue # skip zero entries for a tiny speed bump
bk_row = B[k]
for j in range(n_cols_B):
result[i][j] += aik * bk_row[j]
return result
def matmul(A, B):
# get dims
ra = len(A); ca= len(A[0]) if A else 0
rb = len(B); cb= len(B[0]) if B else 0
# check dims
if ca != rb: raise ValueError
# transpose B
B_T = [[B[i][j] for i in range(rb)] for j in range(cb)]
# compute result
return [[sum(a*b for a,b in zip(A[i], B_T[j])) for j in range(cb)] for i in range(ra)]
def matmul(A, B):
m = len(A)
n = len(B[0])
p = len(B)
# B transpose to improve locality
B_T = list(zip(*B))
out = [[0]*n for _ in range(m)]
for i in range(m):
Ai = A[i]
for j in range(n):
sum = 0
Bj = B_T[j]
for k in range(p):
sum += Ai[k]*Bj[k]
out[i][j] = sum
return out
def matmul(A, B): return ...
def matmul(A, B):
m = len(A)
n = len(B[0])
p = len(A[0])
assert p == len(B)
result = [[0]*n for _ in range(m)]
for i in range(m):
for k in range(p):
aik = A[i][k]
for j in range(n):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
# A: list of lists (m x n), B: list of lists (n x p)
# returns list of lists (m x p)
m, n = len(A), len(A[0])
p = len(B[0])
# verify dimensions
# compute product
result = [[0] * p for _ in range(m)]
for i in range(m):
for k in range(n):
aik = A[i][k]
for j in range(p):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
return ...
def matmul(A, B):
# validate
m = len(A)
n = len(B[0]) # columns of result
# B must be of dimension (len(A[0]) x len(B[0]))
BT = list(zip(*B)) # Transposed B
result = [[sum(x*y for x, y in zip(row, col)) for col in BT] for row in A]
return result
def matmul(A, B):
n = len(A)
m = len(B[0])
p = len(B)
C = [[0]*m for _ in range(n)]
for i in range(n):
ai = A[i]
ci = C[i]
for k in range(p):
aik = ai[k]
if aik:
bk = B[k]
for j in range(m):
ci[j] += aik * bk[j]
return C
def matmul(A, B):
return ...
def matmul(A, B):
"""
Multiply two matrices represented as lists of lists using only native Python code.
Parameters
----------
A : list of list of numbers
The left matrix (m x n).
B : list of list of numbers
The right matrix (n x p).
Returns
-------
list of list of numbers
The product matrix (m x p).
"""
# Sanity check: A must be compatible with B.
if not A or not B:
return []
m, n = len(A), len(A[0])
n2, p = len(B), len(B[0])
if n != n2:
raise ValueError("Inner dimensions of A and B must match")
# Pre‑allocate the result matrix with zeros.
C = [[0] * p for _ in range(m)]
# Standard triple‑loop algorithm
for i in range(m):
ai = A[i]
ci = C[i]
for k in range(n):
aik = ai[k]
if aik == 0:
continue
bk = B[k] # Row of B affected
for j in range(p):
ci[j] += aik * bk[j]
return C
def matmul(A, B):
# Check dimensions
if not A: return []
m, n = len(A), len(A[0]) # number of rows in A and columns in A
if not B or len(B) != n: raise ValueError("Size mismatch")
p = len(B[0]) # number of columns in B
# Precompute columns of B
B_cols = [[B[row][col] for row in range(n)] for col in range(p)]
result = [[sum(a_elem * b_elem for a_elem, b_elem in zip(row, col)) for col in B_cols] for row in A]
return result
def matmul(A, B):
...
def matmul(A, B):
n = len(A)
m = len(B[0]) # number of columns of B
common = len(B)
# initialize result matrix
res = [[0]*m for _ in range(n)]
for i in range(n):
for k in range(common):
aik = A[i][k]
for j in range(m):
res[i][j] += aik * B[k][j]
return res
def matmul(A, B):
# A is m x n, B is n x p,
# output shape m x p
m = len(A)
n = len(A[0]) if A else 0
# check B dims
if n == 0:
return []
p = len(B[0])
# preallocate result
C = [[0]*p for _ in range(m)]
for i in range(m):
Ai = A[i]
Ci = C[i]
for k in range(n):
aik = Ai[k]
if aik:
Bk = B[k]
# if aik != 0 multiply
for j in range(p):
Ci[j] += aik * Bk[j]
return C
def matmul(A, B):
"""
Multiply two matrices A and B.
Parameters
----------
A : list[list[int | float]]
The left matrix (m × k) to be multiplied.
B : list[list[int | float]]
The right matrix (k × n) to be multiplied.
Returns
-------
C : list[list[int | float]]
Result of the product A @ B (m × n matrix).
"""
# Basic checks
if not A or not B or not B[0]:
return []
m, k1 = len(A), len(A[0])
k2, n = len(B), len(B[0])
if k1 != k2:
raise ValueError("Inner dimensions of matrices must agree")
# Initialize the result matrix with zeros
C = [[0.0] * n for _ in range(m)]
# A naive but well‑structured implementation that is reasonably fast
for i in range(m):
ai = A[i]
ci = C[i]
for j in range(n):
s = 0.0
for p in range(k1):
s += ai[p] * B[p][j]
ci[j] = s
return C
def matmul(A, B):
"""
Multiply two matrices A and B using only plain Python code (no external libraries).
`A` and `B` are expected to be lists of lists, where each inner list represents a row.
This implementation uses a simple, straightforward algorithm with small optimisations
such as caching dimensions and avoiding repeated attribute lookups inside loops.
Note: This function expects that the number of columns in `A` matches the number of rows in `B`.
"""
# Validate dimensions
n_rows_A, n_cols_A = len(A), len(A[0])
n_rows_B, n_cols_B = len(B), len(B[0])
if n_cols_A != n_rows_B:
raise ValueError("Incompatible dimensions for matrix multiplication")
# Pre‑allocate result matrix
result = [[0] * n_cols_B for _ in range(n_rows_A)]
# Transpose B to improve cache locality
B_transposed = [[B[row][col] for row in range(n_rows_B)] for col in range(n_cols_B)]
# Perform multiplication (standard algorithm)
for i in range(n_rows_A):
row_A = A[i]
for j in range(n_cols_B):
sum_val = 0
row_B = B_transposed[j]
for k in range(n_cols_A):
sum_val += row_A[k] * row_B[k]
result[i][j] = sum_val
return result
def matmul(A, B):
# Makes sure the matrices can be multiplied
if len(A[0]) != len(B):
raise ValueError("Number of columns in A must equal number of rows in B")
# Initialise result matrix with zeros
rows_A, cols_B, cols_A = len(A), len(B[0]), len(A[0])
C = [[0] * cols_B for _ in range(rows_A)]
# Standard O(n³) matrix multiplication
for i in range(rows_A):
for k in range(cols_A):
aik = A[i][k]
# Skip if a[i][k] is zero to save some work
if aik == 0:
continue
for j in range(cols_B):
C[i][j] += aik * B[k][j]
return C
def matmul(A, B):
try:
_lenA = len(A)
_lenB = len(B)
if _lenA == 0 or _lenB == 0:
return []
m = len(A[0])
if any(len(row)!=m for row in A):
raise ValueError
n = len(B[0])
if any(len(row)!=n for row in B):
raise ValueError
if m != len(B):
raise ValueError("Incompatible dimensions")
except Exception:
raise
B_T = [tuple(col) for col in zip(*B)] # transpose
res = [ [0]*n for _ in range(_lenA) ]
for i in range(_lenA):
rowA = A[i]
result_row = res[i]
for j in range(n):
colB = B_T[j]
s = 0
for k in range(m):
s += rowA[k] * colB[k]
result_row[j] = s
return res
def matmul(A, B):
n = len(A)
m = len(B[0])
p = len(B)
res = [[0]*m for _ in range(n)]
# Transpose B for faster column access
B_T = [list(col) for col in zip(*B)]
for i in range(n):
Ai = A[i]
for j in range(m):
res[i][j] = sum(a*b for a,b in zip(Ai, B_T[j]))
return res
def matmul(A, B):
# ensure dimensions compatible
n = len(A)
m = len(A[0])
p = len(B[0])
# a quick check for compatibility
if len(B) != m:
raise ValueError("Incompatible matrix dimensions.")
# initialize result matrix
result = [[0]*p for _ in range(n)]
# naive multiplication
for i in range(n):
for j in range(p):
sum_val = 0
for k in range(m):
sum_val += A[i][k] * B[k][j]
result[i][j] = sum_val
return result
def matmul(A, B):
n = len(A)
m = len(B[0])
common = len(B)
# compute product
res = [[0]*m for _ in range(n)]
for i in range(n):
ai = A[i]
row_res = res[i]
for k in range(common):
aik = ai[k]
bk = B[k]
for j in range(m):
row_res[j] += aik * bk[j]
return res
def matmul(A, B):
# Check input
nA = len(A)
mA = len(A[0]) if A else 0
nB = len(B)
mB = len(B[0]) if B else 0
# dims
if mA != nB:
raise ValueError("Incompatible dimensions.")
# result dims
result = [[0]*mB for _ in range(nA)]
for i in range(nA):
for j in range(mB):
s = 0
for k in range(mA):
s += A[i][k] * B[k][j]
result[i][j] = s
return result
def matmul(A, B):
"""
Multiply two matrices A and B (given as lists of lists).
Parameters
----------
A : list[list[float]]
The first matrix (m x n).
B : list[list[float]]
The second matrix (n x p).
Returns
-------
C : list[list[float]]
The product matrix (m x p).
Raises
-------
ValueError
If the inner dimensions don't match.
"""
# Basic sanity check on dimensions
if not A or not B or not A[0] or not B[0]:
raise ValueError("Matrices must have non‑empty dimensions.")
n = len(A[0]) # number of columns in A
if any(len(row) != n for row in A):
raise ValueError("All rows in A must have the same length.")
if len(B) != n:
raise ValueError("Number of columns in A must equal number of rows in B.")
m = len(A) # number of rows in A
p = len(B[0]) # number of columns in B
# Pre‑compute columns of B for quicker access
B_t = [tuple(col) for col in zip(*B)] # transpose: each column is a tuple
# Compute each entry of the product, using Python's max‑speed loops
C = [[sum(a * b for a, b in zip(A[i], col)) for col in B_t] for i in range(m)]
return C
def matmul(A, B):
"""
Multiply two matrices A and B.
A and B are lists of lists of numbers (i.e. 2D arrays).
Returns the result as a new list of lists.
"""
# Ensure matrix dimensions are compatible
if len(A[0]) != len(B):
raise ValueError("Incompatible matrix dimensions for multiplication.")
# Initialize the result matrix with zeros
result = [[0]*len(B[0]) for _ in range(len(A))]
# Perform multiplication
for i in range(len(A)):
for j in range(len(B[0])):
for k in range(len(A[0])):
result[i][j] += A[i][k] * B[k][j]
return result
def matmul(A, B):
# Verify that matrices have compatible dimensions
if not A or not B or len(A[0]) != len(B):
raise ValueError("Number of columns in A must equal number of rows in B.")
# Initialize the result matrix (size: rows of A × columns of B)
n_rows_a = len(A)
n_cols_b = len(B[0])
result = [[0] * n_cols_b for _ in range(n_rows_a)]
# Iterate through rows of A and columns of B, accumulating the dot products
for i in range(n_rows_a):
for j in range(n_cols_b):
sum_val = 0
for k in range(len(A[0])):
sum_val += A[i][k] * B[k][j]
result[i][j] = sum_val
return result
def matmul(A, B):
# assume dims: A: n x m, B: m x p
n = len(A)
m = len(A[0]) if A else 0
p = len(B[0]) if B else 0
# initialize result matrix
C = [[0]*p for _ in range(n)]
for i in range(n):
rowA = A[i]
rowC = C[i]
for k in range(m):
a = rowA[k]
if a != 0: # optional optimization
colB = B[k]
for j in range(p):
rowC[j] += a * colB[j]
return C
def matmul(A, B):
n = len(A)
Bt = [[B[k][j] for k in range(n)] for j in range(n)]
res = [[0] * n for _ in range(n)]
for i in range(n):
row = A[i]
for j in range(n):
res[i][j] = sum(row[k] * Bt[j][k] for k in range(n))
return res
def matmul(A, B):
return ...
def matmul(A, B): return ...
def matmul(A, B):
# Number of rows in A, number of columns in A (and rows in B), number of columns in B
n, m, p = len(A), len(A[0]), len(B[0])
# Prepare the result matrix with zeros
C = [[0] * p for _ in range(n)]
# Perform the standard O(n*m*p) multiplication
for i in range(n):
for k in range(m):
aik = A[i][k] # element in A at row i, column k
for j in range(p):
C[i][j] += aik * B[k][j]
return C
def matmul(A, B):
"""
Multiply two matrices A and B where A is an m×n matrix and B is an n×p matrix.
Returns the resulting m×p matrix.
"""
# Number of rows in A
rows_a = len(A)
# Number of columns in A (required to multiply with B)
cols_a = len(A[0]) if A else 0
# Number of columns in B
cols_b = len(B[0]) if B else 0
# Quick check a few edge cases
if rows_a == 0 or cols_a == 0 or cols_b == 0:
return []
# Ensure that dimension compatibility holds
if len(B) != cols_a:
raise ValueError("Number of columns in A must equal number of rows in B")
# Prepare the result matrix
result = [[0.0 for _ in range(cols_b)] for _ in range(rows_a)]
# Matrix multiplication
for i in range(rows_a):
for k in range(cols_a):
aik = A[i][k]
for j in range(cols_b):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B): return ...
def matmul(A, B):
n = len(A)
m = len(B[0])
p = len(B)
# assert len(A[0]) == p
C = [[0]*m for _ in range(n)]
# iterate
for i in range(n):
Ci = C[i]
Ai = A[i]
for k in range(p):
a = Ai[k]
if a != 0:
Bk = B[k]
for j in range(m):
Ci[j] += a * Bk[j]
return C
def matmul(A, B):
# A: list of list, columns: n x m
# B: list of list, dimensions: m x p
# returns C: n x p
n = len(A)
m = len(A[0])
p = len(B[0])
# initialize result matrix
C = [[0]*p for _ in range(n)]
for i in range(n):
ai = A[i]
for k in range(m):
aik = ai[k]
if aik:
bk = B[k]
for j in range(p):
C[i][j] += aik * bk[j]
return C
def matmul(A, B):
"""
Multiply two matrices A and B using only native Python code.
Parameters:
- A: List of lists where each sublist represents a row of matrix A.
- B: List of lists where each sublist represents a row of matrix B.
Returns:
- Resulting matrix as a list of lists.
"""
if not A or not B:
return []
n_rows_A = len(A)
n_cols_A = len(A[0]) if n_rows_A > 0 else 0
n_rows_B = len(B)
n_cols_B = len(B[0]) if n_rows_B > 0 else 0
# Ensure the matrices can be multiplied
if n_cols_A != n_rows_B:
raise ValueError("Number of columns in A must equal number of rows in B")
# Initialize result matrix with zeros
result = [[0] * n_cols_B for _ in range(n_rows_A)]
# Perform matrix multiplication
for i in range(n_rows_A):
for k in range(n_cols_A):
aik = A[i][k]
if aik == 0:
continue
for j in range(n_cols_B):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
"""
Multiply two matrices A and B using vanilla Python.
Parameters
----------
A : list of lists of numbers, shape (m, k)
Left matrix.
B : list of lists of numbers, shape (k, n)
Right matrix.
Returns
-------
C : list of lists of numbers, shape (m, n)
The product A @ B.
"""
m = len(A) # rows of A
k = len(A[0]) # columns of A / rows of B
n = len(B[0]) # columns of B
# Initialize the residual matrix with zeros.
C = [[0]*n for _ in range(m)]
# Perform the multiplication using the standard triple nested loop.
for i in range(m):
Ai = A[i] # row of A
Ci = C[i] # row of C we will fill
for l in range(k): # over columns of A and rows of B
a = Ai[l]
if a != 0: # skip zero terms to reduce work
Bl = B[l]
for j in range(n):
Ci[j] += a * Bl[j]
return C
def matmul(A, B):
"""
Multiply two matrices A and B (given as lists of lists) without using external libraries.
"""
# Basic dimension checks
if not A or not B:
return []
m, p = len(A), len(A[0]) # A is m×p
assert len(B) == p # B must be p×n
n = len(B[0]) # n columns in B
# Transpose B to improve cache locality
B_T = list(zip(*B)) # B^T is n×p, each element is a tuple
result = [[0]*n for _ in range(m)]
for i in range(m):
row_A = A[i]
row_res = result[i]
for k in range(p): # iterate over inner dimension
aik = row_A[k]
if aik:
col_B = B_T[k] # k-th column of B
for j in range(n):
row_res[j] += aik * col_B[j]
return result
def matmul(A, B):
"""Multiply two matrices A and B.
The function assumes that A and B are compatible for matrix multiplication,
i.e. if A is MxK then B must be KxN. The result is an MxN matrix.
Parameters
----------
A : list[list[float]]
MxK matrix.
B : list[list[float]]
KxN matrix.
Returns
-------
list[list[float]]
Product matrix of shape MxN.
"""
# Grab sizes locally for speed
m = len(A) # number of rows of A
k = len(A[0]) if A else 0 # number of columns in A (inner dimension)
n = len(B[0]) if B else 0 # number of columns in B
# Prepare the result matrix.
# Use a list of lists pre‑filled with zeros.
result = [[0.0] * n for _ in range(m)]
# Basic algorithm – triple nested loop.
for i in range(m):
Ai = A[i]
Ri = result[i]
# Pre‑localize B for a bit of speed.
for j in range(n):
s = 0.0
for p in range(k):
s += Ai[p] * B[p][j]
Ri[j] = s
return result
def matmul(A, B):
"""
Multiply two matrices A and B where A is mxk and B is kxn
using simple Python list-of-lists and a few optimizations.
Arguments:
A: list of lists, where A[i][j] is the entry of row i and column j
B: list of lists, where B[i][j] is the entry of row i and column j
Returns:
C: the resulting matrix (mxn)
"""
m, k = len(A), len(A[0]) # size of A: m rows, k columns
k2, n = len(B), len(B[0]) # size of B: k' rows, n columns
if k != k2:
raise ValueError("A and B have incompatible dimensions")
# To speed up innermost loops we transpose B.
B_T = [[B[row][col] for row in range(k)] for col in range(n)]
# Initialize output matrix
C = [[0] * n for _ in range(m)]
for i in range(m):
a_row = A[i]
for j in range(n):
# compute dot product of A[i] and B_T[j] as one row
acc = 0
for t in range(k):
acc += a_row[t] * B_T[j][t]
C[i][j] = acc
return C
def matmul(A, B):
"""Fast matrix multiplication using only native Python code.
The function expects A (m x n) and B (n x p) to be lists of lists.
It uses a single loop to compute the result efficiently.
Time complexity: O(m*n*p) in the worst case, but many Python
implementations can handle small matrices quickly.
"""
m, n = len(A), len(A[0])
nB, p = len(B), len(B[0])
if n != nB:
raise ValueError("A's column count must equal B's row count")
# Initialize a zero matrix for the result
C = [[0.0] * p for _ in range(m)]
# Main multiplication loop
for i in range(m):
for k in range(n):
aik = A[i][k]
if aik == 0:
continue
for j in range(p):
C[i][j] += aik * B[k][j]
return C
def matmul(A, B):
m, n = len(A), len(A[0]) # n must equal len(B)
p = len(B[0])
return [[sum(a*b for a,b in zip(row, col)) for col in zip(*B)] for row in A]
def matmul(A, B):
# check dimensions
n = len(A)
m = len(B[0])
# Transpose B
B_T = list(zip(*B))
result = [ [0]*m for _ in range(n) ]
for i, row in enumerate(A):
for j, col in enumerate(B_T):
result[i][j] = sum(a*b for a,b in zip(row, col))
return result
def matmul(A, B):
n = len(A)
m = len(A[0])
p = len(B[0])
result = [[0] * p for _ in range(n)]
for i in range(n):
ai = A[i]
for k in range(m):
aik = ai[k]
if aik:
bj = B[k]
for j in range(p):
result[i][j] += aik * bj[j]
return result
def matmul(A, B):
n = len(A)
m = len(B[0]) # columns of B
p = len(B) # rows of B
# ensure A's columns equal B's rows
assert len(A[0]) == len(B), "Incompatible dimensions"
result = [[0] * m for _ in range(n)]
for i in range(n):
for k in range(len(B)):
aik = A[i][k]
# Multiply aik with each value in row k of B
for j in range(m):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
return ...
def matmul(A, B):
# Validate shapes
n = len(A)
m = len(A[0]) # width of A
p = len(B[0])
# maybe check that all rows are same length
# Also, ensure len(B) == m
if len(B) != m:
raise ValueError('Incompatible matrix shapes for multiplication.')
# Use maybe list comprehension for each element
result = [[sum(A[i][k]*B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]
return result
def matmul(A, B): return ...
def matmul(A, B):
"""
Multiply two matrices A and B provided as lists of lists.
A and B should be compatible for matrix multiplication
(number of columns of A equals number of rows of B).
Args:
A: List of rows, where each row is an iterable of numbers and
all rows have the same length.
B: Same format.
Returns:
A new list of lists containing the product A * B.
"""
# Validate dimensions
if not A or not B:
raise ValueError("Input matrices cannot be empty")
n_rows_a = len(A)
n_cols_a = len(A[0])
n_rows_b = len(B)
n_cols_b = len(B[0])
if n_cols_a != n_rows_b:
raise ValueError("Incompatible dimensions for matrix multiplication")
# Transpose B once to improve cache locality
B_T = [[B[row][col] for row in range(n_rows_b)] for col in range(n_cols_b)]
# Allocate result matrix
result = [[0] * n_cols_b for _ in range(n_rows_a)]
# Perform multiplication
for i in range(n_rows_a):
row_a = A[i]
row_res = result[i]
for j in range(n_cols_b):
col_b = B_T[j]
s = 0
# dot product of row_a and col_b
for k in range(n_cols_a):
s += row_a[k] * col_b[k]
row_res[j] = s
return result
def matmul(A, B):
n, m = len(A), len(B[0])
# assume A rows by k, B columns by k
# B's columns: B_col = [list of column values]
B_T = list(zip(*B))
C = []
for row in A:
newrow = []
for col in B_T:
newrow.append(sum(a*b for a,b in zip(row, col)))
C.append(newrow)
return C
def matmul(A, B):
# Check dimensions
m, n = len(A), len(A[0]); p, q = len(B), len(B[0])
if n != p:
raise ValueError("Incompatible dimensions.")
result = [[0]*q for _ in range(m)]
for i in range(m):
for k in range(n):
aik = A[i][k]
for j in range(q):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
"""Multiply two matrices A (n x m) and B (m x p) using only native Python."""
n = len(A)
m = len(A[0]) if A else 0
p = len(B[0]) if B else 0
# Verify dimensions
if not A or not B or len(B) != m:
raise ValueError("Incompatible matrix dimensions.")
# Pre-allocate result matrix
result = [[0] * p for _ in range(n)]
for i in range(n):
Ai = A[i]
res_row = result[i]
for k in range(m):
aik = Ai[k]
if aik:
Bk = B[k]
for j in range(p):
res_row[j] += aik * Bk[j]
return result
def matmul(A, B):
n = len(A)
m = len(B[0])
p = len(B)
C = [[0.0]*m for _ in range(n)]
for i in range(n):
a_row = A[i]
for k in range(p):
aik = a_row[k]
if aik!=0:
B_col = B[k]
for j in range(m):
C[i][j] += aik * B_col[j]
return C
def matmul(A, B):
n = len(A)
m = len(B[0])
# B must have dimension matching
# Let's compute transpose of B for cache-friendly.
BT = list(map(list, zip(*B))) # transpose B
result = [[sum(a*b for a,b in zip(rowA, colB)) for colB in BT]
for rowA in A]
return result
def matmul(A, B):
if not A or not B: return []
m, n = len(A), len(A[0])
p = len(B[0])
# check B's rows equal to A's columns
if len(B) != n: raise ValueError("...")
result = [[0.0]*p for _ in range(m)]
for i in range(m):
ai = A[i]
for k, a in enumerate(ai):
if a:
bk = B[k]
for j, b in enumerate(bk):
result[i][j] += a*b
return result
def matmul(A, B):
n = len(A)
m = len(A[0])
p = len(B[0])
def matmul(A, B): return ...
def matmul(A, B): return ...
def matmul(A, B):
n = len(A)
C = [[0]*n for _ in range(n)]
# naive loop
for i in range(n):
for k in range(n):
for j in range(n):
C[i][j] += A[i][k] * B[k][j]
return C
def matmul(A, B):
m=len(A)
n=len(A[0])
p=len(B[0])
C=[[0]*p for _ in range(m)]
for i in range(m):
for k in range(n):
for j in range(p):
C[i][j]+=A[i][k]*B[k][j]
return C
def matmul(A, B):
# A: m x k, B: k x n -> result m x n
m = len(A)
k = len(A[0]) if A else 0
n = len(B[0]) if B else 0
# Precompute B's column representation
Bt = list(zip(*B)) # columns of B as tuples
return [[sum(a*b for a,b in zip(row, col)) for col in Bt] for row in A]
def matmul(A, B):
import numpy as np
A_arr = np.array(A)
B_arr = np.array(B)
return (A_arr @ B_arr).tolist()
def matmul(A, B):
n=len(A); m=len(B[0]); p=len(B) # that is dimension of B's rows
# use zeros list for result
result=[[0]*m for _ in range(n)]
for i in range(n):
a_row=A[i]
for j in range(m):
s=0
for k in range(p):
s+=a_row[k]*B[k][j]
result[i][j]=s
return result
def matmul(A, B):
return ...
def matmul(A, B):
"""
Multiply two matrices A and B.
A is (m × n) and B is (n × p), both represented as lists of lists.
Returns the product matrix of dimension (m × p).
This routine is written purely in Python and uses a little bit of
pre‑processing to keep memory accesses cache‑friendly.
"""
# dimensions
m, n = len(A), len(A[0])
nB, p = len(B), len(B[0])
assert n == nB, "Inner dimensions must agree"
# transpose B to keep the inner loop cache‑friendly
B_T = [[B[k][j] for k in range(n)] for j in range(p)]
# prepare result matrix
C = [[0] * p for _ in range(m)]
for i in range(m):
Ai = A[i]
Ci = C[i]
for k in range(n):
aik = Ai[k]
if aik == 0:
continue
Bk = B_T[k]
for j in range(p):
Ci[j] += aik * Bk[j]
return C
def matmul(A, B):
"""
Multiply two matrices A and B using only native Python code.
Parameters:
A: list of lists, shape (m, n)
B: list of lists, shape (n, p)
Returns:
C: list of lists, shape (m, p)
"""
# check dimensions
m, n = len(A), len(A[0])
if len(B) != n:
raise ValueError("Inner matrix dimensions do not match")
p = len(B[0])
# pre‑allocate the result matrix
C = [[0] * p for _ in range(m)]
# multiply
for i in range(m):
ai = A[i]
for j in range(p):
s = 0
# iterate over the shared dimension
for k in range(n):
s += ai[k] * B[k][j]
C[i][j] = s
return C
def matmul(A, B):
m, n_A = len(A), len(A[0])
n_B, p = len(B), len(B[0])
assert n_A == n_B # The matrices must have compatible dimensions
# Preallocate the output matrix
result = [[0.0] * p for _ in range(m)]
for i in range(m):
for j in range(p):
# Compute the dot product of row i and column j
sum_val = 0.0
for k in range(n_A): # or n_B
sum_val += A[i][k] * B[k][j]
result[i][j] = sum_val
return result
def matmul(A, B):
n = len(A)
# Assume A and B are both n x n
# If n=1 return element-wise product
# else compute.
def matmul(A, B):
"""
Multiplies two matrices A and B.
Parameters
----------
A : list of lists
First matrix, with dimensions m x n.
B : list of lists
Second matrix, with dimensions n x p.
Returns
-------
list of lists
Resulting matrix of dimensions m x p.
"""
# Number of rows in A and columns in B
m = len(A)
n = len(A[0]) # Shared dimension
p = len(B[0]) # Columns in B
# Prepare result matrix filled with zeros
C = [[0 for _ in range(p)] for _ in range(m)]
# Standard triple‑loop multiplication
for i in range(m):
for k in range(n):
aik = A[i][k]
for j in range(p):
C[i][j] += aik * B[k][j]
return C
def matmul(A, B):
n_rows_A = len(A)
n_cols_A = len(A[0]) if A else 0
n_rows_B = len(B)
n_cols_B = len(B[0]) if B else 0
if n_cols_A != n_rows_B:
raise ValueError("Incompatible dimensions")
C = [[0]*n_cols_B for _ in range(n_rows_A)]
for i in range(n_rows_A):
Ai = A[i]
Ci = C[i]
for k in range(n_cols_A):
aik = Ai[k]
if aik:
Bk = B[k]
for j in range(n_cols_B):
Ci[j] += aik * Bk[j]
return C
def matmul(A, B):
m, n = len(A), len(B[0])
# etc...
# compute product
result = [[0]*n for _ in range(m)]
for i in range(m):
for k in range(len(A[0])): # iterate columns of A
aik = A[i][k]
for j in range(n):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
import math
n = len(A)
m = len(A[0])
# Validate B shape: m==len(B)
assert m == len(B), "Incompatible mat shape"
# optionally convert to square by padding with zeros for Strassen
def matmul(A, B):
n = len(A)
m = len(A[0]) # number of columns in A
p = len(B[0])
# Pre-allocate result matrix
C = [[0]*p for _ in range(n)]
for i in range(n):
row = A[i]
for j in range(p):
s = 0
for k in range(m):
s += row[k]*B[k][j]
C[i][j] = s
return C
def matmul(A, B):
n = len(A)
m = len(B[0]) # columns of B
result = [[0]*m for _ in range(n)]
for i in range(n):
for k in range(len(A[i])):
aik = A[i][k]
if aik != 0:
for j in range(m):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
    return ...
def matmul(A, B):
n, m = len(A), len(A[0])
o, p = len(B), len(B[0])
assert m == o
# Optionally use comprehension:
result = [[sum(A[i][k]*B[k][j] for k in range(m)) for j in range(p)] for i in range(n)]
return result
def matmul(A, B):
"""
Multiply two matrices A and B.
Parameters
----------
A: list of lists (rows of A)
B: list of lists (rows of B)
Returns
-------
Resulting matrix as a list of lists.
"""
# Determine dimensions
m = len(A) # number of rows in A
k = len(A[0]) if A else 0 # number of columns in A
n = len(B[0]) # number of columns in B
# Initialize result matrix
C = [[0] * n for _ in range(m)]
# Standard triple-loop matrix multiplication
for i in range(m):
for j in range(n):
s = 0
for l in range(k):
s += A[i][l] * B[l][j]
C[i][j] = s
return C
def matmul(A, B):
# sizes: A is n×p , B is p×m
n, p = len(A), len(A[0]) # number of rows of A, columns of A
mp = len(B[0]) # number of columns of B
# prepare result matrix n×m and fill it
R = [[0] * mp for _ in range(n)]
for i in range(n):
for j in range(mp):
s = 0
for k in range(p):
s += A[i][k] * B[k][j]
R[i][j] = s
return R
def matmul(A, B):
return ...
def matmul(A, B):
n, m = len(A), len(B[0])
...
def matmul(A, B):
n = len(A)
m = len(B[0])
p = len(B)
result = [[0]*m for _ in range(n)]
for i in range(n):
for j in range(m):
s = 0
for k in range(p):
s += A[i][k] * B[k][j]
result[i][j] = s
return result
def matmul(A, B):
n = len(A)
m = len(B[0])
p = len(B)
result = [[0]*m for _ in range(n)]
for i in range(n):
for k in range(p):
aik = A[i][k]
if aik:
row = result[i]
for j in range(m):
row[j] += aik * B[k][j]
return result
def matmul(A, B):
...
def matmul(A, B):
if not A or not B: return []
m, n1 = len(A), len(A[0])
n2, p = len(B), len(B[0])
assert n1 == n2, "Dimensions mismatch."
res = [[0]*p for _ in range(m)]
for i in range(m):
for k in range(n1):
a = A[i][k]
if a:
for j in range(p):
res[i][j] += a*B[k][j]
return res
def matmul(A, B):
"""
Multiply two matrices A and B using a fast algorithm (divide‑and‑conquer).
The algorithm recursively multiplies submatrices. This implementation
does not use any external libraries (no numpy etc.) and only relies on
pure Python data structures.
Parameters
----------
A : list of list of numbers
The left matrix.
B : list of list of numbers
The right matrix.
Returns
-------
list of list of numbers
The product matrix A * B.
The function assumes that A's columns equal B's rows.
"""
# Determine matrix dimensions
n = len(A) # rows of A
m = len(B[0]) # columns of B
k = len(B) # columns of A / rows of B
# A and B must be compatible
if not all(len(row) == k for row in A):
raise ValueError("Incompatible dimensions for matrix multiplication")
# Base case: if matrices are small, compute directly
if n <= 1 or k <= 1 or m <= 1:
# Direct quadratic multiplication
result = [[0.0] * m for _ in range(n)]
for i in range(n):
for j in range(m):
for s in range(k):
result[i][j] += A[i][s] * B[s][j]
return result
# Divide matrices into quadrants
# Helper function to split matrix into four sub‑matrices
def split(mat):
half_rows = len(mat) // 2
half_cols = len(mat[0]) // 2
top_left = [row[:half_cols] for row in mat[:half_rows]]
top_right = [row[half_cols:] for row in mat[:half_rows]]
bottom_left = [row[:half_cols] for row in mat[half_rows:]]
bottom_right = [row[half_cols:] for row in mat[half_rows:]]
return top_left, top_right, bottom_left, bottom_right
# Split A
A11, A12, A21, A22 = split(A)
# Split B
B11, B12, B21, B22 = split(B)
# Recursive multiplication for each submatrix product
C11 = matmul(A11, B11)
C12 = matmul(A12, B12)
C21 = matmul(A21, B21)
C22 = matmul(A22, B22)
# Combine the submatrices into a single result
result = []
for i in range(len(C11)):
result.append(C11[i] + C12[i]) # merge rows from left and right halves
for i in range(len(C21)):
result.append(C21[i] + C22[i])
return result
def matmul(A, B):
if not A or not B or not B[0]:
return []
m, p = len(A), len(A[0])
p2, n = len(B), len(B[0])
if p != p2:
raise ValueError("A's columns must equal B's rows")
# initialize result matrix
result = [[0]*n for _ in range(m)]
for i in range(m):
Ai = A[i]
for k in range(p):
aik = Ai[k]
if aik:
Bk = B[k]
for j in range(n):
result[i][j] += aik * Bk[j]
return result
def matmul(A, B):
# A is m x n, B is n x p
if not A or not B: return []
n = len(A[0])
assert all(len(row)==n for row in A)
assert all(len(row)==len(A[0]) for row in B)
m=len(A); p=len(B[0])
result=[[0]*p for _ in range(m)]
for i in range(m):
for k in range(n):
a=A[i][k]
if a:
# inner multiplication contributed to each column j
for j in range(p):
result[i][j] += a*B[k][j]
return result
def matmul(A, B):
...
def matmul(A, B):
...
def matmul(A, B):
# Sanity check for proper dimensions
n = len(A) # Rows in A
m = len(A[0]) if A else 0 # Columns in A (and rows in B)
p = len(B[0]) if B else 0 # Columns in B
# We need A's columns to equal B's rows
if m != len(B):
raise ValueError("Dimension mismatch: A's columns must equal B's rows")
# Initialize the result matrix
result = [[0] * p for _ in range(n)]
# Perform the multiplication
for i in range(n):
for k in range(m):
aik = A[i][k] # Cache the value to avoid repeated indexing
for j in range(p):
result[i][j] += aik * B[k][j]
return result
def matmul(A, B):
# asserts
def matmul(A, B):
... # code
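The logged candidates above mostly vary the same few pure-Python tricks: caching rows in local variables, skipping zero entries, and transposing B so the inner loop walks two flat sequences. A small self-contained comparison (an illustration, not part of the notebook's training loop) shows why the transposed variant tends to win:

```python
import random
import timeit

def matmul_naive(A, B):
    # Textbook triple loop: row i, column j, then the shared dimension k.
    m, n, p = len(A), len(A[0]), len(B[0])
    C = [[0.0] * p for _ in range(m)]
    for i in range(m):
        for j in range(p):
            s = 0.0
            for k in range(n):
                s += A[i][k] * B[k][j]  # B[k][j] strides across rows each step
            C[i][j] = s
    return C

def matmul_transposed(A, B):
    # Transpose B once, so every dot product scans two contiguous sequences.
    BT = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in BT] for row in A]

rng = random.Random(0)
A = [[rng.random() for _ in range(64)] for _ in range(64)]
B = [[rng.random() for _ in range(64)] for _ in range(64)]
t_naive = timeit.timeit(lambda: matmul_naive(A, B), number=5)
t_trans = timeit.timeit(lambda: matmul_transposed(A, B), number=5)
print(f"naive: {t_naive:.3f}s  transposed: {t_trans:.3f}s")
```

On CPython the transposed version avoids the repeated `B[k][j]` row-hopping and moves the inner loop into C-level `zip`/`sum` machinery, which is exactly the kind of speedup the reward function should be able to detect.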
TrainOutput(global_step=100, training_loss=3.8481087469079966e-05, metrics={'train_runtime': 46507.4754, 'train_samples_per_second': 0.004, 'train_steps_per_second': 0.002, 'total_flos': 0.0, 'train_loss': 3.8481087469079966e-05})
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Knowledge cutoff: 2024-06
Current date: 2025-11-22
Reasoning: low
# Valid channels: analysis, commentary, final. Channel must be included for every message.
Calls to these tools must go to the commentary channel: 'functions'.<|end|><|start|>user<|message|>Create a new fast matrix multiplication function using only native Python code.
You are given a list of list of numbers.
Output your new function in backticks using the format below:
```python
def matmul(A, B):
return ...
```<|end|><|start|>assistant<|channel|>analysis<|message|>We need to output a function that multiplies two matrices, using only native Python. Maybe naive algorithm. Provide code.<|end|><|start|>assistant<|channel|>final<|message|>```python
def matmul(A, B):
"""
Multiply two matrices A and B (as lists of lists) using the standard O(n^3) algorithm.
Assumes that A and B are compatible for multiplication (i.e., len(A[0]) == len(B)).
"""
# Dimensions
m = len(A) # number of rows in A
n = len(B[0]) # number of columns in B
p = len(B) # number of rows in B
# Initialize result matrix with zeros
result = [[0 for _ in range(n)] for _ in range(m)]
# Perform multiplication
for i in range(m):
for k in range(p):
aik = A[i][k]
for j in range(n):
result[i][j] += aik * B[k][j]
return result
```<|return|>
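Candidates like the ones logged above have to be scored for both correctness and speed, while guarding against the reward hacking (laziness like `return ...`, or cheating via library imports) visible in some of the samples. Below is a minimal sketch of such a check, with a hypothetical `reward` helper; it is not the notebook's actual reward function, which also needs source-level checks against caching and `numpy` imports:

```python
import random
import time

def reference_matmul(A, B):
    # Naive ground-truth product used to verify candidates.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def reward(candidate, size=32, time_limit=1.0):
    """Score a candidate matmul: 0.0 if it errors, is wrong, or exceeds the
    time budget; otherwise a positive score that grows as it runs faster.
    Hypothetical scoring for illustration only."""
    rng = random.Random(0)
    A = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
    B = [[rng.uniform(-1, 1) for _ in range(size)] for _ in range(size)]
    expected = reference_matmul(A, B)
    start = time.perf_counter()
    try:
        got = candidate(A, B)
        elapsed = time.perf_counter() - start
        # Lazy outputs (e.g. `return ...`) fail this structural/numeric check.
        ok = all(
            abs(g - e) < 1e-9
            for got_row, exp_row in zip(got, expected)
            for g, e in zip(got_row, exp_row)
        ) and len(got) == len(expected)
    except Exception:
        return 0.0
    if not ok or elapsed > time_limit:
        return 0.0
    return 1.0 + (time_limit - elapsed)  # faster correct kernels score higher
```

A lazy candidate returning `...` raises inside the correctness check and scores 0.0, while a correct kernel earns a score that increases as its runtime drops below the limit.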
Saving to float16 or MXFP4 for VLLM
We also support saving to float16 directly. Select merged_16bit for float16 or mxfp4 for MXFP4 (the native precision of OpenAI's GPT-OSS); saving only the LoRA adapters is supported as a fallback. Use push_to_hub_merged to upload to your Hugging Face account — you can create personal tokens at https://huggingface.co/settings/tokens. See our docs for more deployment options.
And we're done! If you have any questions about Unsloth, want to report bugs, need help, or just want to keep up with the latest LLM developments and join projects, feel free to join our Discord channel!
Some other resources:
- Train your own reasoning model - Llama GRPO notebook Free Colab
- Saving finetunes to Ollama. Free notebook
- Llama 3.2 Vision finetuning - Radiography use case. Free Colab
- See notebooks for DPO, ORPO, Continued pretraining, conversational finetuning and more on our documentation!
This notebook and all Unsloth notebooks are licensed LGPL-3.0.