Zephyr (7B) DPO
News
Installation
[ ]
Unsloth
[ ]
/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py:67: UserWarning: CUDA is not linked properly. We shall run `ldconfig /usr/lib64-nvidia` to try to fix it. warnings.warn(
[ ]
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:72: UserWarning: The secret `HF_TOKEN` does not exist in your Colab secrets. To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session. You will be able to reuse this secret in all of your notebooks. Please note that authentication is recommended but still optional to access public models or datasets. warnings.warn(
==((====))==  Unsloth: Fast Mistral patching release 2024.1
   \\   /|    GPU: Tesla T4. Max memory: 14.748 GB
O^O/ \_/ \    CUDA capability = 7.5. Xformers = 0.0.22.post7. FA = False.
\        /    Pytorch version: 2.1.0+cu121. CUDA Toolkit = 12.1
 "-____-"     bfloat16 = FALSE. Platform = Linux
You passed `quantization_config` to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` attribute will be overwritten with the one you passed to `from_pretrained`.
[ ]
Data Prep
We follow Hugging Face's Alignment Handbook recipe for Zephyr and use the UltraFeedback dataset, sampling 0.5% of it to speed things up. Use the full dataset for a complete run.
[ ]
Generating train_sft split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_sft split: 0%| | 0/1000 [00:00<?, ? examples/s]
Generating train_gen split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_gen split: 0%| | 0/1000 [00:00<?, ? examples/s]
Generating train_prefs split: 0%| | 0/61966 [00:00<?, ? examples/s]
Generating test_prefs split: 0%| | 0/2000 [00:00<?, ? examples/s]
Formatting comparisons with prompt template (num_proc=12): 0%| | 0/309 [00:00<?, ? examples/s]
Formatting comparisons with prompt template (num_proc=12): 0%| | 0/2000 [00:00<?, ? examples/s]
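The subsampling step can be sketched without the `datasets` dependency; with the real library you would typically `shuffle(seed=...)` and `select(...)` on the `train_prefs` split. The mock below (a hypothetical helper, not the notebook's hidden cell) just shows the arithmetic that yields the 309 examples seen in the formatting map above:

```python
import random

def sample_fraction(rows, fraction, seed=42):
    """Take a reproducible random sample of `fraction` of the rows."""
    k = max(1, int(len(rows) * fraction))
    rng = random.Random(seed)
    return rng.sample(rows, k)

# Mock stand-in for the 61,966-row train_prefs split.
train_prefs = [{"prompt": f"question {i}"} for i in range(61_966)]
subset = sample_fraction(train_prefs, 0.005)
print(len(subset))  # 309 -- int(61_966 * 0.005)
```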
We now print a random item from the dataset: the prompt, followed by the chosen and rejected responses.
[ ]
('<|system|>\n'
'</s>\n'
'<|user|>\n'
'List two natural resources which was made in the factory.</s>\n'
'<|assistant|>\n')
('Natural resources are not made in factories. Natural resources are materials '
'and substances that occur naturally on Earth, such as water, minerals, '
'forests, and fossil fuels. Factories typically produce man-made materials or '
'process natural resources into finished products.</s>\n')
("I'm sorry, but it seems there might be some confusion in your question as "
'natural resources are typically sourced from the earth or sea, and not made '
'in a factory. However, factories often use natural resources to create '
'various products. Two examples of natural resources that factories may use '
'are crude oil and iron ore. Crude oil is refined to produce various '
'petroleum products, such as gasoline and plastics, while iron ore is refined '
'to create steel, which is used in the construction industry, vehicle '
'manufacturing, and more. Does this help clarify things?</s>\n')
We now add LoRA adapters so we only need to update 1 to 10% of all parameters!
[ ]
Unsloth 2024.1 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
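As a back-of-the-envelope check on that 1 to 10% figure, the LoRA parameter count follows from the adapted matrix shapes. The dimensions below are the published Mistral 7B shapes; rank r=64 and the seven target modules are assumptions typical of Unsloth notebooks, since the actual config cell is not shown:

```python
# Mistral-7B-style shapes: hidden size, MLP intermediate size,
# grouped-query KV projection width, and layer count.
HIDDEN, INTERMEDIATE, KV_DIM, LAYERS = 4096, 14336, 1024, 32
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(r: int) -> int:
    # Each adapted weight W (d_in x d_out) gains A (d_in x r) and B (r x d_out).
    per_layer = sum(r * (d_in + d_out) for d_in, d_out in TARGET_SHAPES.values())
    return per_layer * LAYERS

total = 7_240_000_000  # ~7.24B base parameters (approximate)
frac = lora_params(64) / total
print(f"{lora_params(64) / 1e6:.0f}M trainable, {frac:.1%} of the base model")
# -> 168M trainable, 2.3% of the base model
```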
[ ]
[ ]
/usr/local/lib/python3.10/dist-packages/trl/trainer/dpo_trainer.py:294: UserWarning: When using DPODataCollatorWithPadding, you should set `remove_unused_columns=False` in your TrainingArguments we have set it for you, but you should do it yourself in the future. warnings.warn(
[ ]
Unsloth: `use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False` Could not estimate the number of tokens of the input, floating-point operations will not be computed
TrainOutput(global_step=114, training_loss=0.397163258179238, metrics={'train_runtime': 4017.5003, 'train_samples_per_second': 0.231, 'train_steps_per_second': 0.028, 'total_flos': 0.0, 'train_loss': 0.397163258179238, 'epoch': 2.94})
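Training finishes at a mean loss of about 0.40. For reference, the per-pair objective that `DPOTrainer` minimizes (Rafailov et al., 2023) can be sketched in plain Python; beta=0.1 is the common default and is assumed here, as the notebook's training cell is not shown:

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for one preference pair:
    -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))),
    where each argument is the summed log-probability of a response
    under the policy (pi) or the frozen reference (ref) model."""
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# The loss shrinks as the policy favors the chosen response more
# strongly than the reference model does (illustrative numbers).
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))  # margin = +2, lower loss
print(dpo_loss(-12.0, -10.0, -11.0, -11.0))  # margin = -2, higher loss
```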