Phi-3 Fine-tune LoRA (Python)

From the Microsoft Phi cookbook, 03.Finetuning.

Instructions for fine-tuning a Phi-3-mini model on Python code generation using LoRA, via the Hugging Face Hub.

Installing and loading the libraries

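The original install cells are not preserved here; the following package set is an assumption based on the libraries used later in the notebook (transformers, datasets, peft, trl, accelerate, wandb):

```shell
# Assumed dependency set for this notebook; pin versions as needed.
pip install -q -U transformers datasets peft trl accelerate bitsandbytes wandb python-dotenv
```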

Importing the libraries


Setting Global Parameters


Connect to the Hugging Face Hub

IMPORTANT: How the following section runs depends on your execution environment and on how your API keys are configured.

You can log in to the Hugging Face Hub interactively.


Alternatively, you can supply a .env file that contains the Hugging Face token.

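The usual tools for this are python-dotenv plus `huggingface_hub.login`; as a dependency-free sketch, a `.env` line such as `HF_TOKEN=...` can be parsed with the standard library (the file path and the `HF_TOKEN` variable name are assumptions):

```python
import os

def load_env(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments are skipped."""
    values = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip("'\"")
    return values

# Export the token so downstream libraries can pick it up
# (huggingface_hub honors the HF_TOKEN environment variable).
if os.path.exists(".env"):
    token = load_env().get("HF_TOKEN")
    if token:
        os.environ["HF_TOKEN"] = token
```

With the token in the environment, `huggingface_hub.login()` completes the connection without an interactive prompt.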

Load the dataset with the instruction set


Load the tokenizer to prepare the dataset


A function to create the appropriate format for our model: we adapt our dataset to the ChatML format.


Apply the ChatML format to our dataset

The code block is used to prepare a dataset for training a chat model.

The dataset.map(create_message_column) line applies the create_message_column function to each example in the dataset. This function takes a row from the dataset and transforms it into a dictionary with a 'messages' key. The value of this key is a list of 'user' and 'assistant' messages.

The 'user' message is created by combining the 'instruction' and 'input' fields from the row, while the 'assistant' message is created from the 'output' field of the row. These messages are appended to the 'messages' list in the order of 'user' and 'assistant'.

The dataset_chatml.map(format_dataset_chatml) line then applies the format_dataset_chatml function to each example in the updated dataset. This function takes a row from the dataset and transforms it into a dictionary with a 'text' key. The value of this key is a string of formatted chat messages.

The tokenizer.apply_chat_template method is used to format the list of chat messages into a single string. The 'add_generation_prompt' parameter is set to False to avoid adding a generation prompt at the end of the string, and the 'tokenize' parameter is set to False to return a string instead of a list of tokens.

The result of these operations is a dataset where each example is a dictionary with a 'text' key and a string of formatted chat messages as its value. This format is suitable for training a chat model.

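The two mapping functions described above can be sketched in plain Python. In the notebook, `tokenizer.apply_chat_template` produces the template from the loaded tokenizer, so the hand-written Phi-3-style special tokens below are an assumption for illustration:

```python
def create_message_column(row):
    """Turn an instruction-tuning row into a list of chat messages."""
    user_content = row["instruction"]
    if row.get("input"):
        user_content += "\n" + row["input"]
    return {"messages": [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": row["output"]},
    ]}

def apply_chat_template(messages, add_generation_prompt=False):
    """Hand-rolled stand-in for tokenizer.apply_chat_template (Phi-3-style tokens assumed)."""
    text = "".join(f"<|{m['role']}|>\n{m['content']}<|end|>\n" for m in messages)
    if add_generation_prompt:
        text += "<|assistant|>\n"
    return text

def format_dataset_chatml(row):
    """Collapse the message list into a single training string under the 'text' key."""
    return {"text": apply_chat_template(row["messages"], add_generation_prompt=False)}
```

In the notebook these run via `dataset.map(create_message_column)` followed by `dataset_chatml.map(format_dataset_chatml)`.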

Split the dataset into train and test sets

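With Hugging Face datasets this is a one-liner, e.g. `dataset_chatml.train_test_split(test_size=0.05)`; a dependency-free equivalent looks like the following (the split fraction and seed are assumptions):

```python
import random

def train_test_split(rows, test_size=0.05, seed=42):
    """Shuffle a list of examples and split it into train/test subsets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)          # deterministic shuffle for reproducibility
    n_test = max(1, int(len(rows) * test_size))
    return {"train": rows[n_test:], "test": rows[:n_test]}
```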

Instruction fine-tune a Phi-3-mini model using LoRA and trl

First, we try to identify our GPU


Load the tokenizer and model to finetune


Configure the LoRA properties

The SFTTrainer offers seamless integration with peft, simplifying the process of instruction tuning LLMs. All we need to do is create our LoraConfig and supply it to the trainer. However, before initiating the training process, we must specify the hyperparameters we intend to use, which are defined in TrainingArguments.

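A typical configuration looks like the following; the target modules, rank, and hyperparameter values are illustrative assumptions, not the notebook's exact settings:

```python
from peft import LoraConfig
from transformers import TrainingArguments

peft_config = LoraConfig(
    r=16,                       # rank of the LoRA update matrices
    lora_alpha=16,              # scaling factor for the LoRA updates
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention projections
)

args = TrainingArguments(
    output_dir="./phi-3-mini-LoRA",  # assumed output path
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    logging_steps=25,
    save_strategy="epoch",
    bf16=True,
    report_to="wandb",
)
```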

Connect to wandb and initialize the project and experiment


We now possess all the necessary components to construct our SFTTrainer and commence the training of our model.


Initiate the model training process by invoking the train() method on our Trainer instance.


Store the adapter on the Hugging Face Hub


Merge the model and the adapter and save it

Combine the model and the adapter, then save it. It's necessary to clear the memory when operating on a T4 instance.


Load the previously trained and stored adapter, merge it with the base model, and save the complete model.

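The merge step can be sketched with peft's `merge_and_unload`, which folds the LoRA weights into the base weights so the result is a standalone checkpoint; the repository names and output path below are assumptions:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model, attach the trained adapter, and merge the LoRA
# weights into the base weights.
base_model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",  # assumed base checkpoint
    torch_dtype=torch.bfloat16,
)
model = PeftModel.from_pretrained(base_model, "your-username/phi-3-mini-LoRA")  # assumed adapter repo
merged = model.merge_and_unload()

# Save the merged model together with its tokenizer.
merged.save_pretrained("phi-3-mini-python-merged")  # assumed output path
tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
tokenizer.save_pretrained("phi-3-mini-python-merged")
```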

Model Inference and evaluation

For model inference and evaluation, we will download the model we created from the Hugging Face Hub and test it to ensure its functionality.


Retrieve the model and tokenizer from the Hugging Face Hub.


We arrange the dataset in the same manner as before.


Create a text generation pipeline to run the inference


Develop a function that organizes the input and performs inference on an individual sample.


Evaluate the performance


We'll employ the ROUGE metric to assess performance. While it may not be the optimal metric, it's straightforward and convenient to utilize.

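The notebook uses the evaluate library's `rouge` metric; as an illustration of what ROUGE-L measures, here is a minimal longest-common-subsequence implementation (whitespace tokenization is a simplification of the real metric's tokenizer):

```python
def rouge_l(prediction, reference):
    """ROUGE-L F1 between two strings, using a word-level LCS."""
    pred, ref = prediction.split(), reference.split()
    # Dynamic-programming table for the longest common subsequence length.
    dp = [[0] * (len(ref) + 1) for _ in range(len(pred) + 1)]
    for i, p in enumerate(pred):
        for j, r in enumerate(ref):
            dp[i + 1][j + 1] = dp[i][j] + 1 if p == r else max(dp[i][j + 1], dp[i + 1][j])
    lcs = dp[-1][-1]
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```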

Develop a function for performing inference and assessing an instance.


We can now run inference on a collection of samples. For simplicity, the process isn't optimized at this stage; in the future, we plan to run inference in batches to improve performance, but for now we process samples one at a time.

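When we do move to batched inference, the test set can be chunked and each chunk fed to the pipeline in a single call; the chunking itself is simple (the batch size is an assumption):

```python
def batched(items, batch_size=8):
    """Yield successive fixed-size chunks from a list of prompts."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]
```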

Now we can compute the metric over the samples.
