Phi-3 Inference and Fine-Tuning
This Phi-3 Fine Tuning Notebook provides instructions on how to:
- Fine-tune the Phi-3 mini model using QLoRA and LoRA techniques
- Quantize the Phi-3 mini model using bitsandbytes and GPTQ for efficient memory usage
- Run inference with the Phi-3 mini model using Hugging Face's Transformers library

Each section in this notebook is designed to be run independently, allowing you to focus on specific tasks as needed.
Fine-Tuning Phi-3
Welcome to the guide on fine-tuning the Phi-3 model. Phi-3 is a powerful language model developed by Microsoft, designed to generate human-like text based on the input it receives. Fine-tuning is a process that involves training a pre-trained model (like Phi-3) on a specific task, allowing it to adapt its pre-learned knowledge to the new task.
In this guide, we will walk you through the steps of fine-tuning the Phi-3 model. This process can help improve the model's performance on specific tasks or domains that were not covered in its original training data.
The fine-tuning process involves several steps, including setting up the environment, loading the pre-trained model, preparing the training data, and finally, training the model on the new data.
By the end of this guide, you should have a good understanding of how to fine-tune the Phi-3 model for your specific needs. Let's get started!
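The steps above (loading the pre-trained model, preparing it for training, and attaching trainable adapters) can be sketched as follows. This is a minimal QLoRA-style sketch, not the notebook's exact code: it assumes the `transformers`, `peft`, and `bitsandbytes` packages are installed, a CUDA GPU is available, and the public `microsoft/Phi-3-mini-4k-instruct` checkpoint is used; the LoRA hyperparameters and target module names are illustrative choices.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "microsoft/Phi-3-mini-4k-instruct"

# Load the frozen base weights in 4-bit NF4 so they fit in limited GPU memory (QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Attach small trainable LoRA matrices; only these are updated during fine-tuning.
peft_config = LoraConfig(
    r=16,                                  # adapter rank (illustrative)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["qkv_proj", "o_proj"],  # assumed Phi-3 attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()

# From here, train with transformers.Trainer (or trl's SFTTrainer) on your dataset.
```

For plain LoRA rather than QLoRA, drop the `quantization_config` argument and load the base model in 16-bit precision instead.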
Inference
This section demonstrates how to run inference using the Phi-3 mini model with Hugging Face's Transformers library, specifically the 16-bit version.
Using the original model (16-bit version)
Running the model in 16-bit precision requires roughly 7.4 GB of GPU RAM.
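A minimal inference sketch with the 16-bit model is shown below. It assumes `transformers` is installed, a GPU with enough free memory is available, and the public `microsoft/Phi-3-mini-4k-instruct` checkpoint; the prompt is illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"

# Load the model in 16-bit precision and place it on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

# Phi-3 is an instruct model, so prompts are passed in chat-message format.
messages = [{"role": "user", "content": "Explain fine-tuning in one sentence."}]
output = pipe(messages, max_new_tokens=64)
print(output[0]["generated_text"])
```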
Code Generation
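Phi-3 mini can also be prompted for code generation. The following self-contained sketch (assuming `transformers` is installed and a GPU is available; the prompt and generation settings are illustrative) asks the model to write a small Python function:

```python
from transformers import pipeline

# Build a text-generation pipeline directly from the model id.
pipe = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-4k-instruct",
    device_map="auto",
)

# Ask for code; the response arrives as ordinary generated text.
prompt = [{"role": "user", "content": "Write a Python function that reverses a string."}]
result = pipe(prompt, max_new_tokens=128)
print(result[0]["generated_text"])
```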
Quantization
When the Phi-3 model is fine-tuned using Hugging Face's Transformers and subsequently quantized with 4-bit GPTQ, it requires 2.7 GB of GPU RAM.
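A sketch of 4-bit GPTQ quantization via Transformers' `GPTQConfig` is shown below. It assumes the `optimum` and `auto-gptq` packages are installed in addition to `transformers`, and uses the `"c4"` calibration-dataset shortcut that `GPTQConfig` supports; the output directory name is an illustrative choice.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# GPTQ needs a tokenizer and a calibration dataset to measure quantization error.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization runs during loading and can take a while on a single GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=gptq_config, device_map="auto"
)

# Save the quantized weights so later loads skip the calibration step.
model.save_pretrained("phi-3-mini-gptq-4bit")
tokenizer.save_pretrained("phi-3-mini-gptq-4bit")
```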
"Bitsandbytes NF4" is a specific configuration or method within the Bitsandbytes library, which is used for quantization. Quantization is a process that reduces the numerical precision of the weights in a model to make it smaller and faster.