Train and Debug YOLOv5 Models with Weights & Biases

In this Colab, we'll demonstrate how to use the W&B integration with version 5 of the "You Only Look Once" (YOLOv5) real-time object detection framework to track model metrics, inspect model outputs, and resume interrupted runs.

Follow along with a video tutorial →

Setup


We begin by downloading the YOLOv5 GitHub repo and a dataset of chessboard images with labeled bounding boxes around the pieces. Below, we'll use this dataset to train a model to detect chess pieces in images.

We also install all the requirements for YOLOv5 and wandb.


Detect

YOLOv5 provides fast, highly accurate models that are pretrained on the Common Objects in Context (COCO) dataset.

If your object detection application involves only classes from the COCO dataset, like "Stop Sign" and "Pizza", then these pretrained models may be all you need!

A pretrained model can be run on an example image with detect.py from the YOLOv5 toolkit.


Train

Chess pieces are not among the objects in COCO, so our pretrained models don't know how to detect them and we can't just use detect.py with one of those models.

Instead, we need to train the models to detect chess pieces, using YOLOv5's train.py. We don't have to start our models from scratch though! We can finetune the pretrained models on our chess piece dataset. This substantially speeds up training.

Model training is a complex process, so we'll want to track the inputs and outputs, log information about model behavior during training, and record system state and metrics. Additionally, we might want to compare results across many runs.

That's where Weights & Biases comes in: the wandb library provides all the tools you need to thoroughly and effectively log model training experiments.

YOLOv5 comes with wandb already integrated, so all you need to do is configure the logging with command line arguments.

  • --project sets the W&B project to which we're logging (akin to a GitHub repo).
  • --upload_dataset tells wandb to upload the dataset as a dataset-visualization Table. At regular intervals set by --bbox_interval, the model's outputs on the validation set will also be logged to W&B.
  • --save-period sets the number of epochs to wait in between logging the model checkpoints. If not set, only the final trained model is logged.

Even without these arguments, basic model metrics and some model outputs will still be saved to W&B.

Note: to use this same training and logging setup on a different dataset, just create a data.yaml for that dataset and provide it to the --data argument.
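Putting these flags together, a training invocation might look like the following sketch. The epoch count, logging intervals, project name, and the data.yaml path are illustrative placeholders, not values prescribed by this tutorial:

```shell
# Fine-tune a COCO-pretrained checkpoint on a custom dataset, logging
# the dataset, periodic checkpoints, and bounding-box predictions to W&B.
# All numeric values below are illustrative.
python train.py \
    --data data.yaml \
    --weights yolov5s.pt \
    --epochs 50 \
    --project chess-detection \
    --upload_dataset \
    --bbox_interval 1 \
    --save-period 5
```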


You can find the uploaded evaluation results on the run's page in the W&B UI.

Resume Crashed Runs

In addition to making it easier to debug our models, the W&B integration can help rescue crashed or interrupted runs.

Two of the flags above set us up for this:

  1. By setting --save-period, we regularly logged model checkpoints to W&B, so we can recreate the model and resume the run on any device where the dataset is available.
  2. By using --upload_dataset, we logged the data to W&B as well, so we can recreate the dataset too and resume runs on any device, whether or not the dataset is present on disk.

To resume a crashed or interrupted run:

  • Go to that run's overview section on the W&B dashboard.
  • Copy the run path.
  • Pass the run path to the --resume argument, prefixed with wandb-artifact://. This prefix tells YOLO that the files are stored on W&B rather than locally.


crashed_run_path = "entity/project/run-id"  # your path here
!python train.py --resume wandb-artifact://{crashed_run_path}

End Notes

Distributed Data-Parallel Training

All YOLO+W&B features are DDP-aware and compatible. Train on as many GPUs as you can muster, and we'll keep logging!

Logging Large Datasets

For very large datasets, the initial dataset upload triggered by --upload_dataset might be prohibitively expensive.

In that case, check out the log_dataset.py script included in YOLOv5.

Stripped Models

At the end of training, a "stripped" version of the model is saved to W&B. This version of the model file is much smaller, but is missing accumulated data required for resuming training. It's intended for use in downstream inference.