Model Registry Tutorial
The model registry is a central place to house and organize all the model tasks and their associated artifacts being worked on across an org:
- Model checkpoint management
- Document your models with rich model cards
- Maintain a history of all the models being used/deployed
- Facilitate clean hand-offs and stage management of models
- Tag and organize various model tasks
- Set up automatic notifications when models progress
This tutorial walks through how to track the model development lifecycle for a simple image classification task.
🛠️ Install wandb
Login to W&B
- You can log in explicitly using wandb login or wandb.login() (see below).
- Alternatively, you can set environment variables. Several environment variables change the behavior of W&B logging. The most important are:
  - WANDB_API_KEY: your API key, created in the "Settings" section under your profile at wandb.ai/settings
  - WANDB_BASE_URL: the URL of the W&B server
- Create a new API key in "Profile" -> "Settings" in the W&B App. Store your API key securely. It can only be viewed once when created.
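As a rough sketch of the environment-variable route, the two variables named above can be set from Python before logging in. The helper names configure_wandb_env and login are hypothetical, and the key and URL in the usage note are placeholders, not real values.

```python
import os


def configure_wandb_env(api_key, base_url=None):
    """Set the W&B environment variables named above for the current process."""
    os.environ["WANDB_API_KEY"] = api_key
    if base_url is not None:
        # Only set this when pointing at a specific W&B server.
        os.environ["WANDB_BASE_URL"] = base_url


def login():
    """Log in explicitly with wandb.login()."""
    import wandb  # imported here so configure_wandb_env works without wandb installed

    return wandb.login()
```

For example, configure_wandb_env("<your-api-key>", "https://<your-wandb-host>") followed by login(). In shared environments, setting the variables outside of source code is usually preferable, since that keeps the key out of version control.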
Log Data and Model Checkpoints as Artifacts
W&B Artifacts allows you to track and version arbitrary serialized data (e.g. datasets, model checkpoints, evaluation results). When you create an artifact, you give it a name and a type, and that artifact is forever linked to the experimental system of record. If the underlying data changes and you log that data asset again, W&B automatically creates a new version by checksumming its contents. W&B Artifacts can be thought of as a lightweight abstraction layer on top of shared unstructured file systems.
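To make the checksum idea concrete, here is a toy, in-memory model of content-based versioning. It is an illustration only, not the W&B implementation: the class ToyArtifactStore, the use of SHA-256, and the exact deduplication rule are all assumptions made for this sketch.

```python
import hashlib


class ToyArtifactStore:
    """Toy model of checksum-based versioning (not the W&B implementation)."""

    def __init__(self):
        # artifact name -> list of content digests, one entry per version
        self._digests = {}

    def log(self, name, content: bytes) -> str:
        """Return the version label for `content`, creating a new version if needed."""
        digest = hashlib.sha256(content).hexdigest()
        versions = self._digests.setdefault(name, [])
        if digest in versions:
            # Contents already seen: reuse the existing version.
            return f"v{versions.index(digest)}"
        versions.append(digest)
        return f"v{len(versions) - 1}"
```

Logging the same bytes twice under one name returns the same version label, while changed bytes produce the next version.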
Anatomy of an artifact
The Artifact class corresponds to an entry in the W&B Artifact registry. An artifact has:
- a name
- a type
- metadata
- description
- files, directory of files, or references
Example usage:
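import wandb
run = wandb.init(project="my-project")  # start a run to log against
artifact = wandb.Artifact(name="my_artifact", type="data")
artifact.add_file("/path/to/my/file.txt")  # attach a local file to the artifact
run.log_artifact(artifact)  # upload the artifact and record it as an output of this run
run.finish()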
In this tutorial, the first thing we will do is download a training dataset and log it as an artifact to be used downstream in the training job.
Let's grab a version of our Dataset
We are going to generate a file containing the image
Using Artifact names and aliases to easily hand-off and abstract data assets
- By simply referring to the name:alias combination of a dataset or model, we can better standardize components of a workflow.
- For instance, you can build PyTorch Datasets or DataModules that take W&B Artifact names and aliases as arguments and load the corresponding data.
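A minimal sketch of the name:alias hand-off, assuming run is the object returned by wandb.init(). The helper names artifact_ref and download_dataset are hypothetical; use_artifact and download are the standard calls for consuming an artifact.

```python
def artifact_ref(name, alias="latest"):
    """Build the name:alias string that identifies an artifact version."""
    return f"{name}:{alias}"


def download_dataset(run, name, alias="latest"):
    """Mark the artifact as an input of `run` and download its files.

    `run` is assumed to be the object returned by wandb.init().
    Returns the local directory the files were downloaded to.
    """
    artifact = run.use_artifact(artifact_ref(name, alias))
    return artifact.download()
```

A Dataset or DataModule constructor could accept name and alias, call download_dataset, and read its files from the returned directory, so that swapping data versions only means changing the alias.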
You can now see all the metadata associated with this dataset, the W&B runs consuming it, and the whole lineage of upstream and downstream artifacts!
Model Training
Writing the Model Class and Validation Function
Tracking the Training Loop
During training, it is best practice to checkpoint your model over time so that, if training is interrupted or your instance crashes, you can resume from where you left off. With artifact logging, we can track all our checkpoints with W&B and attach any metadata we want (such as the serialization format or class labels). That way, anyone who needs to consume a checkpoint knows how to use it. When logging models of any form as artifacts, make sure to set the type of the artifact to model.
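The checkpoint logging described above might look like the following sketch. The metadata keys, the default format string, and the artifact naming scheme model-<run id> are illustrative assumptions; only the artifact type model comes from the text.

```python
def checkpoint_metadata(epoch, val_accuracy, class_labels, fmt="pytorch-state-dict"):
    """Collect the details a consumer needs in order to use the checkpoint."""
    return {
        "epoch": epoch,
        "val_accuracy": val_accuracy,
        "class_labels": list(class_labels),
        "format": fmt,  # how the checkpoint file was serialized
    }


def log_checkpoint(run, checkpoint_path, metadata, aliases=None):
    """Log a checkpoint file as an artifact of type "model".

    `run` is assumed to be the object returned by wandb.init().
    """
    import wandb  # imported lazily so checkpoint_metadata works without wandb

    artifact = wandb.Artifact(
        name=f"model-{run.id}",  # hypothetical naming scheme: one artifact per run
        type="model",
        metadata=metadata,
    )
    artifact.add_file(checkpoint_path)
    run.log_artifact(artifact, aliases=aliases)
    return artifact
```

Calling log_checkpoint once per epoch under the same artifact name produces successive versions of one artifact, each carrying its own metadata.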
Manage all your model checkpoints for a project under one roof.
Model Registry
After logging a bunch of checkpoints across multiple runs during experimentation, it is time to hand off the best checkpoint to the next stage of the workflow (e.g. testing, deployment).
The Model Registry is a central page that lives above individual W&B projects. It houses Registered Models, portfolios that store "links" to the valuable checkpoints living in individual W&B Projects.
The model registry offers a centralized place to house the best checkpoints for all your model tasks. Any model artifact you log can be "linked" to a Registered Model.
Creating Registered Models and Linking through the UI
1. Access your team's model registry by going to the team page and selecting Model Registry
2. Create a new Registered Model.
3. Go to the artifacts tab of the project that holds all your model checkpoints
4. Click "Link to Registry" for the model artifact version you want.
Creating Registered Models and Linking through the API
You can link a model via the API with wandb.run.link_artifact, passing in the artifact object, the name of the Registered Model, and any aliases you want to attach to it. Registered Models are entity (team) scoped in W&B, so only members of a team can see and access that team's Registered Models. In the API, you refer to a Registered Model by the path <entity>/model-registry/<registered-model-name>. If the Registered Model doesn't exist, it is created automatically.
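A minimal sketch of linking through the API, assuming run is an active run and artifact is a model artifact that has already been logged. The helper names registered_model_path and link_to_registry are hypothetical; the path format and the link_artifact call are the ones described above.

```python
def registered_model_path(entity, registered_model_name):
    """Build the <entity>/model-registry/<registered-model-name> path."""
    return f"{entity}/model-registry/{registered_model_name}"


def link_to_registry(run, artifact, entity, registered_model_name, aliases=None):
    """Link a logged model artifact to a Registered Model and return the target path.

    `run` is assumed to be the object returned by wandb.init().
    """
    target_path = registered_model_path(entity, registered_model_name)
    run.link_artifact(artifact, target_path, aliases=aliases)
    return target_path
```

For example, link_to_registry(run, artifact, "<entity>", "<registered-model-name>", aliases=["staging"]), where the alias staging is only a placeholder.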
What is "Linking"?
When you link to the registry, this creates a new version of that Registered Model, which is just a pointer to the artifact version living in that project. There's a reason W&B segregates the versioning of artifacts in a project from the versioning of a Registered Model. The process of linking a model artifact version is equivalent to "bookmarking" that artifact version under a Registered Model task.
Typically during R&D/experimentation, researchers generate hundreds, if not thousands, of model checkpoint artifacts, but only one or two of them actually "see the light of day." Linking those checkpoints to a separate, versioned registry helps delineate the model development side from the model deployment/consumption side of the workflow. The globally understood version/alias of a model should be unpolluted by all the experimental versions being generated in R&D. The version of a Registered Model therefore increments each time a new model is "bookmarked", not each time a checkpoint is logged.
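The separation between project versions and registry versions can be pictured with a toy model. This is a conceptual sketch, not how W&B stores links; the class ToyRegisteredModel and the artifact names in the usage note are made up for illustration.

```python
class ToyRegisteredModel:
    """Toy model of a Registered Model as a list of pointers (not the W&B implementation)."""

    def __init__(self, name):
        self.name = name
        # Position i holds the project artifact version that registry version v{i} points to.
        self.links = []

    def link(self, project_artifact_version):
        """Bookmark a project artifact version and return the new registry version label."""
        self.links.append(project_artifact_version)
        return f"v{len(self.links) - 1}"
```

Linking "model-abc:v57" and then "model-abc:v93" yields registry versions v0 and v1, regardless of how many checkpoints were logged in the project in between.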
Create a Centralized Hub for all your models
- Add a model card, tags, and Slack notifications to your Registered Model
- Change aliases to reflect when models move through different phases
- Embed the model registry in reports for model documentation and regression reports. See this report as an example
Set up Slack Notifications when new models get linked to the registry
Consuming a Registered Model
You can now consume any registered model via the API by referring to the corresponding name:alias. Model consumers, whether they are engineers, researchers, or CI/CD processes, can go to the model registry as the central hub for all models that should "see the light of day": those that need to go through testing or move to production.
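A minimal sketch of consuming a registered model, assuming run is the object returned by wandb.init(). The helper names are hypothetical and the default alias production is a placeholder for whatever alias your team uses.

```python
def registered_model_ref(entity, registered_model_name, alias="latest"):
    """Build the name:alias reference for a version of a Registered Model."""
    return f"{entity}/model-registry/{registered_model_name}:{alias}"


def download_registered_model(run, entity, registered_model_name, alias="production"):
    """Mark the registered model version as an input of `run` and download its files.

    `run` is assumed to be the object returned by wandb.init().
    Returns the local directory containing the model files.
    """
    ref = registered_model_ref(entity, registered_model_name, alias)
    artifact = run.use_artifact(ref, type="model")
    return artifact.download()
```

Because consumers refer only to the alias, moving the alias to a newer linked version changes what they download without any change to their code.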