Text Classification with SageMaker & Weights & Biases

This notebook will demonstrate how to:

  • log the datasets to W&B Tables for EDA
  • train on the banking77 dataset
  • log experiment results to Weights & Biases
  • log the validation predictions to W&B Tables for model evaluation
  • save the raw dataset, processed dataset and model weights to W&B Artifacts

Note: this notebook should be run in a SageMaker notebook instance.

SageMaker


SageMaker is a comprehensive machine learning service that helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models by providing a rich set of orchestration tools and features.

Credit

This notebook is based on the Hugging Face & AWS SageMaker examples that can be found here.

Setup

[ ]
[ ]

Weights & Biases Setup for AWS SageMaker

[ ]
[ ]

The only additional piece of setup needed to use W&B with SageMaker is to make your W&B API key available to SageMaker. In this case we save it to a file named secrets.env in the same directory as our training script, and W&B will use it to authenticate on each of the instances that SageMaker spins up.

[ ]
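A minimal sketch of this step (the file name secrets.env comes from the text above; the helper name here is hypothetical — the wandb library also ships a `wandb.sagemaker_auth(path=...)` helper that writes the same file):

```python
import os

def write_wandb_secrets(api_key: str, script_dir: str = ".") -> str:
    """Write the W&B API key to secrets.env next to the training script.

    SageMaker copies the script directory to each training instance,
    so W&B can read this file and authenticate there.
    """
    path = os.path.join(script_dir, "secrets.env")
    with open(path, "w") as f:
        f.write(f"WANDB_API_KEY={api_key}\n")
    return path
```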

Log Dataset for Exploratory Analysis in W&B Tables

Here we log the train and eval datasets to separate W&B Tables. After this is run, we can explore these tables in the W&B UI.

[ ]
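A sketch of the logging step, assuming wandb is installed, you are logged in, and the dataset exposes `text` and `label` columns (as banking77 does); the function names are hypothetical:

```python
def dataset_to_rows(dataset, columns, max_rows=1000):
    """Collect one row per example, limited to max_rows, for a W&B Table."""
    rows = []
    for i, example in enumerate(dataset):
        if i >= max_rows:
            break
        rows.append([example[c] for c in columns])
    return rows

def log_eda_tables(run, train_ds, eval_ds, columns=("text", "label")):
    """Log the train and eval splits as separate W&B Tables."""
    import wandb  # assumes an active wandb run is passed in as `run`
    cols = list(columns)
    run.log({
        "train_dataset": wandb.Table(columns=cols, data=dataset_to_rows(train_ds, cols)),
        "eval_dataset": wandb.Table(columns=cols, data=dataset_to_rows(eval_ds, cols)),
    })
```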

Training with SageMaker and W&B

SageMaker Role

First we need to get our SageMaker role permissions. If you are going to use SageMaker in a local environment, you need access to an IAM Role with the required permissions for SageMaker; you can find out more about it here.

[ ]

Creating an Estimator and starting a training job

Here we will use the HuggingFace estimator from SageMaker, which includes an image with the main libraries necessary for training Hugging Face models.

[ ]
[ ]
[ ]
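A sketch of building the estimator. The instance type, framework versions, script names, and hyperparameter names are all assumptions here (the hyperparameter names follow common Hugging Face example scripts); match them to the train.py you actually ship:

```python
def make_hyperparameters():
    # Passed to train.py as command-line arguments by SageMaker;
    # adjust the names to whatever your script parses.
    return {
        "epochs": 3,
        "train_batch_size": 32,
        "model_name": "distilbert-base-uncased",
    }

def make_estimator(role, source_dir="./scripts"):
    """Build a SageMaker HuggingFace estimator for the training job."""
    from sagemaker.huggingface import HuggingFace
    return HuggingFace(
        entry_point="train.py",
        source_dir=source_dir,   # must also contain secrets.env for W&B auth
        instance_type="ml.p3.2xlarge",
        instance_count=1,
        role=role,
        transformers_version="4.26",
        pytorch_version="1.13",
        py_version="py39",
        hyperparameters=make_hyperparameters(),
    )
```

Calling `estimator.fit(...)` with the S3 paths of the processed dataset then launches the training job.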

HyperParameter Tuning with SageMaker and Weights & Biases

We can also use SageMaker's HyperparameterTuner to run a hyperparameter search and log the results to Weights & Biases.

[ ]
[ ]
[ ]
[ ]
[ ]
[ ]
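A sketch of the tuner setup. The metric name, log regex, and search ranges are assumptions for illustration; the regex must match however your training script prints its eval metric:

```python
# Regex used to scrape the eval metric from the training logs so
# SageMaker can rank trials; the log line format is an assumption.
EVAL_ACCURACY_REGEX = r"eval_accuracy.*=\D*(.*?)$"

def make_tuner(estimator):
    """Wrap a HuggingFace estimator in a SageMaker hyperparameter search."""
    from sagemaker.tuner import (
        HyperparameterTuner, ContinuousParameter, CategoricalParameter,
    )
    return HyperparameterTuner(
        estimator=estimator,
        objective_metric_name="eval_accuracy",
        metric_definitions=[{"Name": "eval_accuracy", "Regex": EVAL_ACCURACY_REGEX}],
        hyperparameter_ranges={
            "learning_rate": ContinuousParameter(1e-5, 1e-4),
            "train_batch_size": CategoricalParameter([16, 32]),
        },
        max_jobs=4,
        max_parallel_jobs=2,
    )
```

Because each trial runs the same training script, every trial authenticates with W&B via secrets.env and appears as its own run in the W&B UI.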

Dataset Versioning with W&B Artifacts

Weights and Biases Artifacts enable you to log end-to-end training pipelines to ensure your experiments are always reproducible.

Data privacy is critical to Weights & Biases, so we support the creation of Artifacts from reference locations such as your own private cloud storage in AWS S3 or Google Cloud Storage. Local, on-premises installations of W&B are also available upon request.

By default, W&B stores artifact files in a private Google Cloud Storage bucket located in the United States. All files are encrypted at rest and in transit. For sensitive files, we recommend a private W&B installation or the use of reference artifacts.
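For the reference-artifact case mentioned above, a minimal sketch (the S3 URI and function name are hypothetical; `Artifact.add_reference` is the wandb API for tracking files without uploading them):

```python
def log_reference_artifact(run, uri, name, artifact_type="dataset"):
    """Track files by reference (e.g. an S3 prefix) instead of uploading
    them, so the data never leaves your own bucket."""
    import wandb  # assumes an active wandb run is passed in as `run`
    artifact = wandb.Artifact(name=name, type=artifact_type)
    artifact.add_reference(uri)  # e.g. "s3://my-bucket/datasets/banking77/"
    run.log_artifact(artifact)
```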

Artifacts - Log Raw Dataset

[ ]

Log to W&B Artifacts

[ ]
[ ]
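A sketch of logging the raw dataset, assuming an active wandb run; the artifact name is an assumption:

```python
def log_raw_dataset(run, path, name="banking77-raw"):
    """Upload the raw dataset file or directory as a versioned W&B Artifact."""
    import os
    import wandb
    artifact = wandb.Artifact(name=name, type="dataset")
    if os.path.isdir(path):
        artifact.add_dir(path)   # upload every file under the directory
    else:
        artifact.add_file(path)  # upload a single file
    return run.log_artifact(artifact)
```

Re-running this after the data changes creates a new artifact version rather than overwriting the old one, which is what makes the pipeline reproducible.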

Artifacts - Log Train/Eval Split

[ ]

Log to W&B Artifacts

[ ]
[ ]
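The split itself can be sketched as a pure, seeded index shuffle (the helper name and the 80/20 default are assumptions), so that logging the resulting train and eval files as artifact versions always reflects a reproducible split:

```python
import random

def split_indices(n, eval_fraction=0.2, seed=42):
    """Return (train_indices, eval_indices) for a reproducible split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded, so the split is deterministic
    n_eval = int(n * eval_fraction)
    return idx[n_eval:], idx[:n_eval]
```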

Artifacts - Dataset Preprocessing: Tokenization

[ ]
[ ]
[ ]
[ ]
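The tokenization step can be sketched as follows, assuming the datasets and transformers libraries; the model name and max length are assumptions, and `pad_or_truncate` is a hypothetical helper showing what fixed-length padding does:

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Force a sequence of token ids to a fixed length."""
    token_ids = token_ids[:max_length]
    return token_ids + [pad_id] * (max_length - len(token_ids))

def tokenize_dataset(dataset, model_name="distilbert-base-uncased", max_length=128):
    """Tokenize a datasets.Dataset with a Hugging Face tokenizer."""
    from transformers import AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def tokenize(batch):
        return tokenizer(batch["text"], padding="max_length",
                         truncation=True, max_length=max_length)

    return dataset.map(tokenize, batched=True)
```

The tokenized dataset can then be saved to disk and logged as a further artifact version, completing the raw → split → tokenized lineage in W&B.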