Text Classification with SageMaker & Weights & Biases
This notebook will demonstrate how to:
- log the datasets to W&B Tables for EDA
- train on the banking77 dataset
- log experiment results to Weights & Biases
- log the validation predictions to W&B Tables for model evaluation
- save the raw dataset, processed dataset and model weights to W&B Artifacts
Note: this notebook should be run in a SageMaker notebook instance.
SageMaker

SageMaker is a comprehensive machine learning service that helps data scientists and developers prepare, build, train, and deploy high-quality machine learning (ML) models by providing a rich set of orchestration tools and features.
Credit
This notebook is based on the Hugging Face & AWS SageMaker examples that can be found here
Setup
Weights & Biases Setup for AWS SageMaker
The only additional piece of setup needed to use W&B with SageMaker is to make your W&B API key available to SageMaker. In this case we save it to a file in the same directory as our training script. This will be named secrets.env and W&B will then use this to authenticate on each of the instances that SageMaker spins up.
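A minimal sketch of creating that file, assuming your API key is available in the `WANDB_API_KEY` environment variable (the fallback placeholder string is not a real key):

```python
# Save the W&B API key to a secrets.env file next to the training script,
# so each SageMaker training instance can authenticate with W&B.
import os

wandb_api_key = os.environ.get("WANDB_API_KEY", "your-api-key-here")

with open("secrets.env", "w") as f:
    f.write(f"WANDB_API_KEY={wandb_api_key}\n")
```

The `wandb` library also provides a helper, `wandb.sagemaker_auth(path="...")`, which writes this file for you from your logged-in credentials.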
Log Dataset for Exploratory Analysis in W&B Tables
Here we log the train and eval datasets to separate W&B Tables. After this is run, we can explore these tables in the W&B UI.
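The logging step can be sketched as follows; the project name is an illustrative assumption, and this requires the `wandb` and `datasets` libraries plus a W&B login:

```python
# Log the banking77 train and test splits to two separate W&B Tables for EDA.
import wandb
from datasets import load_dataset

train_ds = load_dataset("banking77", split="train")
eval_ds = load_dataset("banking77", split="test")

run = wandb.init(project="sagemaker-banking77", job_type="eda")

train_table = wandb.Table(columns=["text", "label"])
for row in train_ds:
    train_table.add_data(row["text"], row["label"])

eval_table = wandb.Table(columns=["text", "label"])
for row in eval_ds:
    eval_table.add_data(row["text"], row["label"])

run.log({"train_dataset": train_table, "eval_dataset": eval_table})
run.finish()
```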
Training with SageMaker and W&B
SageMaker Role
First we need to get our SageMaker execution role. If you are running SageMaker from a local environment, you need access to an IAM role with the required SageMaker permissions; you can find more about this here.
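A common pattern for fetching the role, with a fallback for local environments (the IAM role name below is a placeholder you would replace with your own):

```python
# Get the SageMaker execution role; inside a SageMaker notebook instance
# get_execution_role() works directly, otherwise look up a named IAM role.
import sagemaker
import boto3

try:
    role = sagemaker.get_execution_role()
except ValueError:
    iam = boto3.client("iam")
    role = iam.get_role(RoleName="sagemaker_execution_role")["Role"]["Arn"]
```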
Creating an Estimator and starting a training job
Here we will use the HuggingFace estimator from SageMaker, which includes an image with the main libraries needed to train Hugging Face models.
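A minimal sketch of such an estimator; the entry point, source directory, instance type, framework versions, and hyperparameters are illustrative assumptions, not fixed values:

```python
# Launch a Hugging Face training job on SageMaker. The source_dir should
# contain the training script and the secrets.env file created earlier.
from sagemaker.huggingface import HuggingFace

hyperparameters = {
    "epochs": 3,
    "train_batch_size": 32,
    "model_name": "distilbert-base-uncased",
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",          # training script (assumed name)
    source_dir="./scripts",          # directory with train.py and secrets.env
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role=role,                       # SageMaker execution role from the previous step
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
    hyperparameters=hyperparameters,
)

huggingface_estimator.fit()
```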
HyperParameter Tuning with SageMaker and Weights & Biases
We can also use SageMaker's HyperparameterTuner to run a hyperparameter search and log the results to Weights & Biases.
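A sketch of a small search over learning rate and batch size; the objective metric name and its regex are assumptions that must match what your training script prints, and W&B logging happens inside the training script itself:

```python
# Run a hyperparameter search over the estimator defined in the previous step.
from sagemaker.tuner import (
    HyperparameterTuner,
    ContinuousParameter,
    CategoricalParameter,
)

hyperparameter_ranges = {
    "learning_rate": ContinuousParameter(1e-5, 5e-4),
    "train_batch_size": CategoricalParameter([16, 32]),
}

tuner = HyperparameterTuner(
    estimator=huggingface_estimator,   # the HuggingFace estimator created above
    objective_metric_name="eval_loss",
    objective_type="Minimize",
    metric_definitions=[{"Name": "eval_loss", "Regex": "eval_loss.*=\\D*(.*?)$"}],
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=4,
    max_parallel_jobs=2,
)

tuner.fit()
```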
Dataset Versioning with W&B Artifacts
Weights and Biases Artifacts enable you to log end-to-end training pipelines to ensure your experiments are always reproducible.
Data privacy is critical to Weights & Biases, and so we support the creation of Artifacts from reference locations in your own private cloud, such as AWS S3 or Google Cloud Storage. Local, on-premises installations of W&B are also available upon request.
By default, W&B stores artifact files in a private Google Cloud Storage bucket located in the United States. All files are encrypted at rest and in transit. For sensitive files, we recommend a private W&B installation or the use of reference artifacts.
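A minimal sketch of a reference artifact that tracks files in your own S3 bucket without uploading them to W&B; the bucket path, project, and artifact names are placeholders:

```python
# Create a reference artifact: W&B records checksums and metadata for the
# files at the S3 location, but the data itself stays in your bucket.
import wandb

run = wandb.init(project="sagemaker-banking77", job_type="upload")

artifact = wandb.Artifact("banking77-raw", type="dataset")
artifact.add_reference("s3://my-bucket/datasets/banking77/")

run.log_artifact(artifact)
run.finish()
```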
Artifacts - Log Raw Dataset
Log to W&B Artifacts
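A sketch of logging the raw dataset as a versioned Artifact; the local directory, project, and artifact names are illustrative assumptions:

```python
# Save the raw banking77 dataset to disk, then log the directory
# as a W&B Artifact so every experiment can trace back to it.
import wandb
from datasets import load_dataset

train_ds = load_dataset("banking77", split="train")
train_ds.save_to_disk("raw_dataset")

run = wandb.init(project="sagemaker-banking77", job_type="dataset-upload")

raw_artifact = wandb.Artifact("banking77-raw", type="raw_dataset")
raw_artifact.add_dir("raw_dataset")

run.log_artifact(raw_artifact)
run.finish()
```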
Artifacts - Log Train/Eval Split
Log to W&B Artifacts
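A sketch of logging the processed train/eval split while declaring the raw dataset artifact as its input, which gives the lineage graph in the W&B UI; artifact and directory names are placeholders:

```python
# Consume the raw dataset artifact, then log the processed split as a
# new Artifact so the pipeline from raw data to splits is reproducible.
import os
import wandb

run = wandb.init(project="sagemaker-banking77", job_type="data-split")

# Declaring the raw artifact as an input records lineage between the two.
raw = run.use_artifact("banking77-raw:latest")
raw_dir = raw.download()

# ... create the train/eval splits from raw_dir and save them here ...
os.makedirs("split_dataset", exist_ok=True)

split_artifact = wandb.Artifact("banking77-split", type="split_dataset")
split_artifact.add_dir("split_dataset")

run.log_artifact(split_artifact)
run.finish()
```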