Aml Pipelines Parameter Tuning With Hyperdrive


Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.

Azure Machine Learning Pipeline with HyperDriveStep

This notebook demonstrates how to use HyperDriveStep in an Azure Machine Learning (AML) pipeline.

Prerequisites and Azure Machine Learning Basics

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the configuration notebook first if you haven't already. It sets you up with a working config file that contains your workspace information, subscription ID, and so on.

Azure Machine Learning and Pipeline SDK-specific imports
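
As a rough sketch, the imports for this notebook might look like the following. It assumes the v1 azureml-sdk packages; the try/except guard and the SDK_VERSION name are additions for illustration, so the cell still runs where the SDK is not installed.

```python
# Guarded import: the names below come from the v1 azureml-sdk packages.
try:
    import azureml.core
    from azureml.core import Dataset, Environment, Experiment, ScriptRunConfig, Workspace
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.pipeline.core import Pipeline, PipelineData
    from azureml.pipeline.steps import HyperDriveStep, PythonScriptStep
    from azureml.train.hyperdrive import (BanditPolicy, HyperDriveConfig, PrimaryMetricGoal,
                                          RandomParameterSampling, choice, loguniform)
    SDK_VERSION = azureml.core.VERSION
except ImportError:
    SDK_VERSION = None  # SDK not installed in this environment

print("Azure ML SDK version:", SDK_VERSION or "not installed")
```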


Initialize workspace

Initialize a workspace object from the persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\config.json.
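
A minimal sketch of this step, assuming the v1 azureml-core package. Workspace.from_config reads the config file; the missing_workspace_keys helper is hypothetical and only illustrates the three fields the file is expected to carry.

```python
# The three keys a workspace config.json is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def missing_workspace_keys(config):
    """Hypothetical helper: list the required keys absent from a parsed config dict."""
    return [key for key in REQUIRED_KEYS if key not in config]

def load_workspace(path="config.json"):
    """Load the workspace described by the config file (needs azureml-core)."""
    # Imported lazily so the helper above can be used without the SDK installed.
    from azureml.core import Workspace
    return Workspace.from_config(path=path)
```

Calling load_workspace() with no arguments looks for config.json in the current folder.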


Create an Azure ML experiment

Let's create an experiment named "tf-mnist" and a folder to hold the training scripts.

The best practice is to use a separate folder for each step's scripts and their dependent files. This helps reduce the size of the snapshot created for the step (only the specific folder is snapshotted). Since a change to any file in the source_directory triggers a re-upload of the snapshot, this also allows the step to be reused when nothing in its source_directory has changed.

The script runs will be recorded under the experiment in Azure.
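
The folder and experiment could be created along these lines. The folder name ./tf-mnist is an assumption chosen to match the experiment name, and ws stands for the workspace object from the previous section.

```python
import os

# Assumed folder name; any dedicated folder for this step's scripts works.
script_folder = "./tf-mnist"
os.makedirs(script_folder, exist_ok=True)

def create_experiment(ws, name="tf-mnist"):
    """Create (or fetch) the experiment under the given workspace (needs azureml-core)."""
    from azureml.core import Experiment
    return Experiment(workspace=ws, name=name)
```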


Download MNIST dataset

To train on the MNIST dataset, we first need to download it directly from Yann LeCun's web site and save the files in a local data folder.
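
One way to sketch the download with the standard library is shown below. The four file names are the standard MNIST archives; the base URL and the data folder name are assumptions, and the site may not always be reachable, so the fetch function is left replaceable.

```python
import os
import urllib.request

MNIST_FILES = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

def download_mnist(data_folder="data", base_url="http://yann.lecun.com/exdb/mnist/",
                   fetch=urllib.request.urlretrieve):
    """Download any MNIST archive not yet present and return the local paths."""
    os.makedirs(data_folder, exist_ok=True)
    paths = []
    for name in MNIST_FILES:
        target = os.path.join(data_folder, name)
        if not os.path.exists(target):
            fetch(base_url + name, target)  # skipped when the file is already on disk
        paths.append(target)
    return paths
```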


Show some sample images

Let's load the downloaded compressed files into numpy arrays using some utility functions included in the utils.py library file in the current folder. Then we use matplotlib to plot 30 random images from the dataset along with their labels.
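
The utils.py helpers are not reproduced here. As a stand-in, the sketch below reads the gzip-compressed IDX format with the standard library only and returns raw bytes instead of numpy arrays; the function name load_idx is hypothetical.

```python
import gzip
import struct

def load_idx(filename, label=False):
    """Read a gzip-compressed IDX file.

    Returns (count, raw_bytes) for label files and
    (count, rows, cols, raw_bytes) for image files.
    """
    with gzip.open(filename, "rb") as gz:
        gz.read(4)  # skip the 4-byte magic number
        (count,) = struct.unpack(">I", gz.read(4))  # big-endian item count
        if label:
            return count, gz.read(count)
        rows, cols = struct.unpack(">II", gz.read(8))
        return count, rows, cols, gz.read(count * rows * cols)
```

Each image occupies rows * cols consecutive bytes, so the raw buffer can be reshaped to (count, rows * cols) before plotting.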


Upload MNIST dataset to blob datastore

A datastore is a place where data can be stored and then made accessible to a Run, either by mounting it on the compute target or by copying the data to it. In the next step, we will upload the training and test sets to the Azure Blob datastore, which we will later mount on the compute cluster for training.


Create Azure Machine Learning datasets

By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred.
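
The upload and the dataset creation could be sketched together as follows, assuming the workspace's default blob datastore. The folder name data and the target path mnist are assumptions.

```python
def upload_and_create_dataset(ws, data_folder="data", target_path="mnist"):
    """Upload the local files to the default datastore and reference them as a FileDataset."""
    from azureml.core import Dataset  # needs azureml-core
    datastore = ws.get_default_datastore()
    datastore.upload(src_dir=data_folder, target_path=target_path,
                     overwrite=True, show_progress=True)
    # The dataset only stores a reference to the datastore path; no data is copied.
    return Dataset.File.from_files(path=[(datastore, target_path)])
```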


Retrieve or create an Azure Machine Learning compute

Azure Machine Learning Compute is a service for provisioning and managing clusters of Azure virtual machines for running machine learning workloads. Let's create a new Azure Machine Learning Compute in the current workspace, if it doesn't already exist. We will then run the training script on this compute target.

Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

If no compute target with the given name is found, we create a new one. This process is broken down into the following steps:

  1. Create the configuration
  2. Create the Azure Machine Learning compute

This process takes a few minutes and provides only sparse output along the way. Please wait until the call returns before moving to the next cell.
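
The retrieve-or-create logic described above can be sketched like this. The cluster name and VM size are assumptions; max_nodes=4 matches the four-node cluster mentioned later in this notebook.

```python
def get_or_create_compute(ws, cluster_name="gpu-cluster", vm_size="STANDARD_NC6", max_nodes=4):
    """Return the named compute target, creating it when it does not exist yet."""
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException
    try:
        target = ComputeTarget(workspace=ws, name=cluster_name)  # raises if not found
    except ComputeTargetException:
        # Step 1: create the configuration.
        config = AmlCompute.provisioning_configuration(vm_size=vm_size, max_nodes=max_nodes)
        # Step 2: create the compute and block until provisioning finishes.
        target = ComputeTarget.create(ws, cluster_name, config)
        target.wait_for_completion(show_output=True)
    return target
```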


Copy the training files into the script folder

The TensorFlow training script is already created for you. You can simply copy it into the script folder, together with the utility library used to load the compressed data files into numpy arrays.


Retrieve an Environment

In this tutorial, we will use one of Azure ML's curated TensorFlow environments for training. Curated environments are available in your workspace by default. Specifically, we will use the TensorFlow 2.0 GPU curated environment.


Set up an input for the ScriptRunConfig step

You can mount the dataset on the remote compute.


Configure the training job

Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, the environment to use, and the compute target to run on.
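
Putting the environment, the mounted input, and the job configuration together might look like the sketch below. The curated environment name, the script name tf_mnist.py, the --data-folder argument, and the input name mnist are assumptions; check the names available in your workspace.

```python
def build_script_run_config(ws, dataset, compute_target, script_folder="./tf-mnist"):
    """Assemble the training job configuration (needs azureml-core)."""
    from azureml.core import Environment, ScriptRunConfig
    # Assumed name of the curated TensorFlow 2.0 GPU environment.
    tf_env = Environment.get(workspace=ws, name="AzureML-TensorFlow-2.0-GPU")
    # Mount the dataset on the remote compute and pass its path to the script.
    data_input = dataset.as_named_input("mnist").as_mount()
    return ScriptRunConfig(source_directory=script_folder,
                           script="tf_mnist.py",
                           arguments=["--data-folder", data_input],
                           compute_target=compute_target,
                           environment=tf_env)
```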


Intelligent hyperparameter tuning

Now let's try hyperparameter tuning by launching multiple runs on the cluster. First let's define the parameter space using random sampling.

In this example we will use random sampling to try different configuration sets of hyperparameters to maximize our primary metric, the best validation accuracy (validation_acc).
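
To make random sampling concrete, the sketch below first draws one configuration in plain Python and then builds the equivalent HyperDrive object. The argument names and ranges are assumptions for illustration. As far as the HyperDrive documentation describes it, loguniform(low, high) yields exp(uniform(low, high)), which is what the plain-Python version mimics.

```python
import math
import random

# Illustrative search space; argument names and ranges are assumptions.
SEARCH_SPACE = {
    "--batch-size": ("choice", [25, 50, 100]),
    "--first-layer-neurons": ("choice", [10, 50, 200, 300, 500]),
    "--learning-rate": ("loguniform", (-6, -1)),
}

def draw_sample(space, rng):
    """Plain-Python picture of one random draw from the search space."""
    sample = {}
    for name, (kind, spec) in space.items():
        if kind == "choice":
            sample[name] = rng.choice(spec)
        else:  # loguniform: exp of a uniform draw between the two bounds
            low, high = spec
            sample[name] = math.exp(rng.uniform(low, high))
    return sample

def build_hyperdrive_sampling():
    """The same space expressed with the HyperDrive SDK (needs azureml-train-core)."""
    from azureml.train.hyperdrive import RandomParameterSampling, choice, loguniform
    return RandomParameterSampling({
        "--batch-size": choice(25, 50, 100),
        "--first-layer-neurons": choice(10, 50, 200, 300, 500),
        "--learning-rate": loguniform(-6, -1),
    })
```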


Now we will define an early termination policy. The BanditPolicy tells Azure ML to check each job every 2 iterations. If the primary metric (defined later) falls outside the top 10% range, Azure ML terminates the job. This saves us from continuing to explore hyperparameters that don't show promise of helping reach our target metric.

Refer to the Azure ML documentation for more information on the BanditPolicy and the other available policies.
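
The effect of the policy can be illustrated in plain Python. This is a sketch of the rule as the HyperDrive documentation describes it for a maximize goal with a slack factor: a run is cancelled when its metric is below best / (1 + slack_factor). The helper names are hypothetical; only BanditPolicy itself is an SDK class.

```python
def bandit_cutoff(best_metric, slack_factor=0.1):
    """Lowest metric value a run may report without being cancelled (maximize goal)."""
    return best_metric / (1 + slack_factor)

def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """True when the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)

def build_policy():
    """The policy from the text: check every 2 iterations with a 10% slack."""
    from azureml.train.hyperdrive import BanditPolicy  # needs azureml-train-core
    return BanditPolicy(evaluation_interval=2, slack_factor=0.1)
```

With a best validation accuracy of 0.95, the cutoff is about 0.864, so a run reporting 0.80 would be cancelled while one reporting 0.90 would continue.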


Now we are ready to configure a run configuration object and specify the primary metric validation_acc that is recorded in your training runs. If you go back to the training script, you will notice that this value is logged after every epoch (a full batch set). We also tell the service that we want to maximize this value. Finally, we set the number of samples to 20 and the maximum number of concurrent jobs to 4, which is the same as the number of nodes in our compute cluster.
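
A sketch of that configuration, assuming src, sampling, and policy are the ScriptRunConfig, the parameter sampling, and the early termination policy defined earlier:

```python
def build_hyperdrive_config(src, sampling, policy):
    """Combine the job, search space, and policy into one HyperDrive configuration."""
    from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal
    return HyperDriveConfig(run_config=src,
                            hyperparameter_sampling=sampling,
                            policy=policy,
                            primary_metric_name="validation_acc",
                            primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                            max_total_runs=20,       # number of samples
                            max_concurrent_runs=4)   # one job per cluster node
```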


HyperDriveStep

HyperDriveStep can be used to run a HyperDrive job as a step in a pipeline. Its main parameters are:

  • name: Name of the step
  • hyperdrive_config: A HyperDriveConfig that defines the configuration for this HyperDrive run
  • inputs: List of input port bindings
  • outputs: List of output port bindings
  • metrics_output: Optional value specifying the location to store HyperDrive run metrics as a JSON file
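  • allow_reuse: Whether the step may reuse the results of a previous run when its settings are unchanged
  • version: Optional version tag to denote a change in functionality for the step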
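
Using the parameters listed above, the step might be built as in the sketch below. The step name, the PipelineData name, and the pipeline output name metrics_output are assumptions.

```python
def build_hyperdrive_step(ws, hd_config, data_input):
    """Wrap the HyperDrive configuration in a pipeline step (needs azureml-pipeline)."""
    from azureml.pipeline.core import PipelineData
    from azureml.pipeline.steps import HyperDriveStep
    datastore = ws.get_default_datastore()
    # Location where the step writes the HyperDrive run metrics as a JSON file.
    metrics_data = PipelineData(name="metrics_data",
                                datastore=datastore,
                                pipeline_output_name="metrics_output")
    step = HyperDriveStep(name="hyperdrive_step",
                          hyperdrive_config=hd_config,
                          inputs=[data_input],
                          metrics_output=metrics_data,
                          allow_reuse=True)
    return step, metrics_data
```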

Find and register best model

When all the jobs finish, we can choose to register the model that has the highest accuracy through an additional PythonScriptStep.

Through this additional register_model_step, we register the chosen files as a model named tf-dnn-mnist under the workspace for deployment.
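
The registration step could be sketched as follows. The script name register_model.py, its command-line arguments, and the saved_model input (a pipeline output holding the best model's files) are all assumptions; only the model name tf-dnn-mnist comes from the text.

```python
def build_register_step(saved_model, compute_target, script_folder="./tf-mnist"):
    """Create the step that registers the best model (needs azureml-pipeline-steps)."""
    from azureml.pipeline.steps import PythonScriptStep
    return PythonScriptStep(script_name="register_model.py",  # hypothetical script
                            name="register_model_step",
                            source_directory=script_folder,
                            arguments=["--model_name", "tf-dnn-mnist",
                                       "--saved-model", saved_model],
                            inputs=[saved_model],
                            compute_target=compute_target,
                            allow_reuse=True)
```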


Run the pipeline
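
A minimal sketch of building and submitting the pipeline, assuming steps is the list of steps defined above and experiment is the experiment created earlier:

```python
def submit_pipeline(ws, experiment, steps):
    """Build a pipeline from the given steps and submit it under the experiment."""
    from azureml.pipeline.core import Pipeline  # needs azureml-pipeline-core
    pipeline = Pipeline(workspace=ws, steps=steps)
    return experiment.submit(pipeline)
```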


Monitor using widget
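
Inside a Jupyter environment, the run can be followed with the RunDetails widget. This sketch assumes the azureml-widgets package is installed and that pipeline_run is the object returned by the submission.

```python
def show_run_widget(pipeline_run):
    """Display the live status widget for the pipeline run (Jupyter only)."""
    from azureml.widgets import RunDetails
    RunDetails(pipeline_run).show()
```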


Wait for the completion of this Pipeline run
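
Blocking until the run finishes can be sketched with a single call on the pipeline run object; the wrapper function name is hypothetical.

```python
def wait_for_pipeline(pipeline_run):
    """Block until the pipeline run ends and return its final status."""
    return pipeline_run.wait_for_completion(show_output=True)
```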


Retrieve the metrics

Outputs of the above run can be used as inputs to other steps in the pipeline. In this tutorial, we only show the resulting metrics.
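
The sketch below downloads the metrics output and picks the best run from it. It assumes the pipeline output is named metrics_output and that the JSON file maps each run ID to its logged metric values; both are assumptions about this run, and the helper names are hypothetical.

```python
import json

def download_metrics(pipeline_run, output_name="metrics_output", local_path="."):
    """Download the metrics file produced by the HyperDrive step."""
    metrics_output = pipeline_run.get_pipeline_output(output_name)
    return metrics_output.download(local_path, show_progress=True)

def best_run(metrics_json, metric="validation_acc"):
    """Return (run_id, best_value) for the run with the highest value of the metric."""
    metrics = json.loads(metrics_json)
    scores = {}
    for run_id, logged in metrics.items():
        values = logged.get(metric)
        if values is None:
            continue  # this run never logged the metric
        if not isinstance(values, list):
            values = [values]  # a metric logged once may appear as a single number
        scores[run_id] = max(values)
    return max(scores.items(), key=lambda item: item[1])
```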


For model deployment, please refer to Training, hyperparameter tune, and deploy with TensorFlow.