Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License. The Fashion-MNIST dataset is Copyright (c) 2017 Zalando SE (https://tech.zalando.com), also released under the MIT License.


Build a simple ML pipeline for image classification

Introduction

This tutorial shows how to train a simple deep neural network using the Fashion MNIST dataset and Keras on Azure Machine Learning. Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes.

Learn how to:

  • Set up your development environment
  • Create the Fashion MNIST dataset
  • Create a machine learning pipeline to train a simple deep learning neural network on a remote cluster
  • Retrieve input datasets from the experiment and register the output model with datasets

Prerequisites:

  • Understand the architecture and terms introduced by Azure Machine Learning
  • If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the configuration notebook to:
    • install the latest version of the Azure ML SDK
    • create a workspace and its configuration file (config.json)

Set up your development environment

All the setup for your development work can be accomplished in a Python notebook. Setup includes:

  • Importing Python packages
  • Connecting to a workspace to enable communication between your local computer and remote resources
  • Creating an experiment to track all your runs
  • Creating a remote compute target to use for training

Import packages

Import Python packages you need in this session. Also display the Azure Machine Learning SDK version.

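The import cell might look like this minimal sketch (assuming the azureml-sdk package is installed in your environment):

```python
# Check core SDK version number
import azureml.core
from azureml.core import Workspace, Experiment, Dataset

print("Azure ML SDK Version:", azureml.core.VERSION)
```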

Connect to workspace

Create a workspace object from the existing workspace. Workspace.from_config() reads the file config.json and loads the details into a Workspace object.

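A minimal sketch of this cell, storing the workspace in a variable conventionally named ws (the variable name is a choice, not a requirement):

```python
from azureml.core import Workspace

# Reads config.json from the current directory (or a parent directory)
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")
```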

Create an experiment and a directory

Create an experiment to track the runs in your workspace and a directory to deliver the necessary code from your computer to the remote resource.

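For example (the folder and experiment names below are illustrative; ws is the workspace object from the previous step):

```python
import os
from azureml.core import Experiment

# Folder whose contents will be shipped to the remote compute target
script_folder = os.path.join(os.getcwd(), "fashion-mnist")
os.makedirs(script_folder, exist_ok=True)

# Experiment that groups and tracks all runs of this tutorial
exp = Experiment(workspace=ws, name="keras-mnist-fashion")
```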

Create or attach an existing compute resource

By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU support. In this tutorial, you create Azure Machine Learning Compute as your training environment. The code below creates the compute clusters for you if they don't already exist in your workspace.

Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

Creation of compute takes approximately 5 minutes. If the AmlCompute with that name is already in your workspace the code will skip the creation process.

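A sketch of the create-or-reuse pattern (the cluster name and VM size are illustrative; pick a GPU SKU such as STANDARD_NC6 if you want GPU training):

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = "cpu-cluster"  # illustrative name

try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing compute target.")
except ComputeTargetException:
    # Otherwise provision a new autoscaling cluster (scales down to 0 nodes when idle)
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_D2_V2",
        max_nodes=4,
    )
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True, timeout_in_minutes=20)
```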

Create the Fashion MNIST dataset

By creating a dataset, you create a reference to the data source location. If you applied any subsetting transformations to the dataset, they will be stored in the dataset as well. The data remains in its existing location, so no extra storage cost is incurred.

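A sketch of creating a FileDataset from the four Fashion-MNIST files. The URLs below are Zalando's public hosting as documented in the Fashion-MNIST repository; adjust them if the files move:

```python
from azureml.core import Dataset

base = "http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/"
files = [
    "train-images-idx3-ubyte.gz",
    "train-labels-idx1-ubyte.gz",
    "t10k-images-idx3-ubyte.gz",
    "t10k-labels-idx1-ubyte.gz",
]

# A FileDataset is only a reference to these locations; nothing is copied yet
fashion_ds = Dataset.File.from_files(path=[base + f for f in files])
```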

Build 2-step ML pipeline

The Azure Machine Learning Pipeline enables data scientists to create and manage multiple simple and complex workflows concurrently. A typical pipeline would have multiple tasks to prepare data, train, deploy and evaluate models. Individual steps in the pipeline can make use of diverse compute options (for example: CPU for data preparation and GPU for training) and languages. Learn More

Step 1: data preparation

In step one, we will load the images and labels from the Fashion MNIST dataset into mnist_train.csv and mnist_test.csv.

Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel value indicating its lightness or darkness, with higher numbers meaning darker; this pixel value is an integer between 0 and 255. Both mnist_train.csv and mnist_test.csv contain 785 columns: the first column holds the class label, which represents the article of clothing, and the remaining columns hold the pixel values of the associated image.
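The row layout described above can be sketched in plain Python (the helper function and the dummy image are purely illustrative):

```python
# Column 0 is the class label (0-9); columns 1-784 are the pixel values
# of the 28x28 image, flattened row by row.
IMAGE_SIDE = 28
NUM_PIXELS = IMAGE_SIDE * IMAGE_SIDE  # 784

def make_csv_row(label, pixels):
    """Flatten a label and a 28x28 image (list of rows) into one CSV row."""
    flat = [p for row in pixels for p in row]
    assert len(flat) == NUM_PIXELS
    return [label] + flat

# A dummy all-black image of class 9 ("Ankle boot" in Fashion-MNIST)
row = make_csv_row(9, [[0] * IMAGE_SIDE for _ in range(IMAGE_SIDE)])
print(len(row))  # 785 columns: 1 label + 784 pixel values
```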

Intermediate data (the output of a step) is represented by an OutputFileDatasetConfig object. prepared_fashion_ds is produced as the output of step 1 and used as the input of step 2. OutputFileDatasetConfig introduces a data dependency between the steps, and creates an implicit execution order in the pipeline. You can register an OutputFileDatasetConfig as a dataset and version the output data automatically.

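These cells write out the data-preparation script and declare the intermediate output. A sketch of the output declaration (the destination path and registered name are illustrative; ws is the workspace from earlier):

```python
from azureml.data import OutputFileDatasetConfig

# Default datastore of the workspace; the prepared CSVs will land here
datastore = ws.get_default_datastore()

# Intermediate data: written by step 1, consumed by step 2.
# register_on_complete also registers the output as a versioned dataset.
prepared_fashion_ds = (
    OutputFileDatasetConfig(destination=(datastore, "outputdataset/{run-id}"))
    .register_on_complete(name="prepared_fashion_ds")
)
```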

A PythonScriptStep is a basic, built-in step to run a Python script on a compute target. It takes a script name and, optionally, other parameters such as arguments for the script, compute target, inputs, and outputs. If no compute target is specified, the default compute target for the workspace is used. You can also use a RunConfiguration to specify requirements for the PythonScriptStep, such as conda dependencies and the Docker image.

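A sketch of the preparation step, assuming a prepare.py script in script_folder that converts the raw files into mnist_train.csv and mnist_test.csv (the step name is illustrative):

```python
from azureml.pipeline.steps import PythonScriptStep

prep_step = PythonScriptStep(
    name="prepare step",
    script_name="prepare.py",
    # Mount the raw file dataset and pass the output location as arguments
    arguments=[fashion_ds.as_named_input("fashion_ds").as_mount(), prepared_fashion_ds],
    source_directory=script_folder,
    compute_target=compute_target,
    allow_reuse=True,
)
```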

Step 2: train CNN with Keras

Next, construct a ScriptRunConfig to configure the training run that trains a CNN model using Keras. It takes a dataset as the input.

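These cells write out the training script and configure its run. A sketch of the environment and ScriptRunConfig (a train.py script is assumed to exist in script_folder; the package list is illustrative, not pinned to tested versions):

```python
from azureml.core import Environment, ScriptRunConfig
from azureml.core.conda_dependencies import CondaDependencies

# Environment with TensorFlow/Keras for the training step
keras_env = Environment("keras-env")
keras_env.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["tensorflow", "keras", "azureml-defaults"]
)

# train.py reads the prepared CSVs and trains the CNN
train_cfg = ScriptRunConfig(
    source_directory=script_folder,
    script="train.py",
    compute_target=compute_target,
    environment=keras_env,
)
```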

Pass the run configuration details into the PythonScriptStep.

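For example, reusing the source directory, script, and run configuration from train_cfg, and reading the intermediate output as a tabular input:

```python
from azureml.pipeline.steps import PythonScriptStep

train_step = PythonScriptStep(
    name="train step",
    script_name=train_cfg.script,
    # Consume the prepared CSVs produced by step 1 as a tabular dataset
    arguments=[prepared_fashion_ds.read_delimited_files().as_input(name="prepared_fashion_ds")],
    source_directory=train_cfg.source_directory,
    runconfig=train_cfg.run_config,
)
```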

Build the pipeline

Once we have the steps (or a collection of steps), we can build the pipeline.

A pipeline is created with a list of steps and a workspace. Submit a pipeline using submit. When submit is called, a PipelineRun is created which in turn creates StepRun objects for each step in the workflow.

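A sketch of building and submitting the pipeline (exp is the Experiment created earlier; the execution order of the two steps is inferred from the data dependency on prepared_fashion_ds):

```python
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=ws, steps=[prep_step, train_step])
run = exp.submit(pipeline)
```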

Monitor the PipelineRun

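For example, using the notebook widget for a live view and then blocking until the pipeline finishes:

```python
from azureml.widgets import RunDetails

RunDetails(run).show()                     # live progress widget in the notebook
run.wait_for_completion(show_output=True)  # stream logs until all steps complete
```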

Register the input dataset and the output model

Azure Machine Learning datasets make it easy to trace how your data is used in ML. Learn More
For each machine learning experiment, you can easily trace the datasets used as the input through the Run object.

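A sketch of tracing the input dataset of the preparation step (the step name matches the one used when the step was defined; the shape of the details dictionary is an assumption of this sketch):

```python
# Fetch the StepRun for the preparation step and inspect its input datasets
prep_step_run = run.find_step_run("prepare step")[0]
inputs = prep_step_run.get_details()["inputDatasets"]
input_dataset = inputs[0]["dataset"]
print(input_dataset)
```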

Register the input Fashion MNIST dataset with the workspace so that you can reuse it in other experiments or share it with your colleagues who have access to your workspace.

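For example (the registered name and description are illustrative):

```python
# Register the raw Fashion-MNIST file dataset under a friendly name
fashion_ds = fashion_ds.register(
    workspace=ws,
    name="fashion_ds",
    description="image and label files from Fashion MNIST",
    create_new_version=True,
)
```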

Register the output model with dataset

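A sketch of registering the trained model together with the dataset it was trained on. It assumes train.py saved the model under outputs/model in the training step's run; the model name and dataset label are illustrative:

```python
# Register the Keras model from the train step, linking the input dataset
train_step_run = run.find_step_run("train step")[0]
model = train_step_run.register_model(
    model_name="keras-model",
    model_path="outputs/model/",
    datasets=[("train test data", fashion_ds)],
)
```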