Aml Pipelines With Automated Machine Learning Step

Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.

Azure Machine Learning Pipeline with AutoMLStep

This notebook demonstrates the use of AutoMLStep in an Azure Machine Learning pipeline.

Introduction

This example shows how to use an AzureML Dataset to load data for AutoML in an AML pipeline.

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the configuration before running this notebook. Also take a look at the Automated ML setup-using-a-local-conda-environment section to set up the environment.

In this notebook you will learn how to:

  1. Create an Experiment in an existing Workspace.
  2. Create or attach an existing AmlCompute cluster to a workspace.
  3. Define data loading in a TabularDataset.
  4. Configure AutoML using AutoMLConfig.
  5. Use AutoMLStep.
  6. Train the model using AmlCompute.
  7. Explore the results.
  8. Test the best fitted model.

Azure Machine Learning and Pipeline SDK-specific imports

[ ]

Initialize Workspace

Initialize a workspace object from the persisted configuration. Make sure the config file is present at .\config.json.
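
As a minimal sketch of this step, the helper below wraps `Workspace.from_config` from the `azureml-core` SDK (v1). The function name `load_workspace` and its `config_path` parameter are illustrative, not part of the original notebook.

```python
def load_workspace(config_path=None):
    """Return the Workspace described by a persisted config.json."""
    # Imported inside the function so the helper can be defined even
    # where the azureml-core package is not installed.
    from azureml.core import Workspace

    # With path=None the SDK starts its search in the current directory.
    return Workspace.from_config(path=config_path)


# Example (requires a valid config.json and Azure credentials):
# ws = load_workspace()
# print(ws.name, ws.resource_group, ws.location, sep="\n")
```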

[ ]

Create an Azure ML experiment

Let's create an experiment named "automlstep-sample" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.

The best practice is to keep each step's scripts and their dependent files in a separate folder and to specify that folder as the source_directory for the step. This reduces the size of the snapshot created for the step, because only that folder is snapshotted. Because a change to any file in the source_directory triggers a re-upload of the snapshot, separate folders also preserve step reuse when nothing in a step's source_directory has changed.
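
The two actions described above could look roughly like this. The experiment name comes from the text; the helper names and the folder path in the usage comment are hypothetical.

```python
import os


def make_step_folder(path):
    """Create a dedicated source folder for one pipeline step (idempotent)."""
    os.makedirs(path, exist_ok=True)
    return path


def create_experiment(workspace, name="automlstep-sample"):
    """Return an Experiment under which the pipeline runs are recorded."""
    # Imported lazily so this module loads without the SDK installed.
    from azureml.core import Experiment

    return Experiment(workspace, name)


# Example (folder name is a placeholder, `ws` is an existing Workspace):
# project_folder = make_step_folder("./automlstep-sample-folder")
# experiment = create_experiment(ws)
```

Keeping one folder per step means an edit to an unrelated script does not change this step's snapshot.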

[ ]

Create or Attach an AmlCompute cluster

You will need to create a compute target for your AutoML run. In this tutorial, you get the default AmlCompute as your training compute resource.

Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.
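
A common get-or-create pattern for this step is sketched below. The cluster name, VM size, and node count are placeholder assumptions, not values taken from this notebook; the create branch needs permission to provision compute, as noted above.

```python
def get_or_create_compute(workspace, name, vm_size="STANDARD_DS12_V2", max_nodes=4):
    """Attach to an existing compute target, or provision an AmlCompute cluster."""
    # Imported lazily so this module loads without the SDK installed.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Raises ComputeTargetException when no target with this name exists.
        target = ComputeTarget(workspace=workspace, name=name)
    except ComputeTargetException:
        # vm_size and max_nodes are illustrative defaults (assumptions).
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size, max_nodes=max_nodes
        )
        target = ComputeTarget.create(workspace, name, config)
        target.wait_for_completion(show_output=True)
    return target


# Example ("cpu-cluster" is a hypothetical cluster name):
# compute_target = get_or_create_compute(ws, "cpu-cluster")
```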

[ ]

Data

[ ]

Review the Dataset Result

You can peek at the result of a TabularDataset over any range using skip(i) and take(j).to_pandas_dataframe(). Doing so evaluates only j records through all the steps in the TabularDataset, which keeps it fast even against large datasets.

TabularDataset objects are composed of an optional list of transformation steps.
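
The skip/take preview described above might be written as follows. The source of the data is not given in this text, so `csv_path` is left as a parameter; the helper name is illustrative.

```python
def preview_dataset(csv_path, skip_rows=0, take_rows=5):
    """Build a TabularDataset from a delimited file and preview a slice of it."""
    # Imported lazily so this module loads without the SDK installed.
    from azureml.core import Dataset

    dataset = Dataset.Tabular.from_delimited_files(path=csv_path)
    # Only `take_rows` records are materialized into the pandas DataFrame.
    preview = dataset.skip(skip_rows).take(take_rows).to_pandas_dataframe()
    return dataset, preview


# Example (the path is a placeholder for your own CSV location):
# dataset, head = preview_dataset("<url-or-datastore-path-to-csv>", 0, 5)
```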

[ ]

Train

This creates a general AutoML settings object.
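
A settings object of this kind could look like the sketch below. The task type is inferred from the confusion-matrix test later in the notebook; every setting value, and the label column name in the usage comment, is an assumption chosen for illustration.

```python
# Illustrative settings only; tune these for your own experiment.
automl_settings = {
    "experiment_timeout_hours": 0.5,
    "max_concurrent_iterations": 4,
    "primary_metric": "AUC_weighted",
}


def build_automl_config(training_data, label_column_name, compute_target, settings=None):
    """Return an AutoMLConfig for a classification task."""
    # Imported lazily so this module loads without the SDK installed.
    from azureml.train.automl import AutoMLConfig

    return AutoMLConfig(
        task="classification",
        training_data=training_data,
        label_column_name=label_column_name,
        compute_target=compute_target,
        enable_early_stopping=True,
        **(settings if settings is not None else automl_settings),
    )


# Example ("y" is a hypothetical label column):
# automl_config = build_automl_config(train_data, "y", compute_target)
```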

[ ]

Create Pipeline and AutoMLStep

You can define outputs for the AutoMLStep using TrainingOutput.
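
One way to declare those outputs is sketched here: each `PipelineData` carries a `TrainingOutput` of type "Metrics" or "Model". The output names are assumptions picked for this example.

```python
def define_automl_outputs(datastore):
    """Return (metrics_data, model_data) PipelineData objects for an AutoMLStep."""
    # Imported lazily so this module loads without the SDK installed.
    from azureml.pipeline.core import PipelineData, TrainingOutput

    # Metrics of all child runs.
    metrics_data = PipelineData(
        name="metrics_data",
        datastore=datastore,
        pipeline_output_name="metrics_output",
        training_output=TrainingOutput(type="Metrics"),
    )
    # The best fitted model.
    model_data = PipelineData(
        name="model_data",
        datastore=datastore,
        pipeline_output_name="best_model_output",
        training_output=TrainingOutput(type="Model"),
    )
    return metrics_data, model_data


# Example:
# metrics_data, model_data = define_automl_outputs(ws.get_default_datastore())
```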

[ ]

Create an AutoMLStep.
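
The step, the pipeline, and the submission could be assembled as in this sketch. The step name and pipeline description are placeholders; `automl_config` and `outputs` are assumed to have been built in the earlier sections.

```python
def run_automl_pipeline(workspace, experiment, automl_config, outputs):
    """Wrap an AutoMLConfig in an AutoMLStep, submit the pipeline, and wait."""
    # Imported lazily so this module loads without the SDK installed.
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import AutoMLStep

    automl_step = AutoMLStep(
        name="automl_module",  # placeholder step name
        automl_config=automl_config,
        outputs=list(outputs),
        allow_reuse=True,
    )
    pipeline = Pipeline(
        workspace=workspace,
        steps=[automl_step],
        description="pipeline_with_automlstep",
    )
    pipeline_run = experiment.submit(pipeline)
    # Blocks until training on the AmlCompute target has finished.
    pipeline_run.wait_for_completion()
    return pipeline_run


# Example:
# pipeline_run = run_automl_pipeline(
#     ws, experiment, automl_config, [metrics_data, model_data]
# )
```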

[ ]
[ ]
[ ]
[ ]
[ ]

Examine Results

Retrieve the metrics of all child runs

The outputs of the above run can be used as inputs to other steps in the pipeline. In this tutorial, we will examine the outputs by retrieving the output data and running some tests.
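
Retrieving the metrics output might look like the sketch below. It assumes the output was registered under the pipeline output name "metrics_output", that the metrics file is JSON, and that the downloaded file lands under `local_dir` at the port's datastore-relative path; verify these against your own run.

```python
import json
import os


def fetch_metrics(pipeline_run, output_name="metrics_output", local_dir="."):
    """Download a metrics pipeline output and return the parsed JSON."""
    port = pipeline_run.get_pipeline_output(output_name)
    port.download(local_path=local_dir)
    # Assumption: the file is written below local_dir at path_on_datastore.
    local_file = os.path.join(local_dir, port.path_on_datastore)
    with open(local_file) as handle:
        return json.load(handle)


# Example (pandas is optional and used only for display):
# import pandas as pd
# metrics = fetch_metrics(pipeline_run)
# pd.DataFrame(metrics)
```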

[ ]
[ ]

Retrieve the Best Model

[ ]
[ ]
[ ]

Test the Model

Load Test Data

The test data should go through the same preparation steps as the training data. Otherwise, it might fail at the preprocessing step.

[ ]

Testing Our Best Fitted Model

We will use a confusion matrix to see how our model performs.
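
To make the idea concrete, here is a dependency-free sketch that tallies a confusion matrix from true and predicted labels. It is not the notebook's own code, and the labels in the example are made up.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts true=labels[i], pred=labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix


# Made-up labels for illustration; in the notebook these would be
# y_test and best_model.predict(X_test).
labels, matrix = confusion_counts(
    ["no", "no", "yes", "yes", "no"],
    ["no", "yes", "yes", "no", "no"],
)
print(labels)  # ['no', 'yes']
print(matrix)  # [[2, 1], [1, 1]]
```

Rows are true classes and columns are predicted classes, so the diagonal holds the correct predictions. `sklearn.metrics.confusion_matrix` follows the same row/column convention.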

[ ]
[ ]