Aml Pipelines With Automated Machine Learning Step
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Azure Machine Learning Pipeline with AutoMLStep
This notebook demonstrates the use of AutoMLStep in Azure Machine Learning Pipeline.
Introduction
In this example we show how to use an AzureML Dataset to load data for AutoML in an AML Pipeline.
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the configuration before running this notebook. Also see the Automated ML setup-using-a-local-conda-environment section to set up the environment.
In this notebook you will learn how to:
- Create an Experiment in an existing Workspace.
- Create or attach an existing AmlCompute to a workspace.
- Define data loading in a TabularDataset.
- Configure AutoML using AutoMLConfig.
- Use AutoMLStep.
- Train the model using AmlCompute.
- Explore the results.
- Test the best fitted model.
Azure Machine Learning and Pipeline SDK-specific imports
Initialize Workspace
Initialize a workspace object from the persisted configuration. Make sure the config file is present at .\config.json.
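The step above might look like the following minimal sketch, assuming the Azure ML SDK v1 (azureml-core) is installed. The helper names config_exists and load_workspace are hypothetical, and the SDK import is deferred so the helpers can be defined without it.

```python
from pathlib import Path


def config_exists(folder="."):
    """Return True if a config.json file is present in `folder`."""
    return (Path(folder) / "config.json").is_file()


def load_workspace(folder="."):
    """Load the workspace described by config.json in `folder`."""
    if not config_exists(folder):
        raise FileNotFoundError(
            f"No config.json found in {Path(folder).resolve()}"
        )
    # Lazy import: only needed when a workspace is actually loaded.
    from azureml.core import Workspace

    return Workspace.from_config(path=str(Path(folder) / "config.json"))
```

Calling load_workspace() from the notebook folder would then return the workspace object used by the later steps.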
Create an Azure ML experiment
Let's create an experiment named "automlstep-sample" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.
The best practice is to use a separate folder for each step's script and its dependent files, and to specify that folder as the step's source_directory. This reduces the size of the snapshot created for the step, because only that folder is snapshotted. Since a change to any file in the source_directory triggers a re-upload of the snapshot, keeping the folders separate also preserves step reuse when nothing in a step's source_directory has changed.
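As a rough sketch of this setup, the helpers below create one folder per step and an experiment object. The names make_step_folder and create_experiment are hypothetical; only the experiment name "automlstep-sample" comes from the text, and the SDK import is deferred until it is needed.

```python
import os


def make_step_folder(root, step_name):
    """Create (if needed) and return a dedicated source folder for one step."""
    folder = os.path.join(root, step_name)
    os.makedirs(folder, exist_ok=True)
    return folder


def create_experiment(workspace, name="automlstep-sample"):
    """Return an Experiment in `workspace`; runs are recorded under it."""
    # Lazy import: requires azureml-core.
    from azureml.core import Experiment

    return Experiment(workspace, name)
```

Each step would then pass its own folder as source_directory, so that editing one step's files does not invalidate the snapshot of another.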
Create or Attach an AmlCompute cluster
You will need to create a compute target for your AutoML run. In this tutorial, you get the default AmlCompute as your training compute resource.
Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.
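One common way to obtain a compute target is a get-or-create pattern, sketched below under assumptions: the cluster name is supplied by the caller, and the VM size and node count are illustrative defaults, not values taken from this notebook. The helper name get_or_create_compute is hypothetical.

```python
def get_or_create_compute(workspace, cluster_name,
                          vm_size="STANDARD_DS12_V2", max_nodes=4):
    """Return an existing AmlCompute cluster, or provision a new one."""
    # Lazy imports: require azureml-core.
    from azureml.core.compute import AmlCompute, ComputeTarget
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Attach to the cluster if it already exists in the workspace.
        target = ComputeTarget(workspace=workspace, name=cluster_name)
    except ComputeTargetException:
        # Otherwise provision it (needs permission to create compute).
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size, max_nodes=max_nodes
        )
        target = ComputeTarget.create(workspace, cluster_name, config)
        target.wait_for_completion(show_output=True)
    return target
```

With a restricted role, only the first branch can succeed, which is why the cluster has to be created by an admin beforehand.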
Data
Review the Dataset Result
You can peek at the result of a TabularDataset at any range using skip(i) and take(j).to_pandas_dataframe(). Doing so evaluates only j records for all the steps in the TabularDataset, which makes it fast even against large datasets.
TabularDataset objects are composed of an optional list of transformation steps.
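The skip and take calls described above can be combined in a small helper. This is a minimal sketch: load_tabular and peek are hypothetical names, and the data path is left to the caller because the notebook's source file is not shown here.

```python
def load_tabular(path):
    """Create a TabularDataset from one or more delimited (CSV) files."""
    # Lazy import: requires azureml-core.
    from azureml.core import Dataset

    return Dataset.Tabular.from_delimited_files(path=path)


def peek(dataset, start, count):
    """Return `count` records beginning at offset `start` as a DataFrame."""
    # Only the requested records are evaluated, not the whole dataset.
    return dataset.skip(start).take(count).to_pandas_dataframe()
```

For example, peek(dataset, 0, 5) would return the first five records.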
Train
This creates a general AutoML settings object.
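A settings object of this kind might be built as in the sketch below. The setting values, the classification task, and the helper name build_automl_config are assumptions for illustration; the label column is supplied by the caller.

```python
import logging

# Illustrative values only; tune them for a real experiment.
AUTOML_SETTINGS = {
    "iteration_timeout_minutes": 5,
    "iterations": 2,
    "primary_metric": "AUC_weighted",
    "verbosity": logging.INFO,
}


def build_automl_config(compute_target, training_data, label_column_name,
                        settings=None):
    """Build an AutoMLConfig for a classification task (assumed here)."""
    # Lazy import: requires the azureml-train-automl packages.
    from azureml.train.automl import AutoMLConfig

    merged = dict(AUTOML_SETTINGS)
    merged.update(settings or {})
    return AutoMLConfig(
        task="classification",
        compute_target=compute_target,
        training_data=training_data,
        label_column_name=label_column_name,
        **merged,
    )
```

Keeping the settings in a dictionary makes it easy to reuse the same configuration across experiments and override single values per run.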
Create Pipeline and AutoMLStep
You can define outputs for the AutoMLStep using TrainingOutput.
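As a hedged sketch, the metrics and best-model outputs could be declared with PipelineData objects that carry a TrainingOutput. The output names and the helper make_training_outputs are hypothetical choices, not names confirmed by this notebook.

```python
# Hypothetical names used later to look up the outputs on the pipeline run.
METRICS_OUTPUT_NAME = "metrics_output"
BEST_MODEL_OUTPUT_NAME = "best_model_output"


def make_training_outputs(datastore):
    """Return (metrics_data, model_data) outputs for an AutoMLStep."""
    # Lazy import: requires azureml-pipeline-core.
    from azureml.pipeline.core import PipelineData, TrainingOutput

    metrics_data = PipelineData(
        name="metrics_data",
        datastore=datastore,
        pipeline_output_name=METRICS_OUTPUT_NAME,
        training_output=TrainingOutput(type="Metrics"),
    )
    model_data = PipelineData(
        name="model_data",
        datastore=datastore,
        pipeline_output_name=BEST_MODEL_OUTPUT_NAME,
        training_output=TrainingOutput(type="Model"),
    )
    return metrics_data, model_data
```

The datastore argument could be the workspace default, obtained with ws.get_default_datastore().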
Create an AutoMLStep.
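Putting the pieces together, a step and pipeline might be created and submitted roughly as follows. The step name, pipeline description, and helper name build_and_submit are illustrative assumptions.

```python
def build_and_submit(workspace, experiment, automl_config, outputs):
    """Wrap an AutoMLConfig in an AutoMLStep, run it, and return the run."""
    # Lazy imports: require azureml-pipeline-core and azureml-pipeline-steps.
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import AutoMLStep

    step = AutoMLStep(
        name="automl_module",
        automl_config=automl_config,
        outputs=list(outputs),
        allow_reuse=True,  # reuse earlier results when nothing changed
    )
    pipeline = Pipeline(
        workspace=workspace,
        steps=[step],
        description="pipeline_with_automlstep",
    )
    pipeline_run = experiment.submit(pipeline)
    pipeline_run.wait_for_completion()
    return pipeline_run
```

The returned run object is what the result-retrieval steps operate on.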
Examine Results
Retrieve the metrics of all child runs
The outputs of the run above can be used as inputs to other steps in the pipeline. In this tutorial, we will examine the outputs by retrieving the output data and running some tests.
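Retrieving an output could be sketched as below, assuming the metrics are stored as a JSON file and that the downloaded file lands under the target folder at its datastore-relative path. The helper names download_output and load_metrics are hypothetical.

```python
import json
import os


def download_output(pipeline_run, output_name, target_dir="."):
    """Download a named pipeline output and return its assumed local path."""
    port = pipeline_run.get_pipeline_output(output_name)
    port.download(target_dir, show_progress=True)
    # Assumption: the local layout mirrors the path on the datastore.
    return os.path.join(target_dir, port.path_on_datastore)


def load_metrics(path):
    """Read the downloaded metrics file (assumed to be JSON)."""
    with open(path) as f:
        return json.load(f)
```

The loaded dictionary can then be passed to pandas.DataFrame for a tabular view of the child-run metrics.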
Retrieve the Best Model
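A minimal sketch of loading the best model, assuming the model output has already been downloaded and is a pickle file. The helper name load_pickled_model is hypothetical. Unpickling generally requires package versions compatible with those used in training, and only trusted files should be unpickled.

```python
import pickle


def load_pickled_model(path):
    """Load a pickled model object from a local file."""
    with open(path, "rb") as f:
        return pickle.load(f)
```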
Test the Model
Load Test Data
The test data should go through the same preparation steps as the training data. Otherwise, it might fail at the preprocessing step.
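One way to keep the preparation identical is to route both datasets through a single helper, as in this sketch. The helper name split_features_and_label and the column arguments are hypothetical; the actual columns depend on the dataset.

```python
def split_features_and_label(dataset, label_column, extra_drop=()):
    """Split a TabularDataset into feature and label DataFrames."""
    dropped = [label_column, *extra_drop]
    # Features: everything except the label and any extra dropped columns.
    features = dataset.drop_columns(columns=dropped).to_pandas_dataframe()
    # Labels: only the label column.
    labels = dataset.keep_columns(columns=[label_column]).to_pandas_dataframe()
    return features, labels
```

Applying the same call to the test dataset yields frames with the same columns as those used for training.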
Testing Our Best Fitted Model
We will use a confusion matrix to see how well our model performs.
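To illustrate what a confusion matrix contains, here is a plain-Python sketch; confusion_counts is a hypothetical helper, and in practice a library routine such as sklearn.metrics.confusion_matrix would normally be used instead.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, labels=None):
    """Return (labels, matrix) where matrix[i][j] counts samples whose
    true label is labels[i] and predicted label is labels[j]."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    matrix = [[counts[(t, p)] for p in labels] for t in labels]
    return labels, matrix
```

Diagonal entries count correct predictions, and off-diagonal entries show which classes the model confuses with each other.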