
Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


Automated Machine Learning

Regression with AmlCompute

Contents

  1. Introduction
  2. Setup
  3. Data
  4. Train
  5. Results
  6. Test

Introduction

In this example, we use an experimental feature, Model Proxy, to run predictions with the best generated model without downloading the model locally. The prediction happens on the same compute and environment that was used to train the model. Because this feature is still experimental, the API is subject to change; if you run into issues, make sure you are using the latest version of this notebook. This notebook also leverages MLflow for saving models, which makes the resulting models more portable. See https://docs.microsoft.com/en-us/azure/machine-learning/how-to-use-mlflow for more details on using MLflow with AzureML.

If you are using an Azure Machine Learning Compute Instance, you are all set. Otherwise, go through the configuration notebook first, if you haven't already, to establish your connection to the AzureML Workspace.

In this notebook you will learn how to:

  1. Create an Experiment in an existing Workspace.
  2. Configure AutoML using AutoMLConfig.
  3. Train the model using remote compute.
  4. Explore the results.
  5. Test the best fitted model.

Setup

As part of the setup you have already created an Azure ML Workspace object. For Automated ML you will need to create an Experiment object, which is a named object in a Workspace used to run experiments.

[ ]

This sample notebook may use features that are not available in previous versions of the Azure ML SDK.

[ ]
[ ]
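The cells above typically check the installed SDK version, connect to the workspace, and create the experiment. A minimal sketch, assuming a config.json is available (it is by default on a Compute Instance) and using an illustrative experiment name:

```python
import azureml.core
from azureml.core import Workspace, Experiment

# Print the SDK version so you can confirm you are on a recent release.
print("Azure ML SDK version:", azureml.core.VERSION)

# Load the workspace from config.json (created by the configuration notebook,
# or provided automatically on a Compute Instance).
ws = Workspace.from_config()

# The experiment name is illustrative; any valid name works.
experiment = Experiment(ws, "automl-regression-model-proxy")

print(ws.name, ws.resource_group, ws.location, experiment.name, sep="\n")
```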

Using AmlCompute

You will need to create a compute target for your AutoML run. In this tutorial, you use AmlCompute as your training compute resource.

[ ]
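A typical way to get or create the cluster, with an illustrative cluster name and VM size; adjust vm_size and max_nodes to your quota and region:

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = "regression-cluster"  # illustrative name

try:
    # Reuse the cluster if it already exists in the workspace.
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing cluster, using it.")
except ComputeTargetException:
    # Otherwise provision a new AmlCompute cluster.
    compute_config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS12_V2", max_nodes=4
    )
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
    compute_target.wait_for_completion(show_output=True)
```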

Data

Load Data

Load the hardware dataset from a CSV file containing both training features and labels. The features are inputs to the model, while the training labels represent the expected output. Next, we split the data using random_split and extract the training data for the model.

[ ]
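A sketch of loading the CSV into a tabular dataset and splitting it; the data URL and label column name below are placeholders, so substitute the ones from your copy of the notebook:

```python
from azureml.core import Dataset

data_url = "https://<your-storage>/hardware.csv"  # placeholder URL
label_column_name = "ERP"                         # placeholder label column

# Read the CSV directly into a tabular dataset.
dataset = Dataset.Tabular.from_delimited_files(path=data_url)

# Split into a training portion and a held-out portion for testing.
train_data, test_data = dataset.random_split(percentage=0.8, seed=223)
```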

The split data will be used on the remote compute by ModelProxy and locally to compare results, so we persist the split data to avoid discrepancies caused by different package versions in the local and remote environments.

[ ]
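One way to persist the splits is to register them as dataset versions in the workspace, so the remote ModelProxy run and the local comparison read exactly the same rows; the dataset names are illustrative:

```python
# Registering returns a dataset handle that is resolvable from the remote compute.
train_data = train_data.register(
    workspace=ws, name="hardware-train-split", create_new_version=True
)
test_data = test_data.register(
    workspace=ws, name="hardware-test-split", create_new_version=True
)
```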

Train

Instantiate an AutoMLConfig object to specify the settings and data used to run the experiment.

Property             Description
task                 classification, regression, or forecasting
primary_metric       The metric that you want to optimize. Regression supports the
                     following primary metrics: spearman_correlation,
                     normalized_root_mean_squared_error, r2_score,
                     normalized_mean_absolute_error
n_cross_validations  Number of cross-validation splits.
training_data        The training data, containing both the features and the label column.
label_column_name    The name of the label (target) column.

You can find more information about primary metrics here

[ ]
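A sketch of the configuration; the primary metric, timeout, and number of folds are illustrative choices rather than the only valid ones:

```python
from azureml.train.automl import AutoMLConfig

automl_config = AutoMLConfig(
    task="regression",
    primary_metric="r2_score",
    compute_target=compute_target,
    training_data=train_data,
    label_column_name=label_column_name,
    n_cross_validations=5,
    experiment_timeout_hours=0.3,   # keep the example run short
    enable_early_stopping=True,
)
```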

Call the submit method on the experiment object and pass the run configuration. Execution of remote runs is asynchronous. Depending on the data and the number of iterations, this can run for a while. If you set show_output=True, validation errors and the current status are shown and execution is synchronous.

[ ]
[ ]
[ ]
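A minimal submission sketch:

```python
# With show_output=True the call streams validation messages and iteration
# status to the notebook and blocks until training is finished.
remote_run = experiment.submit(automl_config, show_output=True)

# If you submit with show_output=False instead, wait explicitly before
# reading any results from the run.
remote_run.wait_for_completion()
```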

Results

[ ]
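One common way to inspect the run and its child iterations is the RunDetails Jupyter widget (requires the azureml-widgets package):

```python
from azureml.widgets import RunDetails

# Renders an interactive widget showing each iteration and its metrics.
RunDetails(remote_run).show()
```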

Retrieve the Best Child Run

Below we select the best pipeline from our iterations. The get_best_child method returns the best run. Overloads on get_best_child allow you to retrieve the best run for any logged metric.

[ ]
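A minimal sketch:

```python
# Returns the child run that scored best on the primary metric.
best_run = remote_run.get_best_child()
print(best_run)
```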

Show hyperparameters

Show the model pipeline used for the best run with its hyperparameters. For ensemble pipelines it shows the iterations and algorithms that are ensembled.

[ ]
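Since the model itself stays on the training compute, one hedged way to inspect the winning pipeline is through the child run's properties; the property keys below ('run_algorithm', 'pipeline_spec') are assumptions about what AutoML child runs expose, so fall back to printing all properties if they are absent:

```python
properties = best_run.get_properties()

# Print the algorithm name and the serialized pipeline specification, if present.
print(properties.get("run_algorithm"))
print(properties.get("pipeline_spec"))
```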

Best Child Run Based on Any Other Metric

Show the run and the model that has the smallest root_mean_squared_error value (which turned out to be the same as the one with largest spearman_correlation value):

[ ]
[ ]
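A sketch using the metric overload of get_best_child:

```python
# Rank child runs by a metric other than the primary metric used for training.
lowest_rmse_run = remote_run.get_best_child(metric="root_mean_squared_error")
print(lowest_rmse_run)
```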

Creating a ModelProxy for submitting prediction runs to the training environment

We will create a ModelProxy for the best child run, which will allow us to submit a run that performs the prediction in the training environment. Unlike the local environment, which may have different versions of some libraries, the training environment already has all the libraries compatible with the model.

[ ]
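A minimal sketch of constructing the proxy (the import path reflects the experimental azureml.train.automl package):

```python
from azureml.train.automl.model_proxy import ModelProxy

# Calls made through the proxy are executed as runs on the training compute
# and environment, so the model is never downloaded locally.
best_model_proxy = ModelProxy(best_run)
```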

Exploring results

[ ]
[ ]
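A sketch of submitting a remote prediction run through the proxy; the assumption here is that predict returns a dataset of predictions that can be converted to pandas for local comparison against the labels kept in test_data:

```python
# Runs inference remotely in the training environment and returns the predictions.
y_pred = best_model_proxy.predict(test_data)

# Bring the (typically small) prediction output back locally for inspection.
print(y_pred.to_pandas_dataframe().head())
```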