Quickstart Azureml Automl

Quickstart: Fraud Classification using Automated ML

In this quickstart, you use automated machine learning in Azure Machine Learning service to train a classification model on a credit card fraud dataset. Automated ML takes training data and configuration settings and automatically iterates through combinations of feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model.

You will learn how to:

  • Download a dataset and look at the data
  • Train a machine learning classification model using automated ML
  • Explore the results

Connect to your workspace and create an experiment

Start by importing the required libraries and creating an experiment to track the runs in your workspace. A workspace can contain multiple experiments, and all users with access to the workspace can collaborate on them.
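To make the workspace/experiment/run hierarchy concrete, here is a minimal in-memory sketch. The classes Workspace, Experiment, and Run and the methods get_or_create_experiment and start_run are hypothetical stand-ins, not the azureml.core API. They only show one workspace holding several named experiments, each of which tracks its own runs.

```python
from dataclasses import dataclass, field


# Hypothetical, in-memory stand-ins; these are NOT the azureml.core classes.
@dataclass
class Run:
    run_id: str
    metrics: dict = field(default_factory=dict)


@dataclass
class Experiment:
    name: str
    runs: list = field(default_factory=list)

    def start_run(self) -> Run:
        # Each run gets an ID that is unique within its experiment.
        run = Run(run_id=f"{self.name}_{len(self.runs) + 1}")
        self.runs.append(run)
        return run


@dataclass
class Workspace:
    name: str
    experiments: dict = field(default_factory=dict)

    def get_or_create_experiment(self, name: str) -> Experiment:
        # Every caller asking for the same name gets the same shared experiment.
        if name not in self.experiments:
            self.experiments[name] = Experiment(name)
        return self.experiments[name]


ws = Workspace("my-workspace")
exp = ws.get_or_create_experiment("automl-classification")
run = exp.start_run()
same_exp = ws.get_or_create_experiment("automl-classification")
```

Because experiments are looked up by name, a second caller in the sketch sees the runs the first one created. This is the collaboration property described above, modelled in a single process.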


Load Data

Load the credit card dataset from a CSV file that contains both the training features and the labels. The features are the inputs to the model, and the labels represent the expected output of the model. Next, split the data using random_split and extract the training data for the model.
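The loading and splitting step can be sketched locally with the standard library. This is an illustration, not the Azure Datasets API. The CSV text is a tiny made-up sample, the column names V1, V2, Amount, and Class are assumptions, and local_random_split is a hypothetical helper. It only mimics the idea behind random_split: shuffle with a seed, then cut at a percentage.

```python
import csv
import io
import random

# Tiny made-up sample standing in for the credit card CSV.
# The column names (V1, V2, Amount, Class) are illustrative assumptions.
CSV_TEXT = """V1,V2,Amount,Class
0.5,1.2,10.0,0
-0.3,0.4,25.0,0
1.1,-0.8,7.5,0
-2.0,3.1,99.0,1
0.2,0.9,12.0,0
-1.7,2.6,80.0,1
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with float values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: float(value) for key, value in row.items()} for row in reader]


def local_random_split(rows, percentage, seed):
    """Hypothetical helper: seeded shuffle of a copy, then cut at `percentage`."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * percentage)
    return shuffled[:cut], shuffled[cut:]


def features_and_labels(rows, label_column):
    """Separate model inputs (features) from expected outputs (labels)."""
    features = [[v for k, v in row.items() if k != label_column] for row in rows]
    labels = [int(row[label_column]) for row in rows]
    return features, labels


rows = load_rows(CSV_TEXT)
train_rows, test_rows = local_random_split(rows, percentage=0.8, seed=223)
X_train, y_train = features_and_labels(train_rows, "Class")
```

With six rows and percentage=0.8, int(4.8) puts four rows in the training portion and two in the held-out portion.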

See the Azure Machine Learning how-to guide on datasets to learn more about Datasets and how to use them.


Train

When you use automated machine learning in Azure ML, you supply training data and configuration settings. The service then iterates through combinations of feature normalization/standardization methods, models, and hyperparameter settings to find the best model. The Azure Machine Learning documentation on configuring automated ML experiments describes the available settings in more detail.
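The search can be pictured with a deliberately tiny version. Try every combination of a scaling method and a hyperparameter, score each one on held-out data, and keep the best. The sketch assumes a single model family (k-nearest neighbours) with k as its only hyperparameter, plus a single validation split. All names and the toy data are hypothetical. The real service searches a much larger space of models and settings.

```python
import itertools
import math
import statistics


def identity(train, valid):
    """No scaling: return the rows unchanged."""
    return train, valid


def min_max(train, valid):
    """Rescale each column to [0, 1] using ranges fitted on the training rows only."""
    cols = list(zip(*train))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]

    def scale(row):
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]

    return [scale(r) for r in train], [scale(r) for r in valid]


def z_score(train, valid):
    """Standardize each column with the training mean and population std. dev."""
    cols = list(zip(*train))
    mu, sd = [statistics.mean(c) for c in cols], [statistics.pstdev(c) for c in cols]

    def scale(row):
        return [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, mu, sd)]

    return [scale(r) for r in train], [scale(r) for r in valid]


def knn_predict(train_X, train_y, row, k):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], row))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


SCALERS = {"none": identity, "min_max": min_max, "z_score": z_score}
K_VALUES = [1, 3]


def search(train_X, train_y, valid_X, valid_y):
    """Score every (scaler, k) combination; return the best and all results."""
    results = []
    for (name, scaler), k in itertools.product(SCALERS.items(), K_VALUES):
        tr, va = scaler(train_X, valid_X)
        preds = [knn_predict(tr, train_y, row, k) for row in va]
        results.append((accuracy(valid_y, preds), name, k))
    # max() keeps the first of several equally good configurations.
    return max(results, key=lambda r: r[0]), results


# Toy data: column 0 separates the classes, column 1 is large-valued noise.
train_X = [[0.1, 500], [0.2, 100], [0.9, 480], [0.8, 120], [0.15, 300], [0.85, 310]]
train_y = [0, 0, 1, 1, 0, 1]
valid_X = [[0.12, 470], [0.88, 130]]
valid_y = [0, 1]

best, results = search(train_X, train_y, valid_X, valid_y)
print(best)
```

On this toy data, the unscaled configurations let the large-valued noise column dominate the distance, so a scaled configuration wins. This is why normalization is part of the search and not a fixed preprocessing step.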

Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.

Property               Description
task                   classification or regression
primary_metric         The metric that you want to optimize.
enable_early_stopping  Stop the run if the metric score is not showing improvement.
n_cross_validations    Number of cross-validation splits.
training_data          Input dataset, containing both the features and the label column.
label_column_name      The name of the label column.
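n_cross_validations controls how many folds the training data is divided into. The helper below illustrates plain k-fold cross-validation; it is not a claim about how AutoML assigns rows to folds. It yields contiguous, unshuffled folds so that every row is used for validation exactly once. The name k_fold_indices is hypothetical.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_idx, valid_idx) pairs; each sample is validated exactly once."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        valid = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, valid
        start += size


folds = list(k_fold_indices(10, 3))
for train_idx, valid_idx in folds:
    print(len(train_idx), valid_idx)
```

Each configuration's score is then typically the average of its validation scores across the folds. Shuffling before splitting is common in practice and is omitted here for clarity.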

See the Azure Machine Learning documentation for more information about the available primary metrics.
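As one example of a metric you might optimize, the sketch below computes the binary, unweighted area under the ROC curve. It counts the fraction of (positive, negative) pairs that the model ranks correctly, with ties counting as half. This is a quadratic-time teaching version, and roc_auc is a hypothetical helper. AutoML's own metric names (such as AUC_weighted) and their averaging rules are defined in its documentation.

```python
def roc_auc(y_true, scores):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

A ranking-based metric like this is unaffected by where a decision threshold is placed. That is one reason such metrics are often preferred over plain accuracy for imbalanced problems such as fraud detection.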


Call the submit method on the experiment object and pass it the AutoMLConfig object.

Note: Depending on the data and the number of iterations, an AutoML run can take a while to complete.

In this example, show_output=True prints the currently running iterations to the console. You can also open the experiment from the Experiment activity tab in the left menu and monitor the run status there.
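The progress output and early stopping can be pictured with a small loop. This sketch assumes a simple "patience" rule: stop after a fixed number of iterations without a new best score. The exact criterion AutoML applies when enable_early_stopping is set is not described here. The per-iteration scores are made up, and run_iterations is a hypothetical name.

```python
def run_iterations(scores, patience, verbose=True):
    """Walk through per-iteration scores, stopping after `patience` non-improving iterations."""
    best = float("-inf")
    since_improvement = 0
    completed = []
    for i, score in enumerate(scores):
        completed.append(score)
        if score > best:
            best, since_improvement = score, 0
        else:
            since_improvement += 1
        if verbose:
            # Roughly what a progress log per iteration might look like.
            print(f"iteration {i}: score={score:.3f} best={best:.3f}")
        if since_improvement >= patience:
            break
    return completed, best


completed, best = run_iterations([0.80, 0.85, 0.84, 0.83, 0.90], patience=2)
```

Here the loop stops after the fourth score, so the 0.90 iteration is never reached. That is the trade-off early stopping makes between compute time and the chance of a late improvement.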


Analyze results

Next, select the best model from the iterations. The get_output method on the run object returned by submit returns the best run and its fitted model.
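Selecting the best model amounts to ranking the completed iterations by the primary metric. The RunRecord type, the best_run helper, and the scores below are hypothetical. They show that the comparison direction depends on whether a higher or a lower metric value is better.

```python
from typing import NamedTuple


class RunRecord(NamedTuple):
    run_id: str
    model_name: str
    score: float  # value of the primary metric for this iteration


def best_run(runs, higher_is_better=True):
    """Return the run with the best primary-metric score."""
    if not runs:
        raise ValueError("no completed runs")
    pick = max if higher_is_better else min
    return pick(runs, key=lambda r: r.score)


# Made-up scores for three hypothetical iterations.
runs = [
    RunRecord("run_0", "model_a", 0.91),
    RunRecord("run_1", "model_b", 0.95),
    RunRecord("run_2", "model_c", 0.93),
]
print(best_run(runs))
```

For an error-style metric, where lower is better, pass higher_is_better=False.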


Tests

Now that the model is trained, split the data the same way it was split for training (this time the split is performed locally). Then run the test data through the trained model to get the predicted values.
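A split can be reproduced only because it is seeded: the same algorithm with the same seed yields the same partition. The sketch below demonstrates this with a hypothetical split_indices helper. It then runs the held-out rows through a stand-in model, a simple threshold rule assumed in place of the fitted AutoML model. Whether a local split matches one performed by Azure depends on both using the same splitting procedure, which this sketch does not establish.

```python
import random


def split_indices(n, percentage, seed):
    """Return (train, test) index lists; the same seed always gives the same split."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * percentage)
    return idx[:cut], idx[cut:]


def predict(model, rows):
    """Run each test row through a model (here any callable mapping row -> label)."""
    return [model(row) for row in rows]


data = [[0.1], [0.9], [0.2], [0.8], [0.3], [0.7], [0.15], [0.95], [0.25], [0.85]]
train_a, test_a = split_indices(len(data), 0.8, seed=223)
train_b, test_b = split_indices(len(data), 0.8, seed=223)  # repeated split


def threshold_model(row):
    # Hypothetical stand-in for the fitted model returned by AutoML.
    return int(row[0] > 0.5)


y_pred = predict(threshold_model, [data[i] for i in test_a])
```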


Calculate metrics for the prediction

Now visualize how the true (actual) values compare with the values predicted by the trained model.
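The comparison between actual and predicted labels is commonly summarized as a confusion matrix, which can then be plotted. The stdlib-only sketch below builds a 2x2 matrix and derives precision and recall for the positive (fraud) class. The labels are made up, the helper names are hypothetical, and plotting is omitted.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are actual labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels] for actual in labels]


def precision_recall(matrix):
    """Precision and recall for the positive class (label 1) of a 2x2 matrix."""
    (tn, fp), (fn, tp) = matrix
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [0, 0, 0, 0, 1, 1, 0, 1]
y_pred = [0, 0, 1, 0, 1, 0, 0, 1]
cm = confusion_matrix(y_true, y_pred)
for label, row in zip(("actual 0", "actual 1"), cm):
    print(label, row)
```

For fraud data, the off-diagonal cells matter most. Missed fraud (false negatives) and false alarms (false positives) usually carry different costs, which a single accuracy number hides.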


Control cost and further exploration

To control cost, you can stop the compute instance this notebook runs on by clicking the "Stop compute" button next to the status dropdown in the menu above.

If you want to run more notebook samples, you can click on Sample Notebooks next to the Files view and explore the notebooks made available for you there.