Quickstart: Fraud Classification using Automated ML
In this quickstart, you use automated machine learning in Azure Machine Learning to train a classification model on a credit card fraud dataset. The process accepts training data and configuration settings, then automatically iterates through combinations of feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model.
You will learn how to:
- Download a dataset and look at the data
- Train a machine learning classification model using automated ML
- Explore the results
Connect to your workspace and create an experiment
Start by importing the required libraries and creating an experiment to track the runs in your workspace. A workspace can have multiple experiments, and all users who have access to the workspace can collaborate on them.
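As a minimal sketch of this step, assuming the Azure ML Python SDK v1 (`azureml-core`) is installed and a workspace `config.json` is available: the helper name `get_experiment` and the experiment name `automl-fraud-quickstart` are hypothetical and not taken from the original notebook.

```python
def get_experiment(experiment_name="automl-fraud-quickstart"):
    """Connect to the workspace and return an Experiment for tracking runs.

    The experiment name is a hypothetical placeholder.
    """
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Experiment, Workspace

    # Reads the workspace details (subscription, resource group, name)
    # from a config.json file found in or above the current directory.
    ws = Workspace.from_config()

    # Runs submitted under the same name are grouped in one experiment.
    return Experiment(workspace=ws, name=experiment_name)
```

Calling `get_experiment()` requires a reachable workspace; defining the function does not.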
Load Data
Load the credit card dataset from a CSV file that contains both the training features and the labels. The features are the inputs to the model, while the labels represent the expected output of the model. Next, split the data using random_split and keep the training portion for the model.
Follow this how-to if you want to learn more about Datasets and how to use them.
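The loading and splitting described above could look roughly like the following sketch, assuming the SDK v1 `Dataset.Tabular` API. The function name, the `csv_url` parameter, and the default split fraction and seed are illustrative assumptions; the actual file location is not given in this text.

```python
def load_training_split(csv_url, train_fraction=0.8, seed=223):
    """Load a delimited file as a TabularDataset and split it in two.

    csv_url, train_fraction, and seed are hypothetical placeholders.
    """
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.core import Dataset

    # Each row of the file holds the feature columns plus the label column.
    dataset = Dataset.Tabular.from_delimited_files(path=csv_url)

    # random_split returns two datasets; the first holds approximately
    # train_fraction of the records. A fixed seed makes the split repeatable.
    training_data, validation_data = dataset.random_split(
        percentage=train_fraction, seed=seed
    )
    return training_data, validation_data
```

Keeping the seed fixed matters later: it lets the same held-out portion be reproduced when the model is tested.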
Train
When you use automated machine learning in Azure ML, you input training data and configuration settings, and the process automatically iterates through combinations of different feature normalization/standardization methods, models, and hyperparameter settings to arrive at the best model. Learn more about how you configure automated ML here.
Instantiate an AutoMLConfig object. This defines the settings and data used to run the experiment.
| Property | Description |
|---|---|
| task | classification or regression |
| primary_metric | This is the metric that you want to optimize. |
| enable_early_stopping | Stop the run if the metric score is not showing improvement. |
| n_cross_validations | Number of cross-validation splits. |
| training_data | Input dataset, containing both features and label column. |
| label_column_name | The name of the label column. |
You can find more information about primary metrics here.
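A sketch of how the properties in the table might be passed to `AutoMLConfig`, assuming the SDK v1 package `azureml-train-automl`. The specific values (`AUC_weighted`, 3 cross-validation splits) and the helper name `build_automl_config` are assumptions chosen for illustration, not the notebook's own settings.

```python
# Illustrative values only; the settings used by the original notebook
# are not shown in this text.
AUTOML_SETTINGS = {
    "task": "classification",
    "primary_metric": "AUC_weighted",  # assumed metric to optimize
    "enable_early_stopping": True,     # stop when the score stops improving
    "n_cross_validations": 3,          # assumed number of CV splits
}


def build_automl_config(training_data, label_column_name, **overrides):
    """Create an AutoMLConfig from the settings above.

    training_data is expected to be a TabularDataset that contains both
    the feature columns and the label column.
    """
    # Imported lazily so this sketch can be loaded without the SDK installed.
    from azureml.train.automl import AutoMLConfig

    settings = {**AUTOML_SETTINGS, **overrides}
    return AutoMLConfig(
        training_data=training_data,
        label_column_name=label_column_name,
        **settings,
    )
```

Keeping the settings in a dictionary makes it easy to override a single property, for example `build_automl_config(data, label, n_cross_validations=5)`.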
Call the submit method on the experiment object and pass the run configuration.
Note: Depending on the data and the number of iterations, an AutoML run can take a while to complete.
In this example, we specify show_output = True to print currently running iterations to the console. It is also possible to navigate to the experiment through the Experiment activity tab in the left menu, and monitor the run status from there.
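The submission step might be sketched as below. The wrapper name `run_automl` is hypothetical; `experiment` is assumed to be an `azureml.core.Experiment` and `automl_config` an `AutoMLConfig` instance.

```python
def run_automl(experiment, automl_config):
    """Submit an automated ML configuration and return the resulting run."""
    # show_output=True prints each iteration to the console as it runs,
    # so the call blocks until the run has finished.
    return experiment.submit(automl_config, show_output=True)
```

With `show_output=False` the call would return without streaming iteration output, and progress could instead be followed from the studio UI.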
Analyze results
Below, we select the best model from the iterations. The get_output method on automl_classifier, the run object returned by submit, returns both the best run and the fitted model from that run.
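A minimal sketch of retrieving the best model, assuming `automl_run` is the run object returned when the experiment was submitted. The helper name `select_best` is hypothetical.

```python
def select_best(automl_run):
    """Return the best child run and its fitted model from an AutoML run."""
    # get_output() returns a pair: the best run according to the primary
    # metric, and the model that was fitted during that run.
    best_run, fitted_model = automl_run.get_output()
    return best_run, fitted_model
```

The fitted model is expected to expose a `predict` method, which is what the testing step relies on.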
Tests
Now that the model is trained, split the data the same way it was split for training (the difference is that here the split is done locally), and then run the test data through the trained model to get the predicted values.
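One way this step could look, assuming the held-out portion is a `TabularDataset` and the model exposes `predict`. The function name `predict_on_holdout` and its parameters are illustrative, not the notebook's own code.

```python
def predict_on_holdout(fitted_model, holdout_data, label_column_name):
    """Separate features from labels locally and score the features.

    holdout_data is assumed to be an azureml TabularDataset produced by
    the same split that was used for training.
    """
    # Materialize features and labels as local pandas DataFrames.
    features = holdout_data.drop_columns(
        columns=[label_column_name]
    ).to_pandas_dataframe()
    labels = holdout_data.keep_columns(
        columns=[label_column_name]
    ).to_pandas_dataframe()

    y_true = labels[label_column_name]       # actual values
    y_pred = fitted_model.predict(features)  # predicted values
    return y_true, y_pred
```

Dropping the label column before calling `predict` keeps the model from seeing the answer it is being tested on.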
Calculate metrics for the prediction
Now visualize the results to compare the ground-truth (actual) values with the values predicted by the trained model.
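As an illustration of comparing actual and predicted values, the following standard-library sketch tallies a binary confusion matrix and derives precision and recall. It assumes the fraud class is encoded as `1`; the function names are hypothetical, and the original notebook may use a plotting library instead.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Tally true/false positives and negatives for a binary classifier."""
    counts = Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            counts["tp" if predicted == positive else "fn"] += 1
        else:
            counts["fp" if predicted == positive else "tn"] += 1
    return counts


def precision_recall(counts):
    """Compute precision and recall, returning 0.0 when undefined."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def format_confusion_table(counts):
    """Render the counts as a small text table (rows: actual, cols: predicted)."""
    rows = [
        ("", "pred 0", "pred 1"),
        ("actual 0", counts["tn"], counts["fp"]),
        ("actual 1", counts["fn"], counts["tp"]),
    ]
    return "\n".join(f"{a:<10}{b:>8}{c:>8}" for a, b, c in rows)
```

For a fraud dataset, where positives are rare, precision and recall on the positive class are usually more informative than overall accuracy.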
Control cost and further exploration
If you want to control cost, you can stop the compute instance this notebook is running on by clicking the "Stop compute" button next to the status dropdown in the menu above.
If you want to run more notebook samples, you can click on Sample Notebooks next to the Files view and explore the notebooks made available for you there.