
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.


Showcasing Dataset and PipelineParameter

This notebook demonstrates how a FileDataset or TabularDataset can be parametrized with PipelineParameters in an AML Pipeline. By parametrizing datasets, you can dynamically run pipeline experiments with different datasets without any code change.

A common use case is building a training pipeline with a sample of your training data for quick iterative development. When you're ready to test and deploy your pipeline at scale, you can pass in your full training dataset to the pipeline experiment without making any changes to your training script.

To see more about how parameters work between steps, please refer to the aml-pipelines-with-data-dependency-steps notebook.

Azure Machine Learning and Pipeline SDK-specific imports

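A typical import cell for this notebook might look like the sketch below, assuming the azureml-sdk package (with its pipeline extras) is installed:

```python
# Core Azure ML SDK (v1) imports used throughout this notebook.
import azureml.core
from azureml.core import Workspace, Experiment, Dataset
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
from azureml.pipeline.core import Pipeline, PipelineParameter
from azureml.pipeline.steps import PythonScriptStep

print("Azure ML SDK version:", azureml.core.VERSION)
```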

Initialize Workspace

Initialize a workspace object from persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure the config file is present at .\config.json

If you don't have a config.json file, go through the configuration notebook first.

This sets you up with a working config file that has information on your workspace, subscription id, etc.

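A minimal initialization cell, assuming config.json is in the current directory (or a .azureml subfolder):

```python
from azureml.core import Workspace

# Reads the persisted workspace configuration from config.json.
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")
```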

Create an Azure ML experiment

Let's create an experiment named "showcasing-dataset" and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.

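A sketch of this cell; the folder name "train" is an assumption and is reused below for the training script:

```python
import os
from azureml.core import Experiment

experiment_name = "showcasing-dataset"
source_directory = "train"  # folder that will hold the training script

os.makedirs(source_directory, exist_ok=True)
exp = Experiment(workspace=ws, name=experiment_name)
```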

Create or Attach an AmlCompute cluster

You will need a compute target for your pipeline run. In this tutorial, you get the default AmlCompute cluster as your training compute resource.

Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

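The usual create-or-attach pattern looks like the following; the cluster name and VM size are assumptions:

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = "cpu-cluster"  # assumed cluster name

try:
    # Reuse the cluster if it already exists in the workspace.
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing compute target.")
except ComputeTargetException:
    # Otherwise provision a small autoscaling CPU cluster.
    config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_DS3_V2", min_nodes=0, max_nodes=4
    )
    compute_target = ComputeTarget.create(ws, cluster_name, config)
    compute_target.wait_for_completion(show_output=True)
```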

Dataset Configuration

The following steps detail how to create a FileDataset and TabularDataset from an external CSV file, and configure them to be used by a Pipeline:

  1. Create a dataset from a CSV file.
  2. Create a PipelineParameter object and set its default_value to the dataset. PipelineParameter objects enable arguments to be passed into pipelines when they are resubmitted after creation. The name is referenced later when we submit additional pipeline runs with different input datasets.
  3. Create a DatasetConsumptionConfig object from the PipelineParameter. The DatasetConsumptionConfig object specifies how the dataset should be used by the remote compute where the pipeline is run. Note that only DatasetConsumptionConfig objects built on a FileDataset can be set to as_mount() or as_download() on the remote compute.
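The three steps above can be sketched as follows; the CSV URL and the parameter names (file_ds_param, tabular_ds_param) are illustrative assumptions:

```python
from azureml.core import Dataset
from azureml.data.dataset_consumption_config import DatasetConsumptionConfig
from azureml.pipeline.core import PipelineParameter

# 1. Create a FileDataset and a TabularDataset from a CSV file (URL assumed).
csv_url = "https://azuremlexamples.blob.core.windows.net/datasets/iris.csv"
file_ds = Dataset.File.from_files(path=csv_url)
tabular_ds = Dataset.Tabular.from_delimited_files(path=csv_url)

# 2. Wrap each dataset in a PipelineParameter; the parameter name is what we
#    reference later to override the default value at submission time.
file_ds_param = PipelineParameter(name="file_ds_param", default_value=file_ds)
tabular_ds_param = PipelineParameter(name="tabular_ds_param", default_value=tabular_ds)

# 3. DatasetConsumptionConfig controls delivery to the remote compute; only
#    FileDataset-backed configs support as_mount()/as_download().
file_ds_consumption = DatasetConsumptionConfig("file_dataset", file_ds_param).as_mount()
tabular_ds_consumption = DatasetConsumptionConfig("tabular_dataset", tabular_ds_param)
```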

We will set up a training script that ingests the passed-in datasets and prints their contents. Note that the dataset names referenced inside the training script correspond to the names of their respective DatasetConsumptionConfig objects defined above.

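One way to write out such a script (the "train" folder and the dataset names "file_dataset" and "tabular_dataset" match the assumptions made earlier):

```python
import os

os.makedirs("train", exist_ok=True)

# The names passed to run.input_datasets must match the names given to the
# DatasetConsumptionConfig objects above.
script = """\
from azureml.core import Run

run = Run.get_context()

# For a FileDataset consumed as_mount(), input_datasets returns the mount
# path; for a TabularDataset it returns the dataset object itself.
file_dataset_path = run.input_datasets["file_dataset"]
tabular_dataset = run.input_datasets["tabular_dataset"]

print("FileDataset mount path:", file_dataset_path)
print(tabular_dataset.to_pandas_dataframe().head())
"""

with open("train/train.py", "w") as f:
    f.write(script)
```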

Create a Pipeline with a Dataset PipelineParameter

Note that the file_ds_consumption and tabular_ds_consumption are specified as both arguments and inputs to create a step.

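A sketch of the step and pipeline definition, reusing the objects assumed above:

```python
from azureml.pipeline.core import Pipeline
from azureml.pipeline.steps import PythonScriptStep

train_step = PythonScriptStep(
    name="train_step",
    script_name="train.py",
    source_directory="train",
    # Passed as both arguments (resolved on the compute) and inputs
    # (so the datasets are made available to the run).
    arguments=[file_ds_consumption, tabular_ds_consumption],
    inputs=[file_ds_consumption, tabular_ds_consumption],
    compute_target=compute_target,
    allow_reuse=False,
)

pipeline = Pipeline(workspace=ws, steps=[train_step])
```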

Submit a Pipeline with a Dataset PipelineParameter

Submitting a pipeline without specifying any parameters runs it with the default values of its PipelineParameters.

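For example, assuming the experiment and pipeline objects defined above:

```python
# No pipeline_parameters dict: the run uses each PipelineParameter's default.
pipeline_run = exp.submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)
```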

Submit a Pipeline with a different Dataset PipelineParameter value from the SDK

The training pipeline can be reused with different input datasets by passing them in as PipelineParameter values.

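A sketch of such a resubmission; the second CSV URL is an assumption, and the keys of pipeline_parameters must match the PipelineParameter names defined earlier:

```python
from azureml.core import Dataset

# A different CSV to swap in at submission time (URL is an assumption).
full_csv_url = "https://azuremlexamples.blob.core.windows.net/datasets/iris-full.csv"
new_file_ds = Dataset.File.from_files(path=full_csv_url)
new_tabular_ds = Dataset.Tabular.from_delimited_files(path=full_csv_url)

# Override the defaults by PipelineParameter name; no training-script changes.
pipeline_run = exp.submit(
    pipeline,
    pipeline_parameters={
        "file_ds_param": new_file_ds,
        "tabular_ds_param": new_tabular_ds,
    },
)
pipeline_run.wait_for_completion(show_output=True)
```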

Dynamically Set the Dataset PipelineParameter Values using a REST Call

Let's publish the pipeline we created previously, so we can generate a pipeline endpoint. We can then submit the iris datasets to the pipeline REST endpoint by passing in their IDs.

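The flow can be sketched as below. The registered dataset names are assumptions, and the exact JSON payload shape for dataset parameter assignments should be treated as an assumption to verify against the published-pipeline REST contract:

```python
import requests
from azureml.core import Dataset
from azureml.core.authentication import InteractiveLoginAuthentication

# Publish the pipeline to obtain a REST endpoint.
published_pipeline = pipeline.publish(
    name="dataset-pipelineparameter-demo",
    description="Pipeline showcasing dataset PipelineParameters",
)

# Datasets must be registered so they have saved IDs to pass over REST.
iris_file_ds = Dataset.File.from_files(path=csv_url).register(
    ws, "iris-file", create_new_version=True
)
iris_tabular_ds = Dataset.Tabular.from_delimited_files(path=csv_url).register(
    ws, "iris-tabular", create_new_version=True
)

auth_header = InteractiveLoginAuthentication().get_authentication_header()
response = requests.post(
    published_pipeline.endpoint,
    headers=auth_header,
    json={
        "ExperimentName": "showcasing-dataset",
        # Dataset parameters are passed by saved-dataset ID (assumed shape).
        "DataSetDefinitionValueAssignments": {
            "file_ds_param": {"SavedDataSetReference": {"Id": iris_file_ds.id}},
            "tabular_ds_param": {"SavedDataSetReference": {"Id": iris_tabular_ds.id}},
        },
    },
)
print("Submitted run:", response.json().get("Id"))
```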