AML Pipelines with NotebookRunnerStep
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Azure Machine Learning Pipeline with NotebookRunnerStep
This notebook demonstrates the use of NotebookRunnerStep. It allows you to run a local notebook as a step in Azure Machine Learning Pipeline.
Introduction
In this example we showcase how you can run another notebook notebook_runner/training_notebook.ipynb as a step in Azure Machine Learning Pipeline.
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have executed the configuration before running this notebook.
In this notebook you will learn how to:
- Create an `Experiment` in an existing `Workspace`.
- Create or attach an existing `AmlCompute` target to a workspace.
- Configure a notebook run using `NotebookRunConfig`.
- Use `NotebookRunnerStep`.
- Run the notebook on `AmlCompute` as a pipeline step that consumes the output of a Python script step.
Advantages of running your notebook as a step in a pipeline:
- Run your notebook like a Python script without converting it into .py files, leveraging the complete end-to-end experience of Azure Machine Learning Pipelines.
- Pass pipeline intermediate data to and from the notebook, alongside other steps in the pipeline.
- Parameterize your notebook with pipeline parameters.
Azure Machine Learning and Pipeline SDK-specific imports
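The imports for this example might look like the following sketch. Note that `NotebookRunConfig` and `NotebookRunnerStep` live in the separately installed `azureml-contrib-notebook` package; exact module paths are assumed from the contrib SDK.

```python
# Core Azure ML and Pipeline SDK imports (requires azureml-sdk and
# azureml-contrib-notebook to be installed).
import azureml.core
from azureml.core import Workspace, Experiment, Datastore
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.runconfig import RunConfiguration
from azureml.pipeline.core import Pipeline, PipelineData
from azureml.pipeline.steps import PythonScriptStep
from azureml.contrib.notebook import NotebookRunConfig, NotebookRunnerStep

print("SDK version:", azureml.core.VERSION)
```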
Initialize Workspace
Initialize a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace%28class%29) object from persisted configuration.
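A minimal sketch, assuming the `config.json` written by the configuration notebook is present in the working directory (or a parent of it):

```python
from azureml.core import Workspace

# Loads workspace details (subscription, resource group, name) from config.json.
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep="\n")
```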
Upload data to datastore
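A sketch of uploading a local file to the workspace's default blob datastore; the local file path and target path are illustrative, not prescribed by this example.

```python
# Upload a local data file so pipeline steps can consume it from the datastore.
def_blob_store = ws.get_default_datastore()
def_blob_store.upload_files(
    ["./notebook_runner/testdata.txt"],  # illustrative local path
    target_path="data",                  # folder in the datastore
    overwrite=True,
)
```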
Create an Azure ML experiment
Let's create an experiment named "notebook-step-run-example" and a folder to hold the notebook and other scripts. The script runs will be recorded under this experiment in Azure.
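This can be sketched as follows; the folder name matches the `notebook_runner` directory referenced in the introduction.

```python
from azureml.core import Experiment

experiment_name = "notebook-step-run-example"
source_directory = "notebook_runner"  # folder holding the notebook and scripts

exp = Experiment(workspace=ws, name=experiment_name)
```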
The best practice is to use separate folders for the scripts and dependent files of each step, and to specify that folder as the source_directory for the step. This reduces the size of the snapshot created for the step (only the specified folder is snapshotted). Since a change to any file in the source_directory triggers a re-upload of the snapshot, keeping folders separate preserves step reuse when nothing in a step's source_directory has changed.
Create or Attach an AmlCompute cluster
You will need a compute target for your remote run. In this tutorial, you create (or attach) an AmlCompute cluster as your training compute resource.
Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.
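A typical create-or-attach pattern looks like this sketch; the cluster name and VM size are illustrative and should be adapted to your quota.

```python
from azureml.core.compute import AmlCompute, ComputeTarget
from azureml.core.compute_target import ComputeTargetException

cluster_name = "cpu-cluster"  # illustrative name

try:
    # Reuse the cluster if it already exists in the workspace.
    aml_compute = ComputeTarget(workspace=ws, name=cluster_name)
    print("Found existing compute target.")
except ComputeTargetException:
    config = AmlCompute.provisioning_configuration(
        vm_size="STANDARD_D2_V2",  # small CPU SKU; pick one your quota allows
        min_nodes=0,               # scale down to zero when idle
        max_nodes=4,
    )
    aml_compute = ComputeTarget.create(ws, cluster_name, config)
    aml_compute.wait_for_completion(show_output=True)
```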
Create a new RunConfig object
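A sketch of a run configuration for the remote steps; the exact package list depends on what your notebook imports.

```python
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_config = RunConfiguration()
run_config.environment.docker.enabled = True
# Packages the notebook needs at runtime (illustrative list).
run_config.environment.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-sdk", "azureml-contrib-notebook"]
)
```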
Define input and outputs
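The intermediate data and pipeline parameter might be defined as in this sketch; the names `processed_data`, `notebook_output`, and `my_pipeline_param` are illustrative.

```python
from azureml.pipeline.core import PipelineData
from azureml.pipeline.core.graph import PipelineParameter

# Intermediate data flowing from the script step into the notebook step,
# plus the notebook step's own output.
output_data = PipelineData("processed_data", datastore=ws.get_default_datastore())
output_from_notebook = PipelineData("notebook_output", datastore=ws.get_default_datastore())

# A pipeline parameter that can be overridden at submission time.
pipeline_param = PipelineParameter(name="my_pipeline_param", default_value=10)
```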
Create notebook run configuration and set parameters values
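A sketch of the notebook run configuration, pointing at the `notebook_runner/training_notebook.ipynb` mentioned in the introduction; the parameter name and value are illustrative. Parameters are injected into the notebook's parameters cell, papermill-style.

```python
from azureml.contrib.notebook import NotebookRunConfig

notebook_run_config = NotebookRunConfig(
    source_directory="./notebook_runner",
    notebook="training_notebook.ipynb",
    parameters={"arg1": "Machine Learning"},  # illustrative parameter
    run_config=run_config,
)
```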
Define PythonScriptStep
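A sketch of the script step that produces the intermediate data; the script name, arguments, and source directory are illustrative.

```python
from azureml.pipeline.steps import PythonScriptStep

python_script_step = PythonScriptStep(
    name="prepare_data_step",
    script_name="prepare_data.py",          # illustrative script
    arguments=["--output_path", output_data],
    outputs=[output_data],                  # intermediate PipelineData
    compute_target=aml_compute,
    source_directory="./scripts",           # illustrative folder
    allow_reuse=True,
)
```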
Define NotebookRunnerStep
This step will consume intermediate output produced by python_script_step as an input.
Optionally, by providing an output_notebook_pipeline_data_name to the NotebookRunnerStep, the executed output notebook of the notebook run is redirected to a step output produced as PipelineData, which can then be passed further along the pipeline.
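Putting it together, the step might look like this sketch; the step name and the `notebook_result` output name are illustrative.

```python
from azureml.contrib.notebook import NotebookRunnerStep

notebook_runner_step = NotebookRunnerStep(
    name="training_notebook_step",
    notebook_run_config=notebook_run_config,
    params={"my_pipeline_param": pipeline_param},
    inputs=[output_data],              # intermediate data from python_script_step
    outputs=[output_from_notebook],
    allow_reuse=True,
    compute_target=aml_compute,
    # Redirects the executed output notebook to a PipelineData step output.
    output_notebook_pipeline_data_name="notebook_result",
)
```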
Build Pipeline
Once we have the steps (or a steps collection), we can build the pipeline. By default, steps without data dependencies between them will run in parallel once we submit the pipeline for a run.
A pipeline is created with a list of steps and a workspace. Submit a pipeline using submit. When submit is called, a PipelineRun is created which in turn creates StepRun objects for each step in the workflow.
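The build-and-submit flow can be sketched as follows; passing only the final step is enough, since the dependency on python_script_step is inferred from the data flow.

```python
from azureml.core import Experiment
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=ws, steps=[notebook_runner_step])
pipeline_run = Experiment(ws, "notebook-step-run-example").submit(pipeline)
pipeline_run.wait_for_completion(show_output=True)
```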
Download output notebook
The output_notebook can be retrieved via the pipeline step output if output_notebook_pipeline_data_name was provided to the NotebookRunnerStep.
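A retrieval sketch, assuming the step and output names used above ("training_notebook_step" and "notebook_result"):

```python
# find_step_run returns a list of matching step runs; take the first.
step_run = pipeline_run.find_step_run("training_notebook_step")[0]

# Fetch the PipelineData holding the executed notebook and download it locally.
output_data_ref = step_run.get_output_data("notebook_result")
output_data_ref.download(local_path="./outputs")
```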