
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.


How to Set Up a Schedule for a Published Pipeline or Pipeline Endpoint

In this notebook, we show how to run an already published pipeline, or a pipeline endpoint, on a schedule.

Prerequisites and AML Basics

If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you go through the configuration notebook first if you haven't. That notebook sets you up with a working config file containing your workspace and subscription information.

Initialization Steps

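A minimal initialization sketch, assuming the config file from the configuration notebook is present alongside this notebook:

```python
# Load the workspace from the config file and confirm the SDK version.
import azureml.core
from azureml.core import Workspace

ws = Workspace.from_config()
print("Workspace:", ws.name, "| SDK version:", azureml.core.VERSION)
```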

Compute Targets

Retrieve an already attached Azure Machine Learning Compute

Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.

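One way to retrieve an existing cluster; `cpu-cluster` is a placeholder name, so substitute the name of a compute target that already exists in your workspace:

```python
from azureml.core.compute import ComputeTarget

# Name of an AmlCompute cluster that already exists in the workspace.
aml_compute_name = "cpu-cluster"  # placeholder -- use your own cluster name

aml_compute = ComputeTarget(workspace=ws, name=aml_compute_name)
print("Found existing compute target:", aml_compute.name)
```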

Build and Publish Pipeline

Build a simple pipeline, publish it and add a schedule to run it.

Define a pipeline step

Define a single-step pipeline for demonstration purposes. The best practice is to use a separate folder for each step's script and its dependent files, and to specify that folder as the source_directory for the step. This reduces the size of the snapshot created for the step (only that folder is snapshotted). Since any change to a file in the source_directory triggers a re-upload of the snapshot, keeping the folder small also helps the step be reused across runs when nothing in its source_directory has changed.

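A sketch of such a step; the `scripts` folder and `train.py` script are assumptions, standing in for your own step folder and script:

```python
from azureml.pipeline.steps import PythonScriptStep

# Assumed folder containing train.py and only the files this step needs,
# so the step snapshot stays small.
source_directory = "scripts"

train_step = PythonScriptStep(
    name="Train_Step",
    script_name="train.py",
    compute_target=aml_compute,
    source_directory=source_directory,
    allow_reuse=True,  # reuse the step's output if nothing has changed
)
```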

Build the pipeline

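Assembling the single step into a pipeline might look like this:

```python
from azureml.pipeline.core import Pipeline

pipeline = Pipeline(workspace=ws, steps=[train_step])
pipeline.validate()  # surface graph errors before publishing
```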

Publish the pipeline

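A possible publish call; the name, description, and version are assumptions:

```python
published_pipeline = pipeline.publish(
    name="Schedule_Demo_Pipeline",  # assumed name
    description="Pipeline published for the schedule demo",
    version="1.0",
)
print("Published pipeline id:", published_pipeline.id)
```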

Create a Pipeline Endpoint

Alternatively, you can create a schedule to run a pipeline endpoint instead of a published pipeline. The endpoint created here is used to create a schedule against a pipeline endpoint in the last section of this notebook.

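A sketch of publishing the pipeline behind an endpoint; the endpoint name is an assumption:

```python
from azureml.pipeline.core import PipelineEndpoint

pipeline_endpoint = PipelineEndpoint.publish(
    workspace=ws,
    name="Schedule_Demo_Endpoint",  # assumed name
    pipeline=published_pipeline,
    description="Endpoint used for the schedule demo",
)
```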

Schedule Operations

Schedule operations require the ID of a published pipeline. You can list all published pipelines and perform schedule operations on them, or, if you already know the ID of the published pipeline, you can use it directly.

Get published pipeline ID

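One way to find the ID, assuming `published_pipeline` from the earlier cell is still in scope:

```python
from azureml.pipeline.core import PublishedPipeline

# List everything published in the workspace, or use a known id directly.
for p in PublishedPipeline.list(ws):
    print(p.id, p.name)

pub_pipeline_id = published_pipeline.id
```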

Create a schedule for the published pipeline using a recurrence

This schedule will run on a specified recurrence interval.

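A recurrence-based schedule might be created like this; the schedule name, experiment name, and 4-hour interval are assumptions:

```python
from azureml.pipeline.core import Schedule, ScheduleRecurrence

# Run every 4 hours; frequency can be "Minute", "Hour", "Day", "Week" or "Month".
recurrence = ScheduleRecurrence(frequency="Hour", interval=4)

schedule = Schedule.create(
    ws,
    name="Recurrence_Schedule",       # assumed name
    pipeline_id=pub_pipeline_id,
    experiment_name="Schedule_Run",   # assumed experiment name
    recurrence=recurrence,
    wait_for_provisioning=True,
    description="Runs the published pipeline every 4 hours",
)
print("Created schedule with id:", schedule.id)
```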

Note: Set the wait_for_provisioning flag to False if you do not want to wait for the call to provision the schedule in the backend.

Get all schedules for a given pipeline

Once you have the published pipeline ID, you can get all schedules for that pipeline.

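For example:

```python
schedules = Schedule.get_schedules_for_pipeline_id(ws, pub_pipeline_id)
for s in schedules:
    print(s.id, s.status)
```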

Get all schedules in your workspace

You can also iterate through all schedules in your workspace if needed.

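Iterating over every schedule in the workspace could look like:

```python
for s in Schedule.list(ws):
    print(s.id, s.name, s.status)
```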

Get the schedule

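Fetching a single schedule by its ID, assuming `schedule` from the creation cell is still in scope:

```python
fetched_schedule = Schedule.get(ws, schedule.id)
print("Fetched schedule:", fetched_schedule.id, fetched_schedule.status)
```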

Disable the schedule

It is important to note the best practice of disabling schedules when not in use. The number of schedule triggers allowed per month per region per subscription is 100,000. This is calculated using the projected trigger counts for all active schedules.

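Disabling the schedule might look like this:

```python
fetched_schedule.disable(wait_for_provisioning=True)
fetched_schedule = Schedule.get(ws, schedule.id)
print("Status:", fetched_schedule.status)  # should now report "Disabled"
```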

Reenable the schedule

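And re-enabling it:

```python
fetched_schedule.enable(wait_for_provisioning=True)
print("Status:", fetched_schedule.status)
```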

Change recurrence of the schedule

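A sketch of updating the recurrence in place; the once-a-day interval is an assumption:

```python
# Switch the schedule to run once a day instead of every 4 hours.
new_recurrence = ScheduleRecurrence(frequency="Day", interval=1)
fetched_schedule.update(recurrence=new_recurrence, wait_for_provisioning=True)
```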

Create a schedule for the pipeline using a Datastore

This schedule will run when additions or modifications are made to blobs in the Datastore. By default, the Datastore container is monitored for changes. Use the path_on_datastore parameter to instead specify a path on the Datastore to monitor; the path is relative to the container, so the actual path monitored is container/path_on_datastore. Changes made to subfolders of the monitored path will not trigger the schedule.

Notes:

- Only blob Datastores are supported.
- This is not supported for CMK (customer-managed key) workspaces. Please review these instructions in order to set up a blob-trigger submission schedule with CMK enabled. Those instructions also cover bringing your own Logic App to avoid the schedule-triggers-per-month limit.

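A sketch of a datastore-triggered schedule; the datastore name `workspaceblobstore` (the workspace's default blob store), schedule name, experiment name, and monitored path are assumptions:

```python
from azureml.core import Datastore
from azureml.pipeline.core import Schedule

datastore = Datastore.get(ws, datastore_name="workspaceblobstore")

datastore_schedule = Schedule.create(
    ws,
    name="Datastore_Schedule",                # assumed name
    pipeline_id=pub_pipeline_id,
    experiment_name="Schedule_Run",           # assumed experiment name
    datastore=datastore,
    path_on_datastore="schedule_demo/input",  # assumed path under the container
    polling_interval=5,                       # minutes between checks for new blobs
)
```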

Create a schedule for a pipeline endpoint

As an alternative to creating schedules for a published pipeline, you can also create schedules to run pipeline endpoints. Retrieve the pipeline endpoint ID to create a schedule.

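A sketch of scheduling the endpoint; the endpoint name matches the one assumed earlier, and the schedule name, experiment name, and daily recurrence are also assumptions:

```python
from azureml.pipeline.core import PipelineEndpoint, Schedule, ScheduleRecurrence

pipeline_endpoint = PipelineEndpoint.get(workspace=ws, name="Schedule_Demo_Endpoint")

endpoint_schedule = Schedule.create_for_pipeline_endpoint(
    ws,
    name="Endpoint_Schedule",                 # assumed name
    pipeline_endpoint_id=pipeline_endpoint.id,
    experiment_name="Schedule_Run",           # assumed experiment name
    recurrence=ScheduleRecurrence(frequency="Day", interval=1),
)
```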

Get all schedules for a given pipeline endpoint

Once you have the pipeline endpoint ID, you can get all schedules for that pipeline endpoint.

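For example:

```python
endpoint_schedules = Schedule.get_schedules_for_pipeline_endpoint_id(
    ws, pipeline_endpoint.id
)
for s in endpoint_schedules:
    print(s.id, s.status)
```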

Disable the schedule created for running the pipeline endpoint

Recall the best practice of disabling schedules when not in use. The number of schedule triggers allowed per month per region per subscription is 100,000. This is calculated using the projected trigger counts for all active schedules.

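Disabling the endpoint schedule, assuming `endpoint_schedule` from the earlier cell is still in scope:

```python
endpoint_schedule.disable(wait_for_provisioning=True)
print("Status:", endpoint_schedule.status)
```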