Aml Pipelines Setup Schedule For A Published Pipeline
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
How to Set Up a Schedule for a Published Pipeline or Pipeline Endpoint
This notebook shows how to run an already published pipeline or a pipeline endpoint on a schedule.
Prerequisites and AML Basics
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the configuration notebook first if you have not already done so. It sets you up with a working config file that contains your workspace information, subscription ID, and so on.
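A minimal sketch of the initialization step, assuming the `azureml-core` package (SDK v1) is installed and the config file from the configuration notebook is available. The helper name `load_workspace` is illustrative and not part of the SDK.

```python
def load_workspace(config_path=None):
    """Load the workspace described by the config file.

    load_workspace is a hypothetical helper. With config_path=None the SDK
    starts its search for the config file in the current directory.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.core import Workspace

    return Workspace.from_config(path=config_path)
```

In a notebook cell this would typically be used as `ws = load_workspace()`, after which `ws.name` and `ws.subscription_id` can be printed to confirm that the right workspace was loaded.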
Initialization Steps
Compute Targets
Retrieve an already attached Azure Machine Learning Compute
Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.
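The lookup of an existing compute target can be sketched as follows. This version only reads the workspace's `compute_targets` dictionary and never tries to create anything, which suits the restricted role described above. The helper name and the example name `cpu-cluster` are assumptions.

```python
def get_attached_compute(ws, name):
    """Return an already attached compute target without creating one.

    get_attached_compute is a hypothetical helper; ws is an azureml.core
    Workspace (or any object exposing a compute_targets dict).
    """
    targets = ws.compute_targets  # maps compute target name -> ComputeTarget
    if name not in targets:
        raise LookupError(
            f"Compute target {name!r} not found; "
            "ask your workspace or IT admin to create it."
        )
    return targets[name]
```

For example, `aml_compute = get_attached_compute(ws, "cpu-cluster")`, where `cpu-cluster` stands in for whatever name your admin chose.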
Build and Publish Pipeline
Build a simple pipeline, publish it and add a schedule to run it.
Define a pipeline step
Define a single-step pipeline for demonstration purposes. The best practice is to keep each step's script and its dependent files in a separate folder and to specify that folder as the step's source_directory. This reduces the size of the snapshot created for the step, because only that folder is snapshotted. Since a change to any file in the source_directory triggers a re-upload of the snapshot, it also allows the step to be reused when nothing in its source_directory has changed.
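A sketch of such a step, assuming the `azureml-pipeline-steps` package is installed. The folder `./train_step`, the script `train.py`, and the helper name are placeholders chosen for illustration.

```python
def make_demo_step(compute_target, source_directory="./train_step"):
    """Create a single PythonScriptStep that snapshots only its own folder.

    make_demo_step, "./train_step" and "train.py" are hypothetical names.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.steps import PythonScriptStep

    return PythonScriptStep(
        name="train_step",
        script_name="train.py",  # resolved relative to source_directory
        compute_target=compute_target,
        source_directory=source_directory,  # only this folder is snapshotted
        allow_reuse=True,  # reuse earlier results when nothing has changed
    )
```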
Build the pipeline
Publish the pipeline
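Building and publishing can be sketched in one helper, assuming `azureml-pipeline-core` is installed. The pipeline name and description below are placeholders, and the helper name is not part of the SDK.

```python
def build_and_publish(ws, steps, name="My_New_Pipeline"):
    """Build a Pipeline from the given steps and publish it.

    build_and_publish and the default name are hypothetical. Returns the
    PublishedPipeline object, whose id is needed for schedule operations.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core import Pipeline

    pipeline = Pipeline(workspace=ws, steps=steps)
    return pipeline.publish(
        name=name,
        description="Pipeline published for the schedule sample",
    )
```

A typical call would be `published_pipeline = build_and_publish(ws, [step])`, followed by reading `published_pipeline.id`.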
Create a Pipeline Endpoint
Alternatively, you can create a schedule that runs a pipeline endpoint instead of a published pipeline. The endpoint created here is needed for the last section of this notebook, which creates a schedule against a pipeline endpoint.
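One way to create the endpoint is sketched below, assuming `azureml-pipeline-core` is installed. The endpoint name `ScheduledPipelineEndpoint` and the description are illustrative values.

```python
def publish_endpoint(ws, published_pipeline, name="ScheduledPipelineEndpoint"):
    """Create a pipeline endpoint that fronts the given published pipeline.

    publish_endpoint and the default endpoint name are hypothetical.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core import PipelineEndpoint

    return PipelineEndpoint.publish(
        workspace=ws,
        name=name,
        description="Endpoint used by the schedule sample",
        pipeline=published_pipeline,
    )
```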
Schedule Operations
Schedule operations require the ID of a published pipeline. You can list all published pipelines and perform Schedule operations on them, or, if you already know the ID of the published pipeline, use it directly.
Get published pipeline ID
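A sketch of looking up the ID by name, assuming `azureml-pipeline-core` is installed. The helper name is illustrative; it returns the first published pipeline whose name matches, or `None` when there is no match.

```python
def find_published_pipeline_id(ws, name):
    """Return the id of the first published pipeline with this name, else None.

    find_published_pipeline_id is a hypothetical helper.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core import PublishedPipeline

    for published in PublishedPipeline.list(ws):
        if published.name == name:
            return published.id
    return None
```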
Create a schedule for the published pipeline using a recurrence
This schedule will run on a specified recurrence interval.
Note: Set the wait_for_provisioning flag to False if you do not want to wait for the call to provision the schedule in the backend.
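A recurrence-based schedule might be created as in the sketch below, assuming `azureml-pipeline-core` is installed. The schedule name, experiment name, and the chosen recurrence (every other day at 22:30) are example values, and the helper name is not part of the SDK.

```python
def create_recurrence_schedule(ws, pipeline_id, experiment_name="Schedule-run-sample"):
    """Create a schedule that runs a published pipeline on a recurrence.

    create_recurrence_schedule and all literal names are hypothetical.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

    # Run every second day at 22:30.
    recurrence = ScheduleRecurrence(
        frequency="Day", interval=2, hours=[22], minutes=[30]
    )
    return Schedule.create(
        workspace=ws,
        name="My_Schedule",
        pipeline_id=pipeline_id,
        experiment_name=experiment_name,
        recurrence=recurrence,
        wait_for_provisioning=True,  # set to False to return without waiting
        description="Schedule Run",
    )
```

The returned object exposes `id` and `status`, which the later schedule operations rely on.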
Get all schedules for a given pipeline
Once you have the published pipeline ID, you can get all schedules for that pipeline.
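A short sketch of that lookup, assuming `azureml-pipeline-core` is installed. The helper name is illustrative.

```python
def schedules_for_pipeline(ws, pipeline_id):
    """List the schedules attached to one published pipeline.

    schedules_for_pipeline is a hypothetical helper around Schedule.list.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core.schedule import Schedule

    return Schedule.list(ws, pipeline_id=pipeline_id)
```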
Get all schedules in your workspace
You can also iterate through all schedules in your workspace if needed.
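When iterating, a one-line summary per schedule is often enough. The helper below is a sketch that works on any objects exposing `name`, `id`, and `status` attributes, as the SDK's `Schedule` objects do. The helper name is an assumption.

```python
def summarize_schedules(schedules):
    """Return one 'name (id): status' line per schedule.

    summarize_schedules is a hypothetical helper; it only reads the
    name, id and status attributes of each item.
    """
    return [f"{s.name} ({s.id}): {s.status}" for s in schedules]
```

With the SDK installed, the input would typically come from `Schedule.list(ws)`.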
Get the schedule
Disable the schedule
It is a best practice to disable schedules when they are not in use. The number of schedule triggers allowed per month per region per subscription is 100,000. This is calculated using the projected trigger counts for all active schedules.
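Disabling can be sketched as fetching the schedule by ID, disabling it, and fetching it again to observe the new status. This assumes `azureml-pipeline-core` is installed; the helper name is illustrative.

```python
def disable_schedule(ws, schedule_id):
    """Disable a schedule and return a freshly fetched copy of it.

    disable_schedule is a hypothetical helper.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core.schedule import Schedule

    schedule = Schedule.get(ws, schedule_id)
    schedule.disable(wait_for_provisioning=True)
    # Fetch again so the returned object reflects the updated status.
    return Schedule.get(ws, schedule_id)
```

Re-enabling follows the same shape, with `schedule.enable(wait_for_provisioning=True)` in place of the `disable` call.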
Reenable the schedule
Change recurrence of the schedule
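A sketch of changing the recurrence of an existing schedule, assuming `azureml-pipeline-core` is installed. The new recurrence (every two hours) is an example value and the helper name is not part of the SDK.

```python
def change_recurrence(ws, schedule_id):
    """Switch an existing schedule to run every two hours.

    change_recurrence is a hypothetical helper; the recurrence is an example.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

    recurrence = ScheduleRecurrence(frequency="Hour", interval=2)
    schedule = Schedule.get(ws, schedule_id)
    schedule.update(recurrence=recurrence, wait_for_provisioning=True)
    # Fetch again so the returned object reflects the update.
    return Schedule.get(ws, schedule_id)
```

`update` also accepts other optional fields such as `name`, `description`, and `status`, so several properties can be changed in one call.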
Create a schedule for the pipeline using a Datastore
This schedule will run when Blobs in the Datastore are added or modified. By default, the whole Datastore container is monitored for changes. Use the path_on_datastore parameter to monitor a specific path on the Datastore instead.
Note: path_on_datastore is relative to the Datastore's container, so the path actually monitored is container/path_on_datastore. Changes made to subfolders of the monitored container or path will not trigger the schedule.
Note: Only Blob Datastores are supported.
Note: Not supported for CMK workspaces. Please review these instructions to set up a blob trigger submission schedule with CMK enabled. The same instructions also describe how to bring your own LogicApp to avoid the monthly schedule trigger limit.
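A Datastore-triggered schedule might look like the sketch below, assuming `azureml-core` and `azureml-pipeline-core` are installed. The datastore name `workspaceblobstore` is the workspace's default blob datastore; the schedule and experiment names and the helper name are illustrative.

```python
def create_datastore_schedule(ws, pipeline_id, datastore_name="workspaceblobstore", path=None):
    """Create a schedule that fires when blobs are added or modified.

    create_datastore_schedule and the literal names are hypothetical.
    With path=None the whole datastore container is monitored.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.core import Datastore
    from azureml.pipeline.core.schedule import Schedule

    datastore = Datastore.get(ws, datastore_name)
    return Schedule.create(
        workspace=ws,
        name="My_Datastore_Schedule",
        pipeline_id=pipeline_id,
        experiment_name="Schedule-run-sample",
        datastore=datastore,  # a datastore replaces the recurrence argument
        path_on_datastore=path,  # relative to the datastore's container
        wait_for_provisioning=True,
        description="Schedule Run",
    )
```

`Schedule.create` also takes a `polling_interval` argument (in minutes) that controls how often the Datastore is checked for changes.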
Create a schedule for a pipeline endpoint
As an alternative to creating schedules for a published pipeline, you can also create schedules that run pipeline endpoints. Retrieve the pipeline endpoint ID to create a schedule.
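The endpoint variant can be sketched as follows, assuming `azureml-pipeline-core` is installed. The endpoint is looked up by name to obtain its ID; the schedule name, experiment name, and recurrence are example values, and the helper name is not part of the SDK.

```python
def create_endpoint_schedule(ws, endpoint_name):
    """Create a recurrence schedule that runs a pipeline endpoint.

    create_endpoint_schedule and all literal names are hypothetical.
    """
    # Imported lazily so this snippet can be loaded without the SDK installed.
    from azureml.pipeline.core import PipelineEndpoint
    from azureml.pipeline.core.schedule import Schedule, ScheduleRecurrence

    endpoint = PipelineEndpoint.get(workspace=ws, name=endpoint_name)
    # Run every second day at 22:30.
    recurrence = ScheduleRecurrence(
        frequency="Day", interval=2, hours=[22], minutes=[30]
    )
    return Schedule.create_for_pipeline_endpoint(
        workspace=ws,
        name="My_Endpoint_Schedule",
        pipeline_endpoint_id=endpoint.id,
        experiment_name="Schedule-run-sample",
        recurrence=recurrence,
        description="Schedule_endpoint_run",
        wait_for_provisioning=True,
    )
```

Schedules created this way can later be listed with `Schedule.list(ws, pipeline_endpoint_id=endpoint.id)`.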
Get all schedules for a given pipeline endpoint
Once you have the pipeline endpoint ID, you can get all schedules for that pipeline endpoint.
Disable the schedule created for running the pipeline endpoint
Recall the best practice of disabling schedules when they are not in use. The number of schedule triggers allowed per month per region per subscription is 100,000. This is calculated using the projected trigger counts for all active schedules.