AML Pipelines: How to Use AzureBatch to Run a Windows Executable
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Azure Machine Learning Pipeline with AzureBatchStep
This notebook demonstrates how to use AzureBatchStep in an Azure Machine Learning pipeline. An AzureBatchStep submits a job to an Azure Batch compute target, which runs a simple Windows executable.
Azure Machine Learning and Pipeline SDK-specific Imports
Initialize Workspace
Initialize a workspace object from the persisted configuration. Make sure the config file is present at .\config.json.
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, if you don't have a config.json file, go through the configuration notebook first.
That notebook sets you up with a working config file that contains information about your workspace, subscription ID, and so on.
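In the Azure ML SDK (v1) this file is typically read for you by Workspace.from_config(). As an SDK-free sketch of what the file is assumed to contain, the snippet below loads a config.json and checks for the three keys the SDK writes: subscription_id, resource_group, and workspace_name. The helper name load_workspace_config is hypothetical.

```python
import json
from pathlib import Path

# Keys an Azure ML SDK (v1) config.json is assumed to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def load_workspace_config(path="config.json"):
    """Hypothetical helper: read config.json and fail early if a key is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if not config.get(key)]
    if missing:
        raise KeyError(f"config.json is missing: {', '.join(missing)}")
    return config
```

Calling load_workspace_config() from the notebook's directory gives a quick sanity check before the SDK call; if it raises, rerun the configuration notebook.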
Attach Batch Compute to Workspace
To submit jobs to the Azure Batch service, you must first attach your Azure Batch account to the workspace as a compute target.
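In the Azure ML SDK (v1), attaching is typically done with BatchCompute.attach_configuration(...) followed by ComputeTarget.attach(...); those calls need a live subscription and are not reproduced here. The attach configuration can identify the Batch account by its Azure Resource Manager (ARM) resource ID, and the sketch below builds that ID in the standard ARM format. The helper name batch_account_resource_id and any values you pass to it are hypothetical placeholders.

```python
def batch_account_resource_id(subscription_id, resource_group, account_name):
    """Hypothetical helper: build the ARM resource ID of an Azure Batch account."""
    # Reject empty values and slashes, which would corrupt the path-like ID.
    for label, value in (("subscription_id", subscription_id),
                         ("resource_group", resource_group),
                         ("account_name", account_name)):
        if not value or "/" in value:
            raise ValueError(f"invalid {label}: {value!r}")
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Batch/batchAccounts/{account_name}"
    )
```

Passing the resource group and account name separately is the other common way to identify the account; building the full ID is mainly useful when the Batch account lives in a different subscription from the workspace.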
Setup Datastore
Set up the Blob storage associated with the workspace.
Here we retrieve the Azure Blob store associated with your workspace.
Note that this store is named workspaceblobstore; the name CANNOT BE CHANGED and must be used as is.
If you want to register another datastore, follow the instructions at https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-access-data#register-a-datastore
Setup Input and Output
For this example, we upload a file to the provided datastore using some helper methods.
Here we associate the input DataReference with an existing file in the provided datastore. Feel free to upload a file of your choice manually or use the upload_file_to_datastore method.
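The helper code itself is not part of this text. As a sketch of what an upload_file_to_datastore-style helper might look like, the snippet below writes content to a local file and hands it to the datastore's upload_files method. It assumes the Azure ML SDK (v1) datastore signature upload_files(files, relative_root, target_path, overwrite, show_progress); the helper parameters and create_local_file are hypothetical, and the datastore is duck-typed so the logic can be exercised without Azure.

```python
import os
import tempfile


def create_local_file(content, file_name, directory=None):
    """Hypothetical helper: write `content` to `file_name` in `directory` (a new temp dir by default)."""
    directory = directory or tempfile.mkdtemp()
    path = os.path.join(directory, file_name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


def upload_file_to_datastore(datastore, file_name, content, target_path=None):
    """Hypothetical helper: create a local file, then upload it via datastore.upload_files.

    `datastore` is assumed to expose an Azure ML SDK (v1)-style upload_files method.
    """
    src = create_local_file(content, file_name)
    datastore.upload_files(
        files=[src],
        # Using the file's own folder as the root makes the blob land at target_path/file_name.
        relative_root=os.path.dirname(src),
        target_path=target_path,
        overwrite=True,
        show_progress=False,
    )
    return src
```

With a real workspace, the datastore argument would be the workspaceblobstore object retrieved earlier, and target_path would match the path used by the input DataReference.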
Setup AzureBatch Job Binaries
Azure Batch runs a task within the job; here, that task executes a simple .cmd file. Feel free to add any binaries to the folder or modify the .cmd file as needed; the folder's contents are uploaded when the AzureBatchStep is created.
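A minimal sketch of preparing that folder, assuming a hypothetical folder name azurebatch_binaries and a hypothetical hello.cmd that only echoes a greeting and its arguments; the script body is illustrative. The folder would then be passed as source_directory and the script name as executable.

```python
from pathlib import Path

# Hypothetical batch script: print a greeting and all arguments passed to the task (%*).
HELLO_CMD = "@echo off\r\necho Hello from Azure Batch\r\necho Arguments: %*\r\n"


def write_batch_script(folder="azurebatch_binaries", name="hello.cmd", body=HELLO_CMD):
    """Create `folder` if needed and write a Windows .cmd script into it."""
    source_dir = Path(folder)
    source_dir.mkdir(parents=True, exist_ok=True)
    script = source_dir / name
    # newline="" writes the explicit CRLF endings unchanged, the safest choice for cmd.exe.
    with open(script, "w", encoding="ascii", newline="") as f:
        f.write(body)
    return script
```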
Create an AzureBatchStep
AzureBatchStep is used to submit a job to the attached Azure Batch compute.
- name: Name of the step
- pool_id: Name of the pool; it can be an existing pool or one that will be created when the job is submitted
- inputs: List of inputs that will be processed by the job
- outputs: List of outputs the job will create
- executable: The executable that will run as part of the job
- arguments: Arguments for the executable. They can be plain strings, inputs, outputs, or parameters
- compute_target: The compute target where the job will run
- source_directory: The local directory with binaries to be executed by the job
Optional parameters:
- create_pool: Boolean flag to indicate whether to create the pool before running the jobs
- delete_batch_job_after_finish: Boolean flag to indicate whether to delete the job from the Batch account after it finishes
- delete_batch_pool_after_finish: Boolean flag to indicate whether to delete the pool after the job finishes
- is_positive_exit_code_failure: Boolean flag to indicate whether the job fails if the task exits with a positive code
- vm_image_urn: If create_pool is true and the VM uses VirtualMachineConfiguration, the image of the compute nodes.
  Value format: 'urn:publisher:offer:sku'.
  Example: urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter
  For more details, see:
  https://docs.microsoft.com/en-us/azure/virtual-machines/windows/cli-ps-findimage#table-of-commonly-used-windows-images and
  https://docs.microsoft.com/en-us/azure/virtual-machines/linux/cli-ps-findimage#find-specific-images
- run_task_as_admin: Boolean flag to indicate whether the task should run with Admin privileges
- target_compute_nodes: Assumes create_pool is true, indicates how many compute nodes will be added to the pool
- source_directory: Local folder that contains the module binaries, executable, assemblies etc.
- executable: Name of the command/executable that will be executed as part of the job
- arguments: Arguments for the command/executable
- inputs: List of input port bindings
- outputs: List of output port bindings
- vm_size: If create_pool is true, the virtual machine size of the compute nodes
- compute_target: The BatchCompute compute target
- allow_reuse: Whether the module should reuse previous results when run with the same settings/inputs
- version: A version tag to denote a change in functionality for the module
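Because vm_image_urn must follow the 'urn:publisher:offer:sku' format, a typo there may only surface when the pool is created. A small, SDK-independent sketch (the helper parse_vm_image_urn and the VmImage type are hypothetical) that splits and checks the value before it is passed to the step:

```python
from typing import NamedTuple


class VmImage(NamedTuple):
    publisher: str
    offer: str
    sku: str


def parse_vm_image_urn(urn):
    """Split 'urn:publisher:offer:sku' into its parts, raising ValueError if malformed."""
    parts = urn.split(":")
    # Exactly four fields, a literal 'urn' prefix, and no empty publisher/offer/sku.
    if len(parts) != 4 or parts[0] != "urn" or not all(parts[1:]):
        raise ValueError(f"expected 'urn:publisher:offer:sku', got {urn!r}")
    return VmImage(*parts[1:])
```

For the example value above, parse_vm_image_urn("urn:MicrosoftWindowsServer:WindowsServer:2012-R2-Datacenter") returns VmImage(publisher='MicrosoftWindowsServer', offer='WindowsServer', sku='2012-R2-Datacenter').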