AML Pipelines: Use Kusto as Compute Target
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Azure Machine Learning Pipeline with KustoStep
To use Kusto as a compute target in an Azure Machine Learning Pipeline, you use a KustoStep. A KustoStep runs Kusto queries against a target Kusto cluster as part of an Azure ML Pipeline. Each KustoStep targets one Kusto cluster and can run multiple queries against it. This notebook demonstrates the use of KustoStep in an Azure Machine Learning (AML) Pipeline.
Before you begin:
- Have an Azure Machine Learning workspace: you will need details of this workspace later to define the KustoStep.
- Have a service principal: you will need a service principal and its credentials to access your Kusto cluster. See the Azure documentation on service principals for more information.
- Have Azure Blob storage: you will need an Azure Blob storage account for uploading the output of your Kusto query.
Azure Machine Learning and Pipeline SDK-specific imports
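The steps in this notebook rely on the Azure ML SDK v1. A minimal sketch of the imports, assuming the `azureml-core`, `azureml-pipeline-core`, and `azureml-pipeline-steps` packages are installed; the module paths reflect our understanding of SDK v1 and should be checked against your installed version. The `SDK_AVAILABLE` flag is a hypothetical addition so a script can report a missing SDK instead of failing on import.

```python
# Assumes Azure ML SDK v1 module paths; SDK_AVAILABLE is a hypothetical helper flag.
try:
    from azureml.core import Workspace, Datastore
    from azureml.core.compute import ComputeTarget, KustoCompute
    from azureml.pipeline.core import Pipeline, PipelineData
    from azureml.pipeline.steps import KustoStep
    SDK_AVAILABLE = True
except ImportError:
    # SDK not installed, or an SDK version without KustoCompute/KustoStep.
    SDK_AVAILABLE = False

print("Azure ML SDK available:", SDK_AVAILABLE)
```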
Initialize Workspace
Initialize a workspace object from the persisted configuration. If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, make sure you have run the configuration notebook first.
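In the notebook this step is normally a call to `Workspace.from_config()`, which reads a `config.json` file containing `subscription_id`, `resource_group`, and `workspace_name`. The sketch below does not call the SDK. The hypothetical `find_workspace_config` and `load_workspace_config` helpers only illustrate looking for that file in the current directory (or an `.azureml` subfolder) and its parents, assuming a search order similar to the SDK's, so a missing or incomplete configuration can be diagnosed before the workspace object is created.

```python
import json
from pathlib import Path
from typing import Optional

# Keys the persisted workspace configuration is expected to contain.
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def find_workspace_config(start: Path) -> Optional[Path]:
    """Hypothetical helper: walk from `start` up to the filesystem root and
    return the first config.json or .azureml/config.json found."""
    start = start.resolve()
    for directory in (start, *start.parents):
        for candidate in (directory / "config.json", directory / ".azureml" / "config.json"):
            if candidate.is_file():
                return candidate
    return None


def load_workspace_config(start: Path) -> dict:
    """Load the workspace configuration and check that the required keys exist."""
    path = find_workspace_config(start)
    if path is None:
        raise FileNotFoundError("No workspace config.json found; run the configuration notebook first.")
    config = json.loads(path.read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"{path} is missing keys: {missing}")
    return config
```

For example, `load_workspace_config(Path.cwd())` returns the parsed dictionary; with the SDK installed, `Workspace.from_config()` performs the actual lookup and authentication.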
Attach Kusto compute target
Next, attach a Kusto compute target and give it a name. You will use this name to refer to your Kusto compute target inside Azure Machine Learning, and your workspace will be associated with it. You also need to provide credentials that enable access to your target Kusto cluster and database:
- Resource Group - The name of the resource group of your Azure Machine Learning workspace
- Workspace Name - The name of your Azure Machine Learning workspace
- Resource ID - The resource ID of your Kusto cluster
- Tenant ID - The tenant ID associated with your Kusto cluster
- Application ID - The application (client) ID of the service principal used to access your Kusto cluster
- Application Key - The application key (client secret) of that service principal
- Kusto Connection String - The connection string of your Kusto cluster
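To avoid hard-coding the service principal secret in the notebook, the values above can be collected from environment variables. A minimal sketch: the environment variable names, `KustoAttachSettings`, and `settings_from_env` are hypothetical, and the keyword names returned by `as_attach_kwargs()` assume they match `KustoCompute.attach_configuration` in the Azure ML SDK v1, so verify them against your SDK version. The resource ID is built in the standard Azure Resource Manager form for a Kusto cluster; note that the cluster's resource group may differ from the workspace's.

```python
import os
from dataclasses import asdict, dataclass, field
from typing import Mapping


@dataclass(frozen=True)
class KustoAttachSettings:
    """Hypothetical container for the values needed to attach a Kusto compute target."""
    resource_group: str           # resource group of the Azure ML workspace
    workspace_name: str           # name of the Azure ML workspace
    resource_id: str              # ARM resource ID of the Kusto cluster
    tenant_id: str                # tenant of the service principal / Kusto cluster
    application_id: str           # service principal application (client) ID
    application_key: str = field(repr=False)  # secret: kept out of repr()
    kusto_connection_string: str = ""         # cluster URI

    def as_attach_kwargs(self) -> dict:
        # Assumed to match the keyword arguments of KustoCompute.attach_configuration.
        return asdict(self)


def kusto_resource_id(subscription_id: str, resource_group: str, cluster_name: str) -> str:
    """Build the Azure Resource Manager ID of a Kusto cluster."""
    return (f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Kusto/clusters/{cluster_name}")


def settings_from_env(env: Mapping[str, str] = os.environ) -> KustoAttachSettings:
    """Read the attach settings from hypothetical environment variable names."""
    def require(name: str) -> str:
        value = env.get(name)
        if not value:
            raise KeyError(f"Environment variable {name} is not set")
        return value

    return KustoAttachSettings(
        resource_group=require("AML_RESOURCE_GROUP"),
        workspace_name=require("AML_WORKSPACE_NAME"),
        resource_id=kusto_resource_id(
            require("KUSTO_SUBSCRIPTION_ID"),
            require("KUSTO_RESOURCE_GROUP"),
            require("KUSTO_CLUSTER_NAME"),
        ),
        tenant_id=require("KUSTO_TENANT_ID"),
        application_id=require("KUSTO_APPLICATION_ID"),
        application_key=require("KUSTO_APPLICATION_KEY"),
        kusto_connection_string=require("KUSTO_CONNECTION_STRING"),
    )
```

With the SDK available, the settings would then be passed along the lines of `KustoCompute.attach_configuration(**settings.as_attach_kwargs())` followed by `ComputeTarget.attach(ws, compute_name, attach_config)`; treat that call shape as an assumption to check against the SDK reference.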
Setup output
Currently, KustoStep only supports uploading results to Azure Blob storage. Define the step's output with PipelineData, backed by a Blob datastore, so that it can be passed to the KustoStep.
Add a KustoStep to Pipeline
Adds a Kusto query as a step in a Pipeline.
- name: Name of the Module
- compute_target: Name of Kusto compute target
- database_name: Name of the database to perform Kusto query on
- query_directory: Path to a folder that contains only one text file with the Kusto queries (see the Kusto query language documentation for details on queries).
- If the query is parameterized, the text file must also include the declaration of the query parameters (see the Kusto documentation on query parameters declaration statements).
- For example, the query text file could contain just the query "StormEvents | count | as HowManyRecords;", where StormEvents is the table name.
- Note: the text file should contain only the declarations and queries, without quotation marks around them.
- outputs: Output binding to Azure Blob storage.
- parameter_dict (optional): Dictionary that contains the values of parameters declared in the query text file in the query_directory mentioned above.
- Dictionary key is the parameter name, and dictionary value is the parameter value.
- For example, parameter_dict = {"paramName1": "paramValue1", "paramName2": "paramValue2"}
- allow_reuse (optional): Whether the step should reuse previous results when run with the same settings/inputs (defaults to False)
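Because `query_directory` must contain a single text file, it can help to generate that folder programmatically and to check that `parameter_dict` supplies a value for every parameter declared in the file. A minimal sketch: the `write_query_directory`, `parse_declarations`, and `check_parameters` helpers, the file name `query.txt`, and the `stateName` parameter are hypothetical. The query reuses the `StormEvents` example above, and the declaration follows the Kusto `declare query_parameters(...)` statement. The parser is deliberately simple and assumes default values contain no commas or parentheses.

```python
import re
from pathlib import Path

# Captures the parameter list of a Kusto "declare query_parameters(...)" statement.
DECLARE_RE = re.compile(r"declare\s+query_parameters\s*\((?P<params>[^)]*)\)", re.IGNORECASE)


def write_query_directory(directory: Path, query_text: str, file_name: str = "query.txt") -> Path:
    """Create `directory` holding exactly one query file (no surrounding quotes)."""
    directory.mkdir(parents=True, exist_ok=True)
    others = [p.name for p in directory.iterdir() if p.name != file_name]
    if others:
        raise ValueError(f"{directory} must contain only the query file, found {others}")
    path = directory / file_name
    path.write_text(query_text.strip() + "\n")
    return path


def parse_declarations(query_text: str) -> dict:
    """Map each declared parameter name to whether it has a default value."""
    declared = {}
    for match in DECLARE_RE.finditer(query_text):
        for declaration in match.group("params").split(","):
            name = declaration.split(":", 1)[0].strip()
            if name:
                declared[name] = "=" in declaration
    return declared


def check_parameters(query_text: str, parameter_dict: dict) -> None:
    """Raise if parameter_dict omits a required parameter or adds an undeclared one."""
    declared = parse_declarations(query_text)
    required = {name for name, has_default in declared.items() if not has_default}
    provided = set(parameter_dict)
    missing = required - provided
    unexpected = provided - set(declared)
    if missing or unexpected:
        raise ValueError(f"missing values: {sorted(missing)}, undeclared keys: {sorted(unexpected)}")
```

For example, a file containing `declare query_parameters(stateName:string);` followed by `StormEvents | where State == stateName | count | as HowManyRecords;` would be paired with `parameter_dict={"stateName": "FLORIDA"}`, and `check_parameters` passes for that pair before the folder path is handed to the step as `query_directory`.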