Spark Job On Synapse Spark Pool
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
Using Synapse Spark Pool as a Compute Target from Azure Machine Learning Remote Run
- To use a Synapse Spark pool as the compute target for an experiment run, use ScriptRunConfig, as with any other experiment run. This notebook demonstrates how to use ScriptRunConfig to submit an experiment run to an attached Synapse Spark pool.
- To use a Synapse Spark pool as a compute target in an Azure Machine Learning pipeline, use SynapseSparkStep. This notebook demonstrates how to use SynapseSparkStep in an Azure Machine Learning pipeline.
Before you begin:
- Create an Azure Synapse workspace: see [this quickstart](https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace) for more information.
- Create a Spark pool in the Synapse workspace: see [this quickstart](https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-apache-spark-pool-portal) for more information.
Azure Machine Learning and Pipeline SDK-specific imports
[ ]
[ ]
Link Synapse workspace to AML
You must be an "Owner" of the Synapse workspace resource to perform the linking. You can check your role in the Azure resource management portal. If you don't have the "Owner" role, ask an "Owner" to link the workspaces for you.
[ ]
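The linking step could look roughly like the sketch below. It assumes the `azureml-core` v1 SDK is installed and that `ws` is an already loaded AML `Workspace`. The environment variable names, the placeholder defaults, and both helper function names are illustrative assumptions, not part of the original notebook.

```python
import os


def synapse_settings_from_env(env=os.environ):
    """Collect Synapse identifiers from environment variables.

    The variable names and placeholder defaults are assumptions for this sketch.
    """
    return {
        "subscription_id": env.get("SYNAPSE_SUBSCRIPTION_ID", "<my-synapse-subscription-id>"),
        "resource_group": env.get("SYNAPSE_RESOURCE_GROUP", "<my-synapse-resource-group>"),
        "workspace_name": env.get("SYNAPSE_WORKSPACE_NAME", "<my-synapse-workspace-name>"),
        "linked_service_name": env.get("SYNAPSE_LINKED_SERVICE_NAME", "synapselink"),
    }


def link_synapse_workspace(ws, settings):
    """Register the Synapse workspace as a linked service of the AML workspace `ws`."""
    # Imported lazily so the settings helper stays usable without azureml-core.
    from azureml.core import LinkedService, SynapseWorkspaceLinkedServiceConfiguration

    link_config = SynapseWorkspaceLinkedServiceConfiguration(
        subscription_id=settings["subscription_id"],
        resource_group=settings["resource_group"],
        name=settings["workspace_name"],
    )
    return LinkedService.register(
        workspace=ws,
        name=settings["linked_service_name"],
        linked_service_config=link_config,
    )
```

A call such as `link_synapse_workspace(ws, synapse_settings_from_env())` would then perform the registration, provided the caller holds the "Owner" role described above.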
Linked service property
An MSI (system_assigned_identity_principal_id) is generated for each linked service, for example:
name=synapselink,
type=Synapse,
linked_service_resource_id=/subscriptions/4faaaf21-663f-4391-96fd-47197c630979/resourceGroups/static_resources_synapse_test/providers/Microsoft.Synapse/workspaces/synapsetest2,
system_assigned_identity_principal_id=eb355d52-3806-4c5a-aec9-91447e8cfc2e

Before you submit a job, make sure you grant the "Synapse Apache Spark Administrator" role on the Synapse workspace to the generated linked-service MSI in Synapse Studio.
[ ]
[ ]
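When granting the role, it can help to confirm which Synapse workspace a linked service points at. The following is a small standard-library sketch that splits a `linked_service_resource_id` of the form shown above into its parts. The function name `parse_resource_id` is a hypothetical helper, not an SDK call.

```python
def parse_resource_id(resource_id):
    """Split an Azure resource ID into subscription, resource group, and workspace name.

    Assumes the ID alternates key/value segments, as in
    /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Synapse/workspaces/<name>.
    """
    parts = resource_id.strip("/").split("/")
    # Pair every even-indexed segment (key) with the segment that follows it (value).
    fields = dict(zip(parts[0::2], parts[1::2]))
    return {
        "subscription_id": fields["subscriptions"],
        "resource_group": fields["resourceGroups"],
        "workspace_name": fields["workspaces"],
    }


example_id = (
    "/subscriptions/4faaaf21-663f-4391-96fd-47197c630979"
    "/resourceGroups/static_resources_synapse_test"
    "/providers/Microsoft.Synapse/workspaces/synapsetest2"
)
print(parse_resource_id(example_id)["workspace_name"])  # prints synapsetest2
```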
Attach Synapse spark pool as AML compute target
[ ]
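Attaching the pool might be sketched as follows, assuming the `azureml-core` v1 SDK, a loaded workspace `ws`, and a `linked_service` object obtained from the linking step. The function name and its parameters are illustrative, and the compute and pool names are whatever you chose in your own workspaces.

```python
def attach_synapse_pool(ws, linked_service, compute_name, pool_name):
    """Attach a Synapse Spark pool to the AML workspace `ws` as a compute target."""
    # Imported lazily so this definition does not require azureml-core at import time.
    from azureml.core.compute import ComputeTarget, SynapseCompute

    attach_config = SynapseCompute.attach_configuration(
        linked_service,
        type="SynapseSpark",
        pool_name=pool_name,
    )
    synapse_compute = ComputeTarget.attach(
        workspace=ws,
        name=compute_name,
        attach_configuration=attach_config,
    )
    # Block until the attach operation has finished.
    synapse_compute.wait_for_completion()
    return synapse_compute
```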
Start an experiment run
Prepare data
[ ]
Tabular dataset as input
[ ]
File dataset as input
[ ]
Output config: the output will be registered as a File dataset
[ ]
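The three data definitions above (tabular input, file input, and registered output) could be put together as in this sketch. It assumes the `azureml-core` v1 SDK and that a CSV file has already been uploaded to the workspace's default datastore. The file name `sample.csv`, the destination folder `test`, the registered name, and the function name are all placeholders chosen for illustration.

```python
def build_inputs_and_output(ws, file_name="sample.csv"):
    """Create a tabular input, an HDFS-mounted file input, and a registered output."""
    # Imported lazily so this definition does not require azureml-core at import time.
    from azureml.core import Dataset
    from azureml.data import HDFSOutputDatasetConfig

    blob_store = ws.get_default_datastore()

    # Tabular dataset: the script receives a dataset ID and loads it via the SDK.
    tabular = Dataset.Tabular.from_delimited_files(path=[(blob_store, file_name)])
    tabular_input = tabular.as_named_input("tabular_input")

    # File dataset: exposed to the Spark job as an HDFS path.
    files = Dataset.File.from_files(path=[(blob_store, file_name)])
    file_input = files.as_named_input("file_input").as_hdfs()

    # Output: written to the datastore and registered as a File dataset on completion.
    output = HDFSOutputDatasetConfig(
        destination=(blob_store, "test")
    ).register_on_complete(name="registered_dataset")

    return tabular_input, file_input, output
```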
Dataprep script
[ ]
[ ]
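A data prep script for this kind of run might look like the sketch below, which writes a `dataprep.py` file into a source directory. The argument names (`--tabular_input`, `--file_input`, `--output_dir`) and the script body are assumptions for illustration. The script itself needs `pyspark` and `azureml-core` on the Spark pool, while the writer function only uses the standard library.

```python
from pathlib import Path

# Contents of the script that will run on the Synapse Spark pool (illustrative).
DATAPREP_SCRIPT = '''\
import argparse

from pyspark.sql import SparkSession
from azureml.core import Dataset, Run

parser = argparse.ArgumentParser()
parser.add_argument("--tabular_input")
parser.add_argument("--file_input")
parser.add_argument("--output_dir")
args = parser.parse_args()

# Tabular input: assumed to arrive as a dataset ID, loaded through the dataset SDK.
run = Run.get_context()
dataset = Dataset.get_by_id(run.experiment.workspace, id=args.tabular_input)
dataset.to_spark_dataframe().show()

# File input: assumed to arrive as an HDFS path, read directly with Spark.
spark = SparkSession.builder.getOrCreate()
sdf = spark.read.option("header", "true").csv(args.file_input)
sdf.show()

# Write the result as a single CSV file to the output location.
sdf.coalesce(1).write.option("header", "true").mode("append").csv(args.output_dir)
'''


def write_dataprep_script(source_dir="code"):
    """Write the data prep script to <source_dir>/dataprep.py and return its path."""
    path = Path(source_dir) / "dataprep.py"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(DATAPREP_SCRIPT)
    return path
```

Keeping the script in its own directory means that directory can be passed as the source directory of the run without uploading unrelated notebook files.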
Set up Conda dependency for the following Script Run
[ ]
How to leverage ScriptRunConfig to submit an experiment run to an attached Synapse Spark cluster
[ ]
[ ]
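One way to assemble and submit such a run is sketched here, assuming the `azureml-core` v1 SDK. The Spark resource values, the unpinned `azureml-core` pip package, the script name `dataprep.py`, the experiment name, and both function names are illustrative choices rather than required settings.

```python
# Spark settings applied to the run (example values; tune for your pool size).
SPARK_CONF = {
    "spark.driver.memory": "1g",
    "spark.driver.cores": 2,
    "spark.executor.memory": "1g",
    "spark.executor.cores": 1,
    "spark.executor.instances": 1,
}


def build_script_run_config(compute_name, arguments, source_directory="./code",
                            spark_conf=SPARK_CONF):
    """Build a ScriptRunConfig that targets an attached Synapse Spark pool."""
    # Imported lazily so this definition does not require azureml-core at import time.
    from azureml.core import ScriptRunConfig
    from azureml.core.conda_dependencies import CondaDependencies
    from azureml.core.runconfig import RunConfiguration

    run_config = RunConfiguration(framework="pyspark")
    run_config.target = compute_name
    for key, value in spark_conf.items():
        run_config.spark.configuration[key] = value

    # Conda dependencies for the script; the package list is an assumption.
    conda_dep = CondaDependencies()
    conda_dep.add_pip_package("azureml-core")
    run_config.environment.python.conda_dependencies = conda_dep

    return ScriptRunConfig(
        source_directory=source_directory,
        script="dataprep.py",
        arguments=arguments,
        run_config=run_config,
    )


def submit_run(ws, script_run_config, experiment_name="synapse-spark"):
    """Submit the configured run as an experiment in workspace `ws`."""
    from azureml.core import Experiment

    return Experiment(workspace=ws, name=experiment_name).submit(config=script_run_config)
```

The `arguments` list would typically interleave flag names with the dataset input and output objects, for example `["--tabular_input", tabular_input, "--file_input", file_input, "--output_dir", output]`.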
How to leverage SynapseSparkStep in an AML pipeline to orchestrate a data prep step on Synapse Spark and a training step on AzureML compute
[ ]
[ ]
[ ]
[ ]
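A two-step pipeline of this shape could be sketched as follows, assuming the `azureml-core` and `azureml-pipeline` v1 SDK packages. The step names, the resource sizes, the environment name, the `train.py` script, and the function signature are hypothetical. The inputs and output are assumed to be dataset objects like those prepared earlier in the notebook.

```python
def build_pipeline(ws, synapse_compute_name, train_compute_name,
                   tabular_input, file_input, prep_output):
    """Chain a Synapse Spark data prep step with a training step on AML compute."""
    # Imported lazily so this definition does not require the SDK at import time.
    from azureml.core import Environment
    from azureml.pipeline.core import Pipeline
    from azureml.pipeline.steps import PythonScriptStep, SynapseSparkStep

    env = Environment(name="myenv")
    env.python.conda_dependencies.add_pip_package("azureml-core")

    # Step 1: data prep on the attached Synapse Spark pool.
    prep_step = SynapseSparkStep(
        name="synapse-spark",
        file="dataprep.py",
        source_directory="./code",
        inputs=[tabular_input, file_input],
        outputs=[prep_output],
        arguments=["--tabular_input", tabular_input,
                   "--file_input", file_input,
                   "--output_dir", prep_output],
        compute_target=synapse_compute_name,
        driver_memory="7g",
        driver_cores=4,
        executor_memory="7g",
        executor_cores=2,
        num_executors=1,
        environment=env,
    )

    # Step 2: training on AML compute, consuming the output of step 1 as a download.
    train_input = prep_output.as_input("step2_input").as_download()
    train_step = PythonScriptStep(
        name="train",
        script_name="train.py",  # hypothetical training script
        arguments=[train_input],
        inputs=[train_input],
        compute_target=train_compute_name,
        source_directory="./code",
    )

    return Pipeline(workspace=ws, steps=[prep_step, train_step])
```

Because the second step takes the first step's output as its input, the pipeline runs the Spark data prep before training without any explicit ordering call.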