Azure Spark Job On Synapse Spark Pool


Licensed under the MIT License.


Using Synapse Spark Pool as a Compute Target from Azure Machine Learning Remote Run

To use a Synapse Spark pool as a compute target for an experiment run, submit the run with ScriptRunConfig, as with any other experiment run. This notebook shows how to use ScriptRunConfig to submit an experiment run to an attached Synapse Spark pool.
To use a Synapse Spark pool as a compute target in an Azure Machine Learning pipeline, use a SynapseSparkStep. This notebook also shows how to use SynapseSparkStep in an Azure Machine Learning pipeline.

Before you begin:

Create an Azure Synapse workspace. For details, see https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-workspace.
Create a Spark pool in the Synapse workspace. For details, see https://docs.microsoft.com/en-us/azure/synapse-analytics/quickstart-create-apache-spark-pool-portal.

Azure Machine Learning and Pipeline SDK-specific imports


Link the Synapse workspace to AML

You must have the "Owner" role on the Synapse workspace resource to link it. You can check your role in the Azure portal. If you don't have the "Owner" role, ask someone who does to link the workspaces for you.
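Linking needs the Synapse workspace's subscription ID, resource group, and workspace name. In the v1 SDK these are, as we understand it, the `subscription_id`, `resource_group` and `name` arguments of `SynapseWorkspaceLinkedServiceConfiguration`. A minimal sketch, assuming you start from the workspace's ARM resource ID (visible in the Azure portal), that splits it into those three values; `parse_synapse_resource_id` is a hypothetical helper, not part of the SDK.

```python
def parse_synapse_resource_id(resource_id: str) -> dict:
    """Split a Synapse workspace ARM resource ID into linking parameters.

    Expected shape (hypothetical helper, not an SDK function):
    /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Synapse/workspaces/<name>
    """
    parts = resource_id.strip().strip("/").split("/")
    if len(parts) != 8:
        raise ValueError(f"unexpected resource ID: {resource_id!r}")
    # ARM segment keywords are compared case-insensitively.
    keywords = [p.lower() for p in parts[0::2]]
    if keywords != ["subscriptions", "resourcegroups", "providers", "workspaces"]:
        raise ValueError(f"not a workspace resource ID: {resource_id!r}")
    if parts[5].lower() != "microsoft.synapse":
        raise ValueError(f"not a Synapse resource: {resource_id!r}")
    # Keys are named after the linked-service configuration arguments (assumption).
    return {
        "subscription_id": parts[1],
        "resource_group": parts[3],
        "name": parts[7],
    }
```

Because the keys mirror the argument names, the result could be unpacked with `**` into the configuration call; check this against your SDK version before relying on it.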


Linked service properties

A managed identity (MSI) is generated for each linked service; its principal ID is reported as system_assigned_identity_principal_id. For example:

name=synapselink,
type=Synapse,
linked_service_resource_id=/subscriptions/4faaaf21-663f-4391-96fd-47197c630979/resourceGroups/static_resources_synapse_test/providers/Microsoft.Synapse/workspaces/synapsetest2,
system_assigned_identity_principal_id=eb355d52-3806-4c5a-aec9-91447e8cfc2e

Before you submit a job, use Synapse Studio to grant the linked service's generated MSI the "Synapse Apache Spark Administrator" role on the Synapse workspace.
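The role is granted to the principal ID reported as system_assigned_identity_principal_id. A minimal sketch, assuming you have the linked service's properties as printed text in the key=value form shown earlier, that extracts that ID and checks that it is a well-formed GUID before you paste it into Synapse Studio. Both helpers are hypothetical; they only parse text and do not call Azure.

```python
import re
import uuid
from typing import Dict


def parse_properties(text: str) -> Dict[str, str]:
    """Parse "key=value" pairs separated by commas and/or line breaks."""
    props: Dict[str, str] = {}
    for chunk in re.split(r"[,\n]", text):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip blank lines and trailing commas
        key, sep, value = chunk.partition("=")
        if not sep:
            raise ValueError(f"not a key=value pair: {chunk!r}")
        props[key.strip()] = value.strip()
    return props


def msi_principal_id(props: Dict[str, str]) -> str:
    """Return the linked service's MSI principal ID in canonical GUID form."""
    # uuid.UUID raises ValueError when the value is not a well-formed GUID.
    return str(uuid.UUID(props["system_assigned_identity_principal_id"]))
```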


Attach the Synapse Spark pool as an AML compute target


Start an experiment run

Prepare data
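The datasets below are built from data that first has to reach a datastore. A minimal sketch, assuming the data starts as a small local CSV file that you later upload (the upload call is not shown), of writing that file with a header row; the column names and rows used with it are hypothetical placeholders.

```python
import csv
import io
from pathlib import Path
from typing import Iterable, Sequence


def to_csv_text(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Render a header row plus data rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)  # default line terminator is "\r\n"
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


def write_csv(path: Path, header: Sequence[str], rows: Iterable[Sequence[object]]) -> Path:
    """Write the CSV to disk, creating parent folders as needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" stops the text layer from translating the "\r\n" terminators again.
    with path.open("w", newline="") as f:
        f.write(to_csv_text(header, rows))
    return path
```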


Tabular dataset as input


File dataset as input


Output config: the output will be registered as a File dataset


Dataprep script
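The dataprep script in this notebook runs on Spark. The sketch below is a plain-Python stand-in with no Spark and no Azure ML, meant only to show the general shape of such a step: the input path and the output directory arrive as command-line arguments, the step reads CSV rows, keeps complete rows, and writes a CSV file into the output directory. The argument names `--file_input` and `--output_dir`, the file name `cleaned.csv` and the cleaning rule are all assumptions.

```python
import argparse
import csv
from pathlib import Path
from typing import List, Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Toy data-prep step (plain Python, no Spark)")
    parser.add_argument("--file_input", required=True, help="path of the input CSV file")
    parser.add_argument("--output_dir", required=True, help="directory that receives the cleaned CSV")
    return parser.parse_args(argv)


def filter_complete_rows(rows: Sequence[Sequence[str]]) -> List[List[str]]:
    """Keep only rows in which every field is non-blank (hypothetical cleaning rule)."""
    return [list(row) for row in rows if row and all(field.strip() for field in row)]


def prep(input_path: Path, output_dir: Path) -> int:
    """Read the CSV, drop incomplete rows, write the result; return the number of rows kept."""
    with input_path.open(newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)  # None for an empty file
        rows = filter_complete_rows(list(reader))
    output_dir.mkdir(parents=True, exist_ok=True)
    with (output_dir / "cleaned.csv").open("w", newline="") as dst:
        writer = csv.writer(dst)
        if header is not None:
            writer.writerow(header)
        writer.writerows(rows)
    return len(rows)


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = parse_args(argv)
    return prep(Path(args.file_input), Path(args.output_dir))

# As a standalone script this file would end with:
#     if __name__ == "__main__":
#         print(f"kept {main()} rows")
```

Passing locations as arguments keeps the script independent of where the pool mounts or stages the data; the submitting code decides those paths.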


Set up Conda dependencies for the script run
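One way to describe the run's dependencies is a conda environment file. In the v1 SDK, as we understand it, such a file can be loaded with `Environment.from_conda_specification(name, file_path)` (not shown). A minimal sketch that renders a small specification as YAML text without a YAML library; the environment name and package list used with it are hypothetical, and the Python version is optional because a Spark pool ships its own interpreter.

```python
from typing import Optional, Sequence


def conda_env_yaml(
    name: str,
    pip_packages: Sequence[str],
    conda_packages: Sequence[str] = (),
    python_version: Optional[str] = None,
) -> str:
    """Render a minimal conda environment specification as YAML text."""
    lines = [f"name: {name}", "dependencies:"]
    if python_version:
        lines.append(f"  - python={python_version}")
    lines.extend(f"  - {pkg}" for pkg in conda_packages)
    if pip_packages:
        # List pip itself too; conda warns when pip dependencies appear without it.
        lines.append("  - pip")
        lines.append("  - pip:")
        lines.extend(f"    - {pkg}" for pkg in pip_packages)
    return "\n".join(lines) + "\n"
```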


Use ScriptRunConfig to submit an experiment run to the attached Synapse Spark pool
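A run on a Synapse pool is sized with standard Spark properties such as `spark.driver.memory`, `spark.driver.cores`, `spark.executor.memory`, `spark.executor.cores` and `spark.executor.instances`; in the v1 SDK they are, as we understand it, set through the run configuration's Spark configuration mapping (not shown). A minimal sketch that builds and sanity-checks such a dictionary before submission; the helper names and any sizes you pass are assumptions, and the memory total ignores Spark's per-container overhead.

```python
import re
from typing import Dict, Union

# Spark-style JVM memory strings such as "512m" or "4g".
_MEMORY = re.compile(r"^(\d+)([kmgt])$", re.IGNORECASE)
_MB_PER_UNIT = {"k": 1 / 1024, "m": 1, "g": 1024, "t": 1024 * 1024}


def memory_mb(value: str) -> float:
    """Convert a memory string like "1g" to mebibytes."""
    match = _MEMORY.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized memory size: {value!r}")
    return int(match.group(1)) * _MB_PER_UNIT[match.group(2).lower()]


def spark_configuration(
    driver_memory: str,
    driver_cores: int,
    executor_memory: str,
    executor_cores: int,
    executor_instances: int,
) -> Dict[str, Union[str, int]]:
    """Build and sanity-check the Spark sizing properties for one run."""
    memory_mb(driver_memory)  # raises on malformed sizes
    memory_mb(executor_memory)
    for label, count in (
        ("driver_cores", driver_cores),
        ("executor_cores", executor_cores),
        ("executor_instances", executor_instances),
    ):
        if count < 1:
            raise ValueError(f"{label} must be at least 1, got {count}")
    return {
        "spark.driver.memory": driver_memory,
        "spark.driver.cores": driver_cores,
        "spark.executor.memory": executor_memory,
        "spark.executor.cores": executor_cores,
        "spark.executor.instances": executor_instances,
    }


def requested_memory_mb(conf: Dict[str, Union[str, int]]) -> float:
    """Driver plus executor heap memory, ignoring per-container overhead."""
    executors = int(conf["spark.executor.instances"])
    return memory_mb(str(conf["spark.driver.memory"])) + executors * memory_mb(str(conf["spark.executor.memory"]))
```

Checking sizes locally catches typos such as "1gb" or zero executors before a job waits in the pool's queue.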


Use SynapseSparkStep in an AML pipeline to run a data-prep step on Synapse Spark and a training step on AzureML compute
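In an AML pipeline, step order largely follows data dependencies: a training step that consumes the data-prep step's output runs after it. A toy model of that idea in plain Python (3.9+, no Azure ML), assuming each step is described only by the names of the data it reads and writes; step and data names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Sequence, Tuple

Step = Tuple[Sequence[str], Sequence[str]]  # (inputs, outputs) as data names


def execution_order(steps: Dict[str, Step]) -> List[str]:
    """Order steps so that every producer runs before its consumers."""
    producers: Dict[str, str] = {}
    for name, (_inputs, outputs) in steps.items():
        for data in outputs:
            if data in producers:
                raise ValueError(f"{data!r} is produced by both {producers[data]!r} and {name!r}")
            producers[data] = name
    # A step depends on every step that produces one of its inputs;
    # inputs with no producer (e.g. registered datasets) add no edge.
    graph = {
        name: {producers[d] for d in inputs if d in producers}
        for name, (inputs, _outputs) in steps.items()
    }
    # static_order raises graphlib.CycleError if the steps form a cycle.
    return list(TopologicalSorter(graph).static_order())
```

Declaring the data-prep output as the training step's input is what lets the Spark step and the AzureML step run in the right order on their different compute targets.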
