Train In Spark
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
05. Train in Spark
- Create Workspace
- Create Experiment
- Copy relevant files to the script folder
- Configure and Run
Prerequisites
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the configuration Notebook first if you haven't already to establish your connection to the AzureML Workspace.
Initialize Workspace
Initialize a workspace object from persisted configuration.
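Loading the workspace is one call once the configuration notebook has written `config.json`; a minimal sketch:

```python
# Load the workspace from the persisted config.json
# (written by the configuration notebook).
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\t')
```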
Create Experiment
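Creating the experiment is a single constructor call; the name `train-on-spark` below is just an illustrative choice:

```python
# Create (or reuse) an experiment in the workspace.
# `ws` is the Workspace object initialized above.
from azureml.core import Experiment

experiment_name = 'train-on-spark'   # illustrative name
exp = Experiment(workspace=ws, name=experiment_name)
```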
View train-spark.py
For convenience, we created a training script for you. It is printed below as text, but you can also run %pycat ./train-spark.py in a cell to show the file.
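For orientation, a minimal PySpark training script might look like the sketch below. This is illustrative only, not the exact contents of `train-spark.py`; the data path, format, and model choice are assumptions:

```python
# Illustrative sketch -- not the exact contents of train-spark.py.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from azureml.core.run import Run

run = Run.get_context()                      # run handle for metric logging
spark = SparkSession.builder.getOrCreate()

# Hypothetical dataset path, in libsvm format (label + features columns).
data = spark.read.format("libsvm").load("./iris.txt")

reg = 0.01
model = LogisticRegression(regParam=reg).fit(data)

run.log("Regularization Rate", reg)          # appears under the run's metrics
model.write().overwrite().save("./model")    # persist the fitted model
```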
Configure & Run
Note You can use Docker-based execution to run the Spark job on a local computer or a remote VM. Please see the train-in-remote-vm notebook for an example of how to configure and run in Docker mode in a VM. Make sure you choose a Docker image that has Spark installed, such as microsoft/mmlspark:0.12.
Attach an HDI cluster
Here we will use an actual Spark cluster, HDInsight for Spark, to run this job. To use an HDI compute target:
- Create an HDInsight Spark cluster in Azure. Here are some quick instructions. Make sure you choose the Ubuntu flavor, NOT CentOS.
- Enter the IP address, username and password below
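The two steps above can be sketched with the SDK's `HDInsightCompute` attach configuration; the address, username, password, and compute name below are placeholders you must fill in:

```python
# Attach an existing HDInsight Spark cluster as a compute target.
# All credential values below are placeholders.
from azureml.core.compute import ComputeTarget, HDInsightCompute
from azureml.exceptions import ComputeTargetException

try:
    attach_config = HDInsightCompute.attach_configuration(
        address='<cluster-ip-or-dns>',   # e.g. the cluster's SSH endpoint
        ssh_port=22,
        username='<ssh-username>',
        password='<ssh-password>')
    hdi_compute = ComputeTarget.attach(workspace=ws,
                                       name='myhdi',   # illustrative name
                                       attach_configuration=attach_config)
    hdi_compute.wait_for_completion(show_output=True)
except ComputeTargetException as e:
    print('Attach failed:', e)
```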
Configure HDI run
Configure an execution using the HDInsight cluster with a conda environment that has numpy.
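One way to express this with the SDK's `RunConfiguration` (the compute target name `myhdi` is an assumption matching the attach step above):

```python
# PySpark run configuration targeting the attached HDI cluster,
# with a conda environment that includes numpy.
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_config = RunConfiguration(framework="pyspark")
run_config.target = 'myhdi'   # name used when attaching the cluster
run_config.environment.python.conda_dependencies = \
    CondaDependencies.create(conda_packages=['numpy'])
```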
Submit the script to HDI
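Submission pairs the training script with the run configuration; a sketch assuming the script folder is the current directory:

```python
# Submit train-spark.py to the HDI cluster via the run configuration.
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory='.',
                      script='train-spark.py',
                      run_config=run_config)
run = exp.submit(config=src)
```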
Monitor the run using a Jupyter widget
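A sketch of the widget call (requires the `azureml-widgets` package):

```python
# Show a live-updating view of the run inside the notebook.
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion(show_output=True)   # optional: block until done
```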
Note: if you need to cancel a run, you can follow these instructions.
After the run has successfully finished, you can check the metrics logged.
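For example, a minimal sketch (`run.cancel()` is the programmatic counterpart of the cancellation instructions above):

```python
# Retrieve all metrics logged by train-spark.py during the run.
metrics = run.get_metrics()
print(metrics)

# To cancel a run that is still executing:
# run.cancel()
```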