Train In Spark
Copyright (c) Microsoft Corporation. All rights reserved.
Licensed under the MIT License.
05. Train in Spark
- Create Workspace
- Create Experiment
- Copy relevant files to the script folder
- Configure and Run
Prerequisites
If you are using an Azure Machine Learning Notebook VM, you are all set. Otherwise, go through the configuration Notebook first if you haven't already to establish your connection to the AzureML Workspace.
Initialize Workspace
Initialize a workspace object from persisted configuration.
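Loading the workspace is one call once the configuration notebook has written `config.json`; a minimal sketch:

```python
# Load the workspace from the persisted config.json
# (written by the configuration notebook).
from azureml.core import Workspace

ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep='\t')
```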
Create Experiment
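Creating the experiment is a single constructor call; the name `train-on-spark` below is just an illustrative choice:

```python
# Create (or reuse) an experiment in the workspace.
# `ws` is the Workspace object initialized above.
from azureml.core import Experiment

experiment_name = 'train-on-spark'   # illustrative name
exp = Experiment(workspace=ws, name=experiment_name)
```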
View train-spark.py
For convenience, we created a training script for you. It is printed below as text, but you can also run %pycat ./train-spark.py in a cell to show the file.
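For orientation, a minimal PySpark training script might look like the sketch below. This is illustrative only, not the exact contents of `train-spark.py`; the data path, format, and model choice are assumptions:

```python
# Illustrative sketch -- not the exact contents of train-spark.py.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from azureml.core.run import Run

run = Run.get_context()                      # run handle for metric logging
spark = SparkSession.builder.getOrCreate()

# Hypothetical dataset path, in libsvm format (label + features columns).
data = spark.read.format("libsvm").load("./iris.txt")

reg = 0.01
model = LogisticRegression(regParam=reg).fit(data)

run.log("Regularization Rate", reg)          # appears under the run's metrics
model.write().overwrite().save("./model")    # persist the fitted model
```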
Configure & Run
Note You can use Docker-based execution to run the Spark job on a local computer or a remote VM. Please see the train-in-remote-vm notebook for an example of how to configure and run in Docker mode in a VM. Make sure you choose a Docker image that has Spark installed, such as microsoft/mmlspark:0.12.
Attach an HDI cluster
Here we will use an actual Spark cluster, HDInsight for Spark, to run this job. To use an HDI compute target:
- Create an HDInsight Spark cluster in Azure. Here are some quick instructions. Make sure you choose the Ubuntu flavor, NOT CentOS.
- Enter the IP address, username and password below
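The two steps above can be sketched with the SDK's `HDInsightCompute` attach configuration; the address, username, password, and compute name below are placeholders you must fill in:

```python
# Attach an existing HDInsight Spark cluster as a compute target.
# All credential values below are placeholders.
from azureml.core.compute import ComputeTarget, HDInsightCompute
from azureml.exceptions import ComputeTargetException

try:
    attach_config = HDInsightCompute.attach_configuration(
        address='<cluster-ip-or-dns>',   # e.g. the cluster's SSH endpoint
        ssh_port=22,
        username='<ssh-username>',
        password='<ssh-password>')
    hdi_compute = ComputeTarget.attach(workspace=ws,
                                       name='myhdi',   # illustrative name
                                       attach_configuration=attach_config)
    hdi_compute.wait_for_completion(show_output=True)
except ComputeTargetException as e:
    print('Attach failed:', e)
```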
Configure HDI run
Configure an execution using the HDInsight cluster with a conda environment that has numpy.
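One way to express this with the SDK's `RunConfiguration` (the compute target name `myhdi` is an assumption matching the attach step above):

```python
# PySpark run configuration targeting the attached HDI cluster,
# with a conda environment that includes numpy.
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies

run_config = RunConfiguration(framework="pyspark")
run_config.target = 'myhdi'   # name used when attaching the cluster
run_config.environment.python.conda_dependencies = \
    CondaDependencies.create(conda_packages=['numpy'])
```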
Submit the script to HDI
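Submission pairs the training script with the run configuration; a sketch assuming the script folder is the current directory:

```python
# Submit train-spark.py to the HDI cluster via the run configuration.
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(source_directory='.',
                      script='train-spark.py',
                      run_config=run_config)
run = exp.submit(config=src)
```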
Monitor the run using a Jupyter widget
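A sketch of the widget call (requires the `azureml-widgets` package):

```python
# Show a live-updating view of the run inside the notebook.
from azureml.widgets import RunDetails

RunDetails(run).show()
run.wait_for_completion(show_output=True)   # optional: block until done
```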
Note: if you need to cancel a run, you can follow these instructions.
After the run has successfully finished, you can check the metrics logged.
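For example, a minimal sketch (`run.cancel()` is the programmatic counterpart of the cancellation instructions above):

```python
# Retrieve all metrics logged by train-spark.py during the run.
metrics = run.get_metrics()
print(metrics)

# To cancel a run that is still executing:
# run.cancel()
```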