
Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

02. Train locally

Train a model locally: Directly on your machine and within a Docker container


Table of contents

  1. Introduction
  2. Pre-requisites
  3. Initialize Workspace
  4. Create An Experiment
  5. View training and auxiliary scripts
  6. Configure & Run
    1. User-managed environment
      1. Set the environment up
      2. Submit the script to run in the user-managed environment
      3. Get run history details
    2. System-managed environment
      1. Set the environment up
      2. Submit the script to run in the system-managed environment
      3. Get run history details
    3. Docker-based execution
      1. Set the environment up
      2. Submit the script to run in the Docker-based environment
      3. Get run history details
      4. Use a custom Docker image
  7. Query run metrics

1. Introduction

In this notebook, we will learn how to:

  • Connect to our AML workspace
  • Create or load a workspace
  • Configure & execute a local run in:
    • a user-managed Python environment
    • a system-managed Python environment
    • a Docker environment
  • Query run metrics to find the best model trained in the run
  • Register that model for operationalization

2. Pre-requisites

In this notebook, we assume that you have already set up your Azure Machine Learning workspace. If you have not, make sure you go through the configuration notebook first. By the end of that notebook, you should have a configuration file that contains the subscription ID, resource group, and name of your workspace.

[ ]

3. Initialize Workspace

Initialize your workspace object from the configuration file.

[ ]
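A minimal sketch of what this cell typically contains, assuming the configuration notebook wrote a config.json file to the standard location:

```python
from azureml.core import Workspace

# Load subscription ID, resource group, and workspace name from config.json
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep="\n")
```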

4. Create An Experiment

An experiment is a logical container in an Azure ML Workspace. It contains a series of trials called Runs. As such, it hosts run records such as run metrics, logs, and other output artifacts from your experiments.

[ ]
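A sketch of the experiment creation, assuming the workspace object ws from the previous cell; the experiment name "train-on-local" is an assumption chosen to match this notebook's title:

```python
from azureml.core import Experiment

# Create (or attach to) a logical container for this notebook's runs
experiment_name = "train-on-local"
exp = Experiment(workspace=ws, name=experiment_name)
```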

5. View training and auxiliary scripts

For convenience, we already created the training script (train.py) and a supporting library (mylib.py) for you. Take a few minutes to examine both files.

[ ]
[ ]

6. Configure & Run

6.A User-managed environment

6.A.a Set the environment up

When using a user-managed environment, you are responsible for ensuring that all the necessary packages are available in the Python environment you choose to run the script in.

[ ]
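A sketch of a user-managed environment definition; the environment name and the commented-out interpreter path are illustrative:

```python
from azureml.core import Environment

# Tell Azure ML not to manage dependencies:
# the Python environment running the script is used as-is
user_managed_env = Environment("user-managed-env")
user_managed_env.python.user_managed_dependencies = True

# Optionally point to a specific interpreter (path is illustrative):
# user_managed_env.python.interpreter_path = "/home/user/miniconda3/bin/python"
```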

6.A.b Submit the script to run in the user-managed environment

However you manage your environment, you need the ScriptRunConfig class. It lets you configure your run by pointing to the train.py script and to the working directory, which also contains the mylib.py file. Together, these inputs define what gets executed in the run. Once the run is configured, you submit it to your experiment.

[ ]
[ ]
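A sketch of the run configuration and submission, assuming the experiment exp and the environment user_managed_env created in the earlier cells:

```python
from azureml.core import ScriptRunConfig

# Point the run at train.py; the source directory also holds mylib.py
src = ScriptRunConfig(source_directory="./",
                      script="train.py",
                      environment=user_managed_env)

# Submit the configured run to the experiment
run = exp.submit(src)
```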

6.A.c Get run history details

While all calculations were run on your machine (see below), submitting them as a run also captured their results in your run and experiment. You can then view them in the Azure portal, through the link displayed as output of the following cell.

Note: The recording of the computation results into your run was made possible by the run.log() commands in the train.py file.

[ ]
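Displaying the run object in a notebook renders the portal link; assuming the run object from the submission cell, the URL can also be retrieved explicitly:

```python
# Print the direct link to this run in the Azure portal
print(run.get_portal_url())
```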

Note: if you need to cancel a run, you can follow these instructions.

Block any execution to wait until the run finishes.

[ ]
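Assuming the run object from the submission cell, this is typically done as follows:

```python
# Stream the run's logs and block until it reaches a terminal state
run.wait_for_completion(show_output=True)
```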

Note: All these calculations were run on your local machine, in the conda environment you defined above. You can find the results in:

  • ~/.azureml/envs/azureml_xxxx for the conda environment you just created
  • ~/AppData/Local/Temp/azureml_runs/train-on-local_xxxx for the machine learning models you trained (this path may differ depending on the platform you use). This folder also contains
    • Logs (under azureml_logs/)
    • Output pickled files (under outputs/)
    • The configuration files (credentials, local and docker image setups)
    • The train.py and mylib.py scripts
    • The current notebook

Take a few minutes to examine the output of the cell above. It shows the content of some of the log files, and extra information on the conda environment used.

6.B System-managed environment

6.B.a Set the environment up

Now, instead of managing the setup of the environment yourself, you can ask the system to build a new conda environment for you. The environment is built once, and will be reused in subsequent executions as long as the conda dependencies remain unchanged.

[ ]
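A sketch of a system-managed environment definition; the environment name is illustrative, and scikit-learn is declared as a dependency because train.py uses it:

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

system_managed_env = Environment("system-managed-env")

# Let Azure ML build and manage the conda environment
system_managed_env.python.user_managed_dependencies = False

# Declare the packages the training script needs
system_managed_env.python.conda_dependencies = CondaDependencies.create(
    conda_packages=["scikit-learn"])
```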

6.B.b Submit the script to run in the system-managed environment

A new conda environment is built based on the conda dependencies object. If you are running this for the first time, this might take up to 5 minutes.

The commands used to execute the run are then the same as the ones you used above.

[ ]

6.B.c Get run history details

[ ]
[ ]

6.C Docker-based execution

In this section, you will train the same models, but you will do so in a Docker container, on your local machine. For this, you need to have the Docker engine installed locally. If you don't have it yet, please follow the instructions below.

How to install Docker

  • Linux

  • macOS

  • Windows

    In case of issues, troubleshooting documentation can be found here. Additionally, you can follow the steps below, if Virtualization is not enabled on your machine:

    • Go to Task Manager > Performance
    • Check that Virtualization is enabled
    • If it is not, go to Start > Settings > Update and security > Recovery > Advanced Startup - Restart now > Troubleshoot > Advanced options > UEFI firmware settings - restart
    • In the BIOS, go to Advanced > System options > check only "Virtualization Technology (VTx)" > Save > Exit > Save all changes -- this will restart the machine

Notes:

  • If your kernel is already running in a Docker container, such as Azure Notebooks, this mode will NOT work.
  • If you use a GPU base image, it needs to be used on Microsoft Azure Services such as ACI, AML Compute, Azure VMs, or AKS.

You can also ask the system to pull down a Docker image and execute your scripts in it.

6.C.a Set the environment up

In the cell below, you will configure your run to execute in a Docker container. It will:

  • run on a CPU
  • contain a conda environment in which the scikit-learn library will be installed.

As before, you will finish configuring your run by pointing to the train.py and mylib.py files.

[ ]
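A sketch of the Docker-based configuration described above; the environment name is illustrative, and the run keeps the default Azure ML CPU base image:

```python
from azureml.core import Environment
from azureml.core.conda_dependencies import CondaDependencies

docker_env = Environment("docker-env")

# Execute the run inside a Docker container (CPU, default base image)
docker_env.docker.enabled = True

# Conda environment to build inside the container
docker_env.python.conda_dependencies = CondaDependencies.create(
    conda_packages=["scikit-learn"])
```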

6.C.b Submit the script to run in the Docker-based environment

The run is now configured and ready to be executed in a Docker container. If you are running this for the first time, the Docker container will get created, as well as the conda environment inside it. This will take several minutes. Once all this is generated, however, this conda environment will be reused as long as you don't change the conda dependencies.

[ ]
Potential issue on Windows and how to solve it

If you are using a Windows machine, the creation of the Docker image may fail and you may see the following error message: "docker: Error response from daemon: Drive has not been shared. Failed to launch docker container. Check that docker is running and that C:\ on Windows and /tmp elsewhere is shared."

This is because the process above tries to create a Linux-based (i.e., non-Windows-based) Docker image. To fix this, you can:

  • Open the Docker user interface
  • Navigate to Settings > Shared drives
  • Select C (or both C and D, if you have one)
  • Apply

When this is done, you can try and re-run the command above.

6.C.c Get run history details

[ ]
[ ]

The results obtained here should be the same as those obtained before. However, take a look at the "Execution summary" section in the output of the cell above. Look for "docker". There, you should see the "enabled" field set to True. Compare this to the 2 prior runs ("enabled" was then set to False).

6.C.d Use a custom Docker image

You can also specify a custom Docker image, if you don't want to use the default image provided by Azure ML.

	custom_docker_env = Environment("custom-docker-env")
	custom_docker_env.docker.enabled = True

You can either pull an image directly from Docker Hub:

	# Use an image available in Docker Hub without authentication
	custom_docker_env.docker.base_image = "continuumio/miniconda3"

Or one of the images you may already have created:

	# or, use an image available in your private Azure Container Registry
	custom_docker_env.docker.base_image = "mycustomimage:1.0"
	custom_docker_env.docker.base_image_registry.address = "myregistry.azurecr.io"
	custom_docker_env.docker.base_image_registry.username = "username"
	custom_docker_env.docker.base_image_registry.password = "password"

Where to find my Docker image name and registry credentials
	If you do not know what the name of your Docker image or container registry is, or if you don't know how to access the username and password needed above, proceed as follows:
- Docker image name:
    - In the portal, under your resource group, click on your current workspace
    - Click on Experiments
    - Click on Images
    - Click on the image of your choice
    - Copy the "ID" string
    - In this notebook, replace "mycustomimage:1.0" with that ID string
- Username and password:
    - In the portal, under your resource group, click on the container registry associated with your workspace
        - If you have several and don't know which one you need, click on your workspace, go to Overview and click on the "Registry" name on the upper right of the screen
    - There, go to "Access keys"
    - Copy the username and one of the passwords
    - In this notebook, replace "username" and "password" by these values

In any case, you will need to use the lines above in place of the line marked as # Reference Docker image in section 6.C.a.

When you are using your custom Docker image, you might already have your Python environment properly set up. In that case, you can skip specifying conda dependencies, and just use the user_managed_dependencies option instead:

	custom_docker_env.python.user_managed_dependencies = True
	# path to the Python environment in the custom Docker image
	custom_docker_env.python.interpreter_path = '/opt/conda/bin/python'

Once you are done defining your environment, set that environment on your run configuration:

	src.run_config.environment = custom_docker_env

7. Query run metrics

Once your run has completed, you can now extract the metrics you captured by using the get_metrics method. As shown in the train.py file, these metrics are "alpha" and "mse".

[ ]
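Assuming the run object from the Docker-based submission, the metrics can be retrieved as follows:

```python
# Retrieve all metrics logged by train.py via run.log()
metrics = run.get_metrics()
print(metrics)  # expected keys: "alpha" and "mse"
```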

Let's find the model that has the lowest MSE value logged.

[ ]
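The selection logic can be sketched with illustrative values (the dictionary shape matches what run.get_metrics() returns for this sweep, but the numbers below are made up):

```python
# Illustrative metrics, shaped like the output of run.get_metrics()
# for a sweep over the Ridge regularization strength alpha
metrics = {
    "alpha": [0.0, 0.2, 0.4, 0.6, 0.8, 1.0],
    "mse": [3424.32, 3325.29, 3302.67, 3317.02, 3343.51, 3372.65],
}

# Pair each alpha with its MSE and keep the pair with the lowest MSE
best_alpha, best_mse = min(zip(metrics["alpha"], metrics["mse"]),
                           key=lambda pair: pair[1])
print(f"Best alpha: {best_alpha} (MSE = {best_mse})")
```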

Let's compare it to the others.

[ ]

You can also list all the files that are associated with this run record.

[ ]
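Assuming the run object from above:

```python
# List all artifacts stored with this run record (logs, outputs, snapshot)
print(run.get_file_names())
```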

From the results obtained above, ridge_0.40.pkl is the best performing model. You can now register that particular model with the workspace. Once you have done so, go back to the portal and click on "Models". You should see it there.

[ ]
[ ]
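A sketch of the registration step, assuming the run object from above; the registered model name "best_ridge_model" is an assumption, while the path points to the pickled model in the run's outputs/ folder:

```python
# Register the best model with the workspace so it appears under "Models"
model = run.register_model(model_name="best_ridge_model",
                           model_path="outputs/ridge_0.40.pkl")
print(model.name, model.version)
```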

You can now deploy your model by following this example.