Pong Rllib


Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.


Reinforcement Learning in Azure Machine Learning - Pong problem

Reinforcement Learning in Azure Machine Learning is a managed service for running distributed reinforcement learning training and simulation using the open-source Ray framework. This notebook demonstrates how to use Ray to solve a more complex problem with a more complex setup: Ray RLlib running on multiple compute nodes and using a GPU. In this example, we train a Pong-playing agent on a cluster of two NC6 nodes (6 CPUs and 1 GPU each).

Pong problem

Pong is a two-dimensional sports game that simulates table tennis. The player controls an in-game paddle by moving it vertically along the left or right side of the screen, and can compete against another player who controls a second paddle on the opposite side. Players use the paddles to hit a ball back and forth.

Fig 1. Pong game animation (from towardsdatascience.com).

The goal here is to train an agent that beats its opponent by a margin of at least 10 points per episode, that is, one that reaches a mean episode reward of at least 10. An episode of Pong runs until one of the players reaches a score of 21. "Episode" is the term used across all OpenAI Gym environments for one run of a strictly defined task, from its initial state to a terminal state.
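To make the target concrete: in the Atari Pong environment the agent receives a reward of +1 for each point it wins and -1 for each point the opponent wins, so the total reward of a finished episode is the final score difference, somewhere between -21 and +21. The minimal sketch below uses hypothetical helpers (`pong_episode_reward`, `meets_target`) that are not part of this notebook. Note that the actual stop criterion is the mean reward over recent episodes, not the reward of a single episode.

```python
# Minimal sketch with hypothetical helpers (not part of this notebook).
# Assumption: standard Atari Pong rewards, +1 per point the agent wins and
# -1 per point the opponent wins, so an episode's reward is the score difference.

WINNING_SCORE = 21  # an episode ends when either player reaches 21 points


def pong_episode_reward(agent_points: int, opponent_points: int) -> int:
    """Return the total reward of a finished episode from its final score."""
    winner = max(agent_points, opponent_points)
    loser = min(agent_points, opponent_points)
    if winner != WINNING_SCORE or not 0 <= loser < WINNING_SCORE:
        raise ValueError("a finished episode has one player at 21 and the other at 0-20")
    return agent_points - opponent_points


def meets_target(episode_reward: float, target: float = 10) -> bool:
    """True if the reward reaches the target margin used in this example."""
    return episode_reward >= target


print(pong_episode_reward(21, 11))  # 10, a win by 10 points
print(pong_episode_reward(15, 21))  # -6, a loss by 6 points
print(meets_target(pong_episode_reward(21, 11)))  # True
```

A final score of 21 to 11 is the narrowest win that still meets the target; 21 to 12 falls one point short.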

Training a Pong agent is compute-intensive, and this example shows how Reinforcement Learning in Azure Machine Learning can train an agent faster in a distributed, parallel environment. The sections below explain how the head and worker compute targets are used to train the agent.

Prerequisite

We highly recommend working through Reinforcement Learning in Azure Machine Learning - Cartpole Problem on Single Compute first, to understand the basics of Reinforcement Learning in Azure Machine Learning and the Ray RLlib features used in this notebook.

Azure Machine Learning SDK

Display the Azure Machine Learning SDK version.


Get Azure Machine Learning workspace

Get a reference to an existing Azure Machine Learning workspace.


Create Azure Machine Learning experiment

Create an experiment to track the runs in your workspace.


Create compute target

In this example, we show how to set up a compute target for the Ray nodes.

Note that if you have an AzureML Data Scientist role, you will not have permission to create compute resources. Talk to your workspace or IT admin to create the compute targets described in this section, if they do not already exist.


Create Azure ML Environment

This step creates and registers an Azure ML Environment that includes all of the dependencies needed to run this example, including CUDA drivers, PyTorch, RLlib, and associated tools. This step can take a significant amount of time (about 30 minutes) on the first run.


Create reinforcement learning training run

The code below submits the training run using a ScriptRunConfig, which specifies the command that runs the training and a RunConfig object configured with your compute target, the number of nodes, and the environment image to use.


Training configuration

All training parameters (including the reinforcement learning algorithm) are set through a single configuration file. In this example, we use the IMPALA algorithm to train an agent to play Atari Pong. We set num_workers to 11 because 11 CPUs are available for rollout workers: the two machines have 6 CPUs each, 12 in total, and 1 CPU is used by the head node. We set episode_reward_mean (under stop) to 10 so that the run terminates once the mean episode reward reaches 10; total_time_s additionally stops the run after 3600 seconds (one hour) if the reward target has not been reached by then.
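The worker count follows from simple arithmetic over the cluster's CPUs. The sketch below assumes one CPU per rollout worker and one CPU kept for the head node, as described above; `rollout_workers` is a hypothetical helper for illustration, not an RLlib or Azure Machine Learning API.

```python
# Hypothetical helper (not an RLlib or Azure ML API).
# Assumption: each rollout worker uses one CPU and the head node keeps
# `head_cpus` CPUs for itself.


def rollout_workers(num_nodes: int, cpus_per_node: int, head_cpus: int = 1) -> int:
    """Return how many single-CPU rollout workers fit on the cluster."""
    total_cpus = num_nodes * cpus_per_node
    workers = total_cpus - head_cpus
    if workers < 1:
        raise ValueError("not enough CPUs for any rollout worker")
    return workers


# Two NC6 nodes with 6 CPUs each, as in this example.
print(rollout_workers(num_nodes=2, cpus_per_node=6))  # 11
```

Adding a third NC6 node under the same assumptions would allow 17 workers, so num_workers in the configuration would need to change with the cluster size.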

Here is the configuration we are using for this example:

```yaml
pong:
    env: ALE/Pong-v5
    run: IMPALA
    config:
        num_workers: 11
        num_gpus: 1
        rollout_fragment_length: 50
        train_batch_size: 1000
        num_sgd_iter: 2
        num_multi_gpu_tower_stacks: 2
        env_config:
            frameskip: 1
            full_action_space: false
            repeat_action_probability: 0.0
        stop:
          episode_reward_mean: 10
          total_time_s: 3600
        model:
          dim: 42
```
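The two entries under stop act as alternative stopping conditions. The sketch below models, in plain Python, the rule we assume the training setup follows (as Ray Tune does for a dictionary of stop criteria): the run stops as soon as any listed metric reaches or exceeds its threshold. `should_stop` is a hypothetical helper, not Ray's implementation, and the result dictionaries are made-up inputs rather than output from this run.

```python
# Hypothetical helper (not Ray's implementation): a plain-Python model of
# dictionary-based stop criteria, where the run stops as soon as ANY listed
# metric reaches or exceeds its threshold.

# Thresholds copied from the stop section of the configuration above.
STOP = {"episode_reward_mean": 10, "total_time_s": 3600}


def should_stop(result: dict, stop: dict) -> bool:
    """Return True once any metric in `result` reaches its threshold in `stop`."""
    # Metrics missing from the result never trigger a stop.
    return any(
        result.get(metric, float("-inf")) >= threshold
        for metric, threshold in stop.items()
    )


# Made-up training results, for illustration only.
print(should_stop({"episode_reward_mean": 4.2, "total_time_s": 1800}, STOP))   # False
print(should_stop({"episode_reward_mean": 10.3, "total_time_s": 2400}, STOP))  # True: reward target
print(should_stop({"episode_reward_mean": 7.5, "total_time_s": 3600}, STOP))   # True: time limit
```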


Monitor the run

Azure Machine Learning provides a Jupyter widget that shows the status of an experiment run, which you can use to monitor your runs. The widget lists two child runs: one for the head compute target and one for the worker compute target. Click the link under Status to see the details of a child run, including the metrics being logged.


Stop the run

To stop the run, call training_run.cancel().


Wait for completion

Wait for the run to complete before proceeding. If you intend to stop the run early, you can skip this step and move on to the next section.

Note: The run may take 30 to 45 minutes to complete.


Performance of the agent during training

Let's retrieve the reward metrics from the training run and observe how the agent's rewards improve over the training iterations as it learns to win the Pong game.

Collect the episode reward metrics from the worker run's metrics.
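A run's metrics come back as a dictionary that maps each metric name to a single value, or to a list of values when the metric was logged repeatedly (in the Azure Machine Learning SDK, for example, via `Run.get_metrics()` on the worker child run). The sketch below extracts one metric as a list of floats; the metric name `episode_reward_mean` and the helper `extract_episode_rewards` are assumptions for illustration, and `sample_metrics` is made-up data, not results from this run.

```python
from typing import Any, Dict, List


def extract_episode_rewards(metrics: Dict[str, Any], name: str = "episode_reward_mean") -> List[float]:
    """Return the logged values of one metric as a list of floats.

    Hypothetical helper. Assumes a metric logged once maps to a scalar and a
    metric logged repeatedly maps to a list; the default metric name is an
    assumption, not confirmed by this notebook.
    """
    if name not in metrics:
        raise KeyError(f"metric {name!r} was not logged; available: {sorted(metrics)}")
    values = metrics[name]
    if not isinstance(values, list):
        values = [values]  # a single logged value becomes a one-element series
    return [float(v) for v in values]


# Made-up metrics dictionary for illustration only.
sample_metrics = {"episode_reward_mean": [-20.5, -18.0, -9.25, 3.0], "episodes_total": 120}
print(extract_episode_rewards(sample_metrics))  # [-20.5, -18.0, -9.25, 3.0]
```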


Plot the reward metrics.
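Per-iteration rewards are noisy, so smoothing them and locating the first iteration that reaches the target can make the plot easier to read. The sketch below assumes the rewards are already a list of floats; `moving_average` and `first_iteration_reaching` are hypothetical helpers, and the reward values are made up for illustration.

```python
from typing import List, Optional, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; the first window-1 points average fewer values."""
    if window < 1:
        raise ValueError("window must be >= 1")
    smoothed: List[float] = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


def first_iteration_reaching(values: Sequence[float], target: float) -> Optional[int]:
    """Index of the first value >= target, or None if the target is never reached."""
    for i, value in enumerate(values):
        if value >= target:
            return i
    return None


# Made-up rewards, for illustration only.
rewards = [-21.0, -15.0, -3.0, 9.0, 12.0]
print(moving_average(rewards, window=2))        # [-21.0, -18.0, -9.0, 3.0, 10.5]
print(first_iteration_reaching(rewards, 10.0))  # 4
```

The smoothed list can then be passed to a plotting library, for example matplotlib's `plt.plot`, alongside the raw values.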


We observe that over the course of training the agent learns to beat its opponent, reaching our target of a 10-point winning margin in episodes played to 21 points. Congratulations! You have trained your Pong agent to win a game.

Cleaning up

For your convenience, below you can find code snippets to clean up any resources created as part of this tutorial that you don't wish to retain.
