Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient because it avoids back-and-forth data copying between the CPU and the GPU, and runs simulations across multiple agents and multiple environment replicas in parallel. WarpDrive also provides auto-scaling tools to achieve optimal throughput per device (version 1.3), perform distributed asynchronous training across multiple GPU devices (version 1.4), and combine multiple GPU blocks for one environment replica (version 1.6). Together, these allow users to run thousands of concurrent multi-agent simulations and train on extremely large batches of experience, achieving more than 100x higher throughput than CPU-based counterparts.
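For orientation, the snippet below sketches how many environment replicas are placed on the GPU through a single wrapper. It follows the patterns in the WarpDrive tutorials; the import paths and constructor arguments (env_obj, num_envs, use_cuda) are assumptions that should be verified against the version you install.

# Illustrative sketch only: place many replicas of a bundled example
# environment on the GPU (argument names assumed from the tutorials).
from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper

env_wrapper = EnvWrapper(
    env_obj=TagContinuous(num_taggers=5, num_runners=100),  # 105 agents per replica
    num_envs=2000,  # thousands of concurrent simulations on one GPU
    use_cuda=True,  # stepping, sampling, and resets all stay on the GPU
)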
You will need a compatible NVIDIA GPU (e.g., Tesla V100 or A100).
The following commands pull and run the WarpDrive base image:
docker pull nvcr.io/partners/salesforce/warpdrive:v1.0
docker run -ti --gpus all --rm nvcr.io/partners/salesforce/warpdrive:v1.0 bash
pip install "rl-warp-drive>=1.6.5"
Please note: creating another virtual environment (e.g., with virtualenv) inside the Docker container is not recommended, because it may shadow configurations provided by the base container.
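Once the package is installed, it can help to confirm that a CUDA device is actually visible before running the bundled test suites below. The quick check here uses PyTorch, which WarpDrive builds on; it is only an illustrative sanity check, not part of the official instructions.

# Illustrative sanity check: make sure PyTorch can see a CUDA device
# before running the WarpDrive unit and trainer tests.
import torch

assert torch.cuda.is_available(), "No CUDA device found; WarpDrive requires an NVIDIA GPU."
print("Using GPU:", torch.cuda.get_device_name(0))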
python -m warp_drive.utils.run_unittests
python -m warp_drive.utils.run_trainer_tests
Our current release includes several multi-agent environments based on the game of "Tag", in which taggers chase and try to tag runners. Several much more complex environments, such as COVID-19 and climate change environments, have been built on top of WarpDrive. For more details, please refer to WarpDrive on GitHub and "Real World Problems and Collaborations".
Below, we show multi-agent RL policies trained for different tagger:runner speed ratios using WarpDrive. These environments can run at millions of steps per second, and train in just a few hours, all on a single GPU!
We compare training speed on an N1 16-CPU node versus a single A100 GPU (using WarpDrive) for the Tag environment with 100 runners and 5 taggers. With the same environment configuration and training parameters, WarpDrive on the GPU is 10× faster. Both setups run 60 environment replicas in parallel; using more environments on the CPU node is infeasible because data copying becomes too expensive. With WarpDrive, the number of environment replicas can be scaled up at least 10-fold for even faster training.
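For reference, the sketch below shows roughly what that benchmark configuration looks like in code, following the patterns in the WarpDrive tutorials. The constructor arguments, the YAML config path, and the agent-id ordering in the policy map are assumptions to verify against the repository.

# Illustrative sketch of the benchmark setup: 5 taggers, 100 runners,
# 60 environment replicas, trained on a single GPU. Names and signatures
# follow the tutorials and should be checked against the current API.
import yaml

from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer

env_wrapper = EnvWrapper(
    env_obj=TagContinuous(num_taggers=5, num_runners=100),
    num_envs=60,  # can typically be raised ~10x on the GPU for faster training
    use_cuda=True,
)

# Training hyperparameters (policies, learning rates, batch sizes, ...);
# the YAML path here is hypothetical -- the repository ships example configs.
with open("run_configs/tag_continuous.yaml", "r", encoding="utf8") as f:
    run_config = yaml.safe_load(f)

trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    # Assumed agent-id ordering: taggers first, then runners.
    policy_tag_to_agent_id_map={
        "tagger": list(range(5)),
        "runner": list(range(5, 105)),
    },
)
trainer.train()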
We provide a collection of examples, use cases, and tutorials as Jupyter notebooks in our repository. For the complete and most up-to-date tutorials and examples, please visit https://github.com/salesforce/warp-drive/tree/master/tutorials. You can also find them in NGC Resources, which cover the Basics, Sampler, Resetter, and Training.
WarpDrive provides a CUDA + Python framework and quality-of-life tools so you can quickly build fast, flexible, and massively distributed multi-agent RL systems. The following figure illustrates a bottom-up overview of the design and components of WarpDrive. The user only needs to write a CUDA step function at the CUDA environment layer; the rest is a pure Python interface. We have step-by-step tutorials to help you master the workflow.
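As a rough illustration of that split, the sketch below shows the Python side of a custom environment: state arrays are registered once so that they live on the GPU, while the per-step logic is written as a CUDA kernel in a separate .cu file. The DataFeed class and the get_data_dictionary hook follow the WarpDrive tutorials, but treat the exact names and signatures as assumptions to verify.

# Python side of a hypothetical custom WarpDrive environment. The actual
# step logic would be a CUDA kernel in a separate .cu file; here we only
# register the arrays that the kernel reads and writes on the GPU.
import numpy as np

from warp_drive.utils.data_feed import DataFeed  # assumed import path

class MyCudaEnv:
    def __init__(self, num_agents=10, episode_length=100):
        self.num_agents = num_agents
        self.episode_length = episode_length
        # Per-agent state updated in place by the CUDA step kernel.
        self.positions = np.zeros((num_agents, 2), dtype=np.float32)

    def get_data_dictionary(self):
        # Push the environment state to the GPU once; WarpDrive keeps it
        # there, so rollouts require no CPU<->GPU copies.
        data = DataFeed()
        data.add_data(
            name="positions",
            data=self.positions,
            save_copy_and_apply_at_reset=True,  # restored at every env reset
        )
        return data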
For the complete documentation and source code, please refer to GitHub. For more information, please check out our blog, white paper, and code documentation.
An End User License Agreement is included with this product. By pulling and using this container, you accept the terms and conditions of this license. WarpDrive is released under the BSD-3-Clause license.
If you're interested in extending this framework, or have questions, please visit our GitHub.
You are also welcome to join the AI Economist Slack channel using this invite link.