Using the extreme parallelization capability of GPUs, WarpDrive enables orders-of-magnitude faster RL compared to CPU simulation + GPU model implementations. It is extremely efficient because it avoids back-and-forth data copying between the CPU and the GPU, and runs simulations across multiple agents and multiple environment replicas in parallel. WarpDrive also provides auto-scaling tools to achieve optimal throughput per device (version 1.3), perform distributed asynchronous training across multiple GPU devices (version 1.4), and combine multiple GPU blocks for one environment replica (version 1.6). Together, these allow users to run thousands of concurrent multi-agent simulations and train on extremely large batches of experience, achieving more than 100x higher throughput than CPU-based counterparts.
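For orientation, the snippet below sketches how many environment replicas are placed on the GPU through a single wrapper. It follows the patterns in the WarpDrive tutorials; the import paths and constructor arguments (env_obj, num_envs, use_cuda) are assumptions that should be verified against the version you install.

# Illustrative sketch only: place many replicas of a bundled example
# environment on the GPU (argument names assumed from the tutorials).
from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper

env_wrapper = EnvWrapper(
    env_obj=TagContinuous(num_taggers=5, num_runners=100),  # 105 agents per replica
    num_envs=2000,  # thousands of concurrent simulations on one GPU
    use_cuda=True,  # stepping, sampling, and resets all stay on the GPU
)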
You will need a compatible NVIDIA GPU (e.g., Tesla V100 or A100).
The following commands pull and run the WarpDrive base image:
docker pull nvcr.io/partners/salesforce/warpdrive:v1.0
docker run -ti --gpus all --rm nvcr.io/partners/salesforce/warpdrive:v1.0 bash
pip install "rl-warp-drive>=1.6.5"
Please note: creating another virtual environment (e.g., with virtualenv) inside the Docker container is not recommended, because it may shadow configurations provided by the base container.
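Once the package is installed, it can help to confirm that a CUDA device is actually visible before running the bundled test suites below. The quick check here uses PyTorch, which WarpDrive builds on; it is only an illustrative sanity check, not part of the official instructions.

# Illustrative sanity check: make sure PyTorch can see a CUDA device
# before running the WarpDrive unit and trainer tests.
import torch

assert torch.cuda.is_available(), "No CUDA device found; WarpDrive requires an NVIDIA GPU."
print("Using GPU:", torch.cuda.get_device_name(0))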
python -m warp_drive.utils.run_unittests
python -m warp_drive.utils.run_trainer_tests
Our current release includes several multi-agent environments based on the game of "Tag", in which taggers chase and try to tag runners. Several much more complex environments, such as COVID-19 and climate change environments, have been built on top of WarpDrive. For more details, please refer to WarpDrive on GitHub and "Real World Problems and Collaborations".
Below, we show multi-agent RL policies trained for different tagger:runner speed ratios using WarpDrive. These environments can run at millions of steps per second, and train in just a few hours, all on a single GPU!
We compare training speed on an N1 16-CPU node versus a single A100 GPU (using WarpDrive) for the Tag environment with 100 runners and 5 taggers. With the same environment configuration and training parameters, WarpDrive on the GPU is 10× faster. Both setups run 60 environment replicas in parallel; using more environments on the CPU node is infeasible because data copying becomes too expensive. With WarpDrive, the number of environment replicas can be scaled up at least 10-fold for even faster training.
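For reference, the sketch below shows roughly what that benchmark configuration looks like in code, following the patterns in the WarpDrive tutorials. The constructor arguments, the YAML config path, and the agent-id ordering in the policy map are assumptions to verify against the repository.

# Illustrative sketch of the benchmark setup: 5 taggers, 100 runners,
# 60 environment replicas, trained on a single GPU. Names and signatures
# follow the tutorials and should be checked against the current API.
import yaml

from example_envs.tag_continuous.tag_continuous import TagContinuous
from warp_drive.env_wrapper import EnvWrapper
from warp_drive.training.trainer import Trainer

env_wrapper = EnvWrapper(
    env_obj=TagContinuous(num_taggers=5, num_runners=100),
    num_envs=60,  # can typically be raised ~10x on the GPU for faster training
    use_cuda=True,
)

# Training hyperparameters (policies, learning rates, batch sizes, ...);
# the YAML path here is hypothetical -- the repository ships example configs.
with open("run_configs/tag_continuous.yaml", "r", encoding="utf8") as f:
    run_config = yaml.safe_load(f)

trainer = Trainer(
    env_wrapper=env_wrapper,
    config=run_config,
    # Assumed agent-id ordering: taggers first, then runners.
    policy_tag_to_agent_id_map={
        "tagger": list(range(5)),
        "runner": list(range(5, 105)),
    },
)
trainer.train()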
We provide a collection of examples, use cases, and tutorials as Jupyter notebooks in our repository. For the complete and most up-to-date tutorials and examples, please visit https://github.com/salesforce/warp-drive/tree/master/tutorials. You can also find them in NGC Resources, which cover the Basics, Sampler, Resetter, and Training.
WarpDrive provides a CUDA + Python framework and quality-of-life tools so you can quickly build fast, flexible, and massively distributed multi-agent RL systems. The following figure illustrates a bottom-up overview of the design and components of WarpDrive. The user only needs to write a CUDA step function at the CUDA environment layer; the rest is a pure Python interface. We have step-by-step tutorials to help you master the workflow.
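As a rough illustration of that split, the sketch below shows the Python side of a custom environment: state arrays are registered once so that they live on the GPU, while the per-step logic is written as a CUDA kernel in a separate .cu file. The DataFeed class and the get_data_dictionary hook follow the WarpDrive tutorials, but treat the exact names and signatures as assumptions to verify.

# Python side of a hypothetical custom WarpDrive environment. The actual
# step logic would be a CUDA kernel in a separate .cu file; here we only
# register the arrays that the kernel reads and writes on the GPU.
import numpy as np

from warp_drive.utils.data_feed import DataFeed  # assumed import path

class MyCudaEnv:
    def __init__(self, num_agents=10, episode_length=100):
        self.num_agents = num_agents
        self.episode_length = episode_length
        # Per-agent state updated in place by the CUDA step kernel.
        self.positions = np.zeros((num_agents, 2), dtype=np.float32)

    def get_data_dictionary(self):
        # Push the environment state to the GPU once; WarpDrive keeps it
        # there, so rollouts require no CPU<->GPU copies.
        data = DataFeed()
        data.add_data(
            name="positions",
            data=self.positions,
            save_copy_and_apply_at_reset=True,  # restored at every env reset
        )
        return data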
For the complete documentation and source code, please refer to GitHub. For more information, please check out our blog, white paper, and code documentation.
An End User License Agreement is included with this product. By pulling and using this container, you accept the terms and conditions of this license. WarpDrive is released under the BSD-3-Clause license.
If you're interested in extending this framework, or have questions, please visit our GitHub.
You are also welcome to join the AI Economist Slack channel using this invite link.