This is the second tutorial on WarpDrive, a PyCUDA-based framework for extremely parallelized multi-agent reinforcement learning (RL) on a single graphics processing unit (GPU). We assume you have already read the first tutorial on WarpDrive basics.

In this tutorial, we describe `CUDASampler`, a lightweight and fast action sampler that draws actions from the policy distribution across several RL agents and environment replicas. `CUDASampler` uses the GPU to sample a large number of actions in parallel.

Notably:

- It reads the distribution on the GPU through PyTorch and samples actions entirely on the GPU, so no data is transferred between the CPU and the GPU.
- It maximizes parallelism down to the individual thread level, i.e., each agent in each environment has its own random seed and an independent random sampling process.
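
To make the second point concrete, here is a minimal CPU-only sketch of the idea in plain Python. It is not the WarpDrive implementation or its API: the names `make_rngs`, `sample_action`, and `sample_all` are hypothetical, the nested loops stand in for what would be one GPU thread per (environment, agent) pair, and the seeding scheme is an assumption chosen for illustration.

```python
import random
from itertools import accumulate


def make_rngs(num_envs, num_agents, base_seed=0):
    """Create one independent generator per (environment, agent) pair.

    Hypothetical seeding scheme: each pair gets a distinct integer seed.
    """
    return [
        [random.Random(base_seed + env_id * num_agents + agent_id)
         for agent_id in range(num_agents)]
        for env_id in range(num_envs)
    ]


def sample_action(probs, rng):
    """Draw one action index from a categorical distribution (inverse CDF)."""
    u = rng.random()  # uniform in [0, 1)
    for action, threshold in enumerate(accumulate(probs)):
        if u < threshold:
            return action
    return len(probs) - 1  # guard against floating-point round-off


def sample_all(distributions, rngs):
    """Sample one action for every agent in every environment.

    distributions[env_id][agent_id] is a list of action probabilities.
    On a GPU, each iteration of the two loops would be its own thread.
    """
    return [
        [sample_action(probs, rngs[env_id][agent_id])
         for agent_id, probs in enumerate(env_distributions)]
        for env_id, env_distributions in enumerate(distributions)
    ]


# Usage: 2 environments, 3 agents, 4 possible actions, uniform policy.
rngs = make_rngs(num_envs=2, num_agents=3)
uniform = [[[0.25, 0.25, 0.25, 0.25] for _ in range(3)] for _ in range(2)]
actions = sample_all(uniform, rngs)
```

Because every (environment, agent) pair owns its generator, the draws do not depend on the order in which pairs are processed, which is the property that lets the sampling run fully in parallel.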

Tutorial covering the basics of WarpDrive Sampler using Numba

WarpDrive Sampler for Numba