This is the second tutorial on WarpDrive, a PyCUDA-based framework for extremely parallelized multi-agent reinforcement learning (RL) on a single graphics processing unit (GPU). At this stage, we assume you have read our first tutorial on WarpDrive basics.
In this tutorial, we describe CUDASampler, a lightweight and fast sampler that draws actions from the policy distribution for many RL agents across many environment replicas.
CUDASampler uses the GPU to sample a large number of actions in parallel.
- It reads the distribution on the GPU through PyTorch and samples actions exclusively on the GPU, so no data is transferred between the CPU and the GPU.
- It maximizes parallelism down to the individual thread level, i.e., each agent in each environment has its own random seed and an independent random sampling process.
- It runs much faster than most GPU samplers. For example, it is significantly faster than sampling with PyTorch.
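
To make the per-thread sampling idea concrete, here is a minimal CPU-only sketch in plain Python. It is not WarpDrive's implementation or API: the function name `sample_actions`, the `base_seed` parameter, and the seeding scheme are all assumptions chosen for illustration. The two nested loops stand in for what a GPU would run as one thread per (environment, agent) pair, and each pair gets its own random generator and draws its action by inverse-CDF sampling.

```python
import random
from itertools import accumulate


def sample_actions(probs, base_seed=0):
    """Sample one action per (environment, agent) pair.

    probs[env_id][agent_id] is a list of action probabilities summing to 1.
    Returns a nested list actions[env_id][agent_id] of action indices.
    Hypothetical helper for illustration only, not part of WarpDrive.
    """
    num_envs = len(probs)
    num_agents = len(probs[0])
    actions = [[0] * num_agents for _ in range(num_envs)]

    # On a GPU, each iteration of these loops would be its own thread.
    for env_id in range(num_envs):
        for agent_id in range(num_agents):
            # Assumed seeding scheme: one independent generator per pair.
            thread_id = env_id * num_agents + agent_id
            rng = random.Random(base_seed + thread_id)

            # Inverse-CDF sampling: pick the first action whose cumulative
            # probability exceeds a uniform draw in [0, 1).
            u = rng.random()
            cdf = list(accumulate(probs[env_id][agent_id]))
            action = len(cdf) - 1  # fallback guards against rounding error
            for idx, cumulative in enumerate(cdf):
                if u < cumulative:
                    action = idx
                    break
            actions[env_id][agent_id] = action
    return actions


# Two environments, two agents each, three possible actions.
probs = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [0.2, 0.3, 0.5]],
]
actions = sample_actions(probs, base_seed=42)
print(actions)
```

Because every (environment, agent) pair owns its generator, no pair waits on or shares state with another, which is the property that lets the sampling map onto independent GPU threads.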