GPU Accelerated ML workflows with RAPIDS

Demonstration of GPU-accelerated machine learning and data science workflows using RAPIDS.



September 1, 2023

3.86 GB

Accelerated Data Science with RAPIDS

This container provides a demonstration of GPU-accelerated data science workflows using RAPIDS.


The RAPIDS suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs. Licensed under Apache 2.0, RAPIDS is incubated by NVIDIA® based on extensive hardware and data science experience. RAPIDS uses NVIDIA CUDA® primitives for low-level compute optimization and exposes GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

RAPIDS also focuses on common data preparation tasks for analytics and data science. This includes a familiar, pandas-like dataframe API that integrates with a variety of machine learning algorithms, accelerating pipelines end to end without the serialization costs typically paid when moving data between pipeline stages. RAPIDS also supports multi-node, multi-GPU deployments, enabling much faster processing and training on far larger datasets.

Installation and Getting Started

Getting started is straightforward with nvidia-docker.

Running from NGC container

This image contains the complete RAPIDS Jupyter Lab environment and tutorial.

1. Download the container image from NGC

docker pull

2. Run the notebook server

docker run --gpus all --rm -it -p 8888:8888

Note: Depending on your Docker version, you may have to use ‘docker run --runtime=nvidia’ instead, or remove ‘--gpus all’.
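The choice in the note above can be scripted. This is a minimal bash sketch, assuming that Docker 19.03 and later accept ‘--gpus’ while older installations configured with the NVIDIA runtime need ‘--runtime=nvidia’; the function name ‘gpu_flag’ is hypothetical, and IMAGE in the usage comment is a placeholder for the NGC image name.

```shell
# Hypothetical helper: print the GPU flag to pass to `docker run`,
# given a Docker version string such as "24.0.5".
# Assumption: `--gpus` exists from Docker 19.03 on; earlier versions
# fall back to the NVIDIA runtime flag.
gpu_flag() {
  local version="$1" major minor rest
  major="${version%%.*}"   # "19.03.5" -> "19"
  rest="${version#*.}"     # "19.03.5" -> "03.5"
  minor="${rest%%.*}"      # "03.5"    -> "03" (test's -ge compares it as 3)
  if [ "$major" -gt 19 ] || { [ "$major" -eq 19 ] && [ "$minor" -ge 3 ]; }; then
    echo "--gpus all"
  else
    echo "--runtime=nvidia"
  fi
}

# Example (IMAGE is a placeholder for the NGC image name):
#   docker run $(gpu_flag "$(docker version --format '{{.Server.Version}}')") \
#     --rm -it -p 8888:8888 "$IMAGE"
# $(gpu_flag ...) is left unquoted on purpose so "--gpus all" splits into two arguments.
```

Parsing the version string keeps the decision explicit; probing by running a throwaway container with each flag would also work but needs an image and a GPU host.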

3. Connect to notebook server

Jupyter Lab will be available on port 8888 (or on the first available port after it, such as 8889 or 8890, if 8888 is occupied; see the command output).

For example, if running on a local machine:
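When 8888 is taken, the actual port appears in the server's startup output. This small sketch pulls the port out of the first URL of the form http://HOST:PORT/ in that output; it assumes Jupyter prints its access URL that way, which is typical but not guaranteed across versions, and the helper name ‘jupyter_port’ is hypothetical.

```shell
# Hypothetical helper: read server output on stdin and print the port
# of the first http://HOST:PORT/ URL found.
jupyter_port() {
  grep -oE 'http://[^ ]+:[0-9]+/' \
    | head -n 1 \
    | sed -E 's#.*:([0-9]+)/$#\1#'   # keep only the digits before the final "/"
}
```

For a container started detached, something like ‘docker logs CONTAINER 2>&1 | jupyter_port’ would apply it to the server log (CONTAINER is a placeholder; ‘2>&1’ is there because Jupyter usually logs to stderr).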

4. Run the notebooks

  • utils/data_loader: Loads 1 year of the airline dataset for ML100 and 10 years for ML200.
  • ML100/1_ML100-gpu: ML100 data science workflow on a single GPU with RAPIDS.
  • ML100/2_ML100-cpu: ML100 data science workflow on the CPU, for comparison.
  • ML200/ML200: Multi-GPU data science workflow on the larger dataset using RAPIDS and Dask.
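The notebooks can also be executed without the Jupyter Lab UI, for example in a batch job. This bash sketch runs them in the listed order, data loader first, using ‘jupyter nbconvert --execute’. It assumes each entry is a path relative to the notebook root with an ‘.ipynb’ extension; the function name ‘run_notebooks’ and the DRY_RUN switch are hypothetical.

```shell
# Hypothetical helper: execute notebooks in order, stopping at the first failure.
# With no arguments it runs the four tutorial notebooks (assumed .ipynb paths).
# Set DRY_RUN=1 to print the commands instead of running them.
run_notebooks() {
  local nb
  [ "$#" -gt 0 ] || set -- utils/data_loader ML100/1_ML100-gpu ML100/2_ML100-cpu ML200/ML200
  for nb in "$@"; do
    # timeout=-1 disables the per-cell timeout; data loading can be slow.
    local cmd=(jupyter nbconvert --to notebook --execute --inplace
               --ExecutePreprocessor.timeout=-1 "$nb.ipynb")
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}" || return 1
    fi
  done
}
```

Passing notebook names limits the run, e.g. ‘run_notebooks utils/data_loader ML100/1_ML100-gpu’ to skip the multi-GPU ML200 notebook on a single-GPU machine.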
