Scaling Data Loading with DALI
Note: This container contains notebooks for use alongside an instructor-led NVIDIA DALI workshop and may not contain an up-to-date DALI version. If you want to use DALI with an NGC container, check the latest TensorFlow, PyTorch, and MXNet NGC containers, which all include it, or download DALI directly.
This container demonstrates how you can use the new DALI functional API to accelerate and scale an image data loading pipeline, which can substantially speed up deep learning workflows.
The data pathway is an often overlooked component of the deep learning workflow. We tend to assume that most of the computation happens in the deep learning model itself, but as you scale deep learning workloads across GPUs, your CPU becomes increasingly burdened with preprocessing and feeding data to your models.
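One way to check whether the data pathway is the bottleneck is to time the loader on its own, with no model attached, and compare the result with the rate at which the model consumes samples. The helper below is a minimal sketch of that idea; `measure_throughput` and `fake_loader` are hypothetical names introduced here for illustration, and the sleeping generator only stands in for real CPU-side preprocessing.

```python
import time
from typing import Iterable, Iterator


def measure_throughput(loader: Iterable, batch_size: int, max_batches: int = 50) -> float:
    """Return samples per second obtained by iterating `loader` alone (no model)."""
    start = time.perf_counter()
    batches = 0
    for _ in loader:
        batches += 1
        if batches >= max_batches:
            break
    elapsed = time.perf_counter() - start
    if batches == 0 or elapsed <= 0:
        return 0.0
    return batches * batch_size / elapsed


def fake_loader(num_batches: int, delay_s: float) -> Iterator[None]:
    """Stand-in loader: sleeps per batch to mimic CPU-side preprocessing."""
    for _ in range(num_batches):
        time.sleep(delay_s)
        yield None


rate = measure_throughput(fake_loader(10, 0.01), batch_size=32)
print(f"loader alone: {rate:.0f} samples/s")
```

Any iterable of batches, such as a PyTorch `DataLoader`, can be passed in place of `fake_loader`. If the loader-only rate is close to the end-to-end training rate, the loader is likely the limiting factor. For loaders that hand back GPU tensors asynchronously, treat the number as approximate.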
DALI lets you GPU-accelerate image loading, JPEG decoding, data reshaping and resizing, and a variety of data augmentation techniques. This container shows how you can use these features to adapt a PyTorch workflow that uses the normal PyTorch dataloaders into a fully GPU-accelerated DALI workflow.
This tutorial uses a simple convolutional classifier over sample 2D natural images from the YouTube-BB dataset.
NVIDIA DALI (DAta Loading LIbrary) is an open source software (OSS) GPU-accelerated library for data loading and augmentation. You can find it on GitHub here: NVIDIA Data Loading Library (DALI). In this example, we primarily use it for 2D images in computer vision tasks.
DALI is unique in that it lets you run potentially every step of data loading and transformation on the GPU by composing pipelines of exclusively GPU ops. DALI also supports CPU ops, but they must come before the GPU-accelerated part of the pipeline. If every step of data loading up to training runs on the GPU, the intermediate data never has to be copied back and forth between CPU and GPU. That is what DALI does: it helps build and optimize the "GPU onramp" to a deep learning model.
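The ordering described above can be sketched with the functional API as follows. This is an illustrative sketch, not the workshop's actual pipeline: the operator names follow DALI 1.x (`fn.readers.file`, `fn.decoders.image`, `fn.random.coin_flip`), while older releases, possibly including the one in this container, used names such as `fn.file_reader` and `fn.image_decoder`. The image size, the normalization constants, and the directory layout (one subdirectory per class) are assumptions.

```python
# Illustrative constants (assumptions, not taken from the workshop notebooks).
IMAGE_SIZE = 224
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]


def build_image_pipeline(data_dir, batch_size=32, num_threads=4, device_id=0):
    """Build a DALI pipeline: CPU file reader first, then GPU decode and augment."""
    # Imported lazily so this module can be loaded on machines without DALI.
    from nvidia.dali import pipeline_def, fn, types

    @pipeline_def
    def image_pipeline():
        # CPU op: read encoded JPEG bytes and integer labels from
        # data_dir/<class_name>/<image>.jpg
        jpegs, labels = fn.readers.file(
            file_root=data_dir, random_shuffle=True, name="Reader"
        )
        # "mixed" op: consumes CPU data, decodes on the GPU, returns GPU data.
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        # The ops below run on the GPU because their input is already there.
        images = fn.resize(images, resize_x=IMAGE_SIZE, resize_y=IMAGE_SIZE)
        images = fn.crop_mirror_normalize(
            images,
            dtype=types.FLOAT,
            output_layout="CHW",
            mean=[m * 255 for m in MEAN],
            std=[s * 255 for s in STD],
            mirror=fn.random.coin_flip(probability=0.5),  # random horizontal flip
        )
        return images, labels.gpu()

    pipe = image_pipeline(
        batch_size=batch_size, num_threads=num_threads, device_id=device_id
    )
    pipe.build()
    return pipe


def as_pytorch_iterator(pipe):
    """Wrap a built pipeline so a PyTorch training loop can iterate over it."""
    from nvidia.dali.plugin.pytorch import DALIGenericIterator, LastBatchPolicy

    return DALIGenericIterator(
        pipe,
        ["data", "label"],
        reader_name="Reader",
        last_batch_policy=LastBatchPolicy.PARTIAL,
        auto_reset=True,
    )
```

In a training loop, each item yielded by the iterator is a list with one dictionary per pipeline, so the image tensor would be read as `batch[0]["data"]` and the labels as `batch[0]["label"]`, both already on the GPU. Building the pipeline requires a CUDA-capable GPU and an installed DALI package.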
To learn more
Please review the following resources:
- [The DALI Documentation](https://docs.nvidia.com/deeplearning/dali/user-guide/docs/index.html)
- [Fast AI Data Preprocessing with NVIDIA DALI Developer Blog](https://devblogs.nvidia.com/fast-ai-data-preprocessing-with-nvidia-dali/)
- [Fast Data Pre-Processing with NVIDIA DALI GTC Talk](https://developer.nvidia.com/gtc/2020/video/s21139-vid)
Installation and Getting Started
Getting started is straightforward with nvidia-docker.
Running from NGC container
This image contains the complete DALI Jupyter Lab environment and tutorial.
1. Download the container from NGC
docker pull nvcr.io/nvstaging/npn/npn_workshop:latest
2. Run the notebook server
docker run --gpus all --net=host -it -v $(pwd):/workspace/pwd nvcr.io/nvstaging/npn/npn_workshop:latest
Note: Depending on your Docker version, you may have to remove
--gpus all or add
3. Connect to notebook server
Jupyter Lab will be available on port 8888!
e.g. http://127.0.0.1:8888 if running on a local machine
(or the first available port after that, such as 8889 or 8890, if 8888 is occupied; see the command output)
4. Run the notebooks
The Jupyter Lab server will open to the first notebook, ‘yt_easy’. The other tabs, ‘yt_medium’ and ‘yt_hard’, contain the remaining two sections of the tutorial. The sections cover:
yt_easy: a simple non-DALI workflow, showing how you might approach data loading in PyTorch
yt_medium: a simple DALI workflow, including how to use DALI with Automatic Mixed Precision and Data Parallelism enabled
yt_hard: 'sharded' DALI, where DALI runs on multiple GPUs simultaneously in a Distributed Data Parallel workload; also shows how PyTorch Distributed workloads can be launched from Jupyter
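To give a feel for what 'sharded' loading means, the sketch below splits a dataset into contiguous, non-overlapping index ranges, one per GPU process. The helper `shard_range` is a hypothetical name used only for illustration and is not part of DALI. In a real DALI pipeline, the reader operators accept `shard_id` and `num_shards` arguments, which in a Distributed Data Parallel job would typically be set from the process rank and the world size.

```python
def shard_range(dataset_size: int, shard_id: int, num_shards: int) -> range:
    """Return the contiguous range of sample indices owned by one shard."""
    if num_shards <= 0 or not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must satisfy 0 <= shard_id < num_shards")
    # Integer arithmetic keeps shards within one sample of each other in size
    # and guarantees that every index belongs to exactly one shard.
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return range(start, end)


world_size = 4  # assumed number of GPU processes
for rank in range(world_size):
    indices = shard_range(10, rank, world_size)
    print(f"rank {rank}: samples {list(indices)}")
```

Each process then reads and preprocesses only its own shard on its own GPU, so adding GPUs adds data loading capacity along with training capacity.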