NVIDIA NPN Workshop: Scaling Data Loading with DALI

Description
This container contains notebooks for use alongside an instructor-led NVIDIA DALI workshop and may not contain an up-to-date DALI version. To use DALI with an NGC container, check the latest TensorFlow, MXNet, and PyTorch NGC containers, which all include it, or download DALI directly.
Publisher: NVIDIA
Latest Tag: latest
Modified: April 10, 2024
Compressed Size: 4.45 GB
Multinode Support: No
Multi-Arch Support: No
Platform: Linux / amd64

Scaling Data Loading with DALI

Note: This container contains notebooks for use alongside an instructor-led NVIDIA DALI workshop and may not contain an up-to-date DALI version. To use DALI with an NGC container, check the latest TensorFlow, PyTorch, and MXNet NGC containers, which all include it, or download DALI directly.

This container demonstrates how to use the new DALI functional API to accelerate and scale an image data loading pipeline for deep learning workflows.

The data pathway is an often-overlooked component of the deep learning workflow. Although most of the computation is usually assumed to happen in the model itself, as you scale deep learning workloads across GPUs the CPU becomes increasingly burdened with preprocessing and feeding data to the models.

DALI lets you GPU-accelerate image loading, JPEG decoding, data reshaping and resizing, and a variety of data augmentation techniques. This container shows how to use these features to convert a PyTorch workflow built on the standard PyTorch data loaders into a fully GPU-accelerated DALI workflow.
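As a rough illustration of such a pipeline, the sketch below reads, decodes, resizes, and normalizes JPEG files with the DALI functional API. It is not taken from the workshop notebooks: the function name build_image_pipeline, the assumed directory layout (one subfolder per class under data_dir), the 224x224 target size, and the normalization constants are all assumptions chosen for the example. Building the pipeline requires a GPU and an installed DALI.

```python
# Illustrative values (assumptions): ImageNet-style statistics scaled to 0-255.
MEAN = [0.485 * 255, 0.456 * 255, 0.406 * 255]
STD = [0.229 * 255, 0.224 * 255, 0.225 * 255]


def build_image_pipeline(data_dir, batch_size=32, num_threads=4, device_id=0):
    """Return a DALI pipeline that reads, decodes, resizes and normalizes JPEGs."""
    # Imported lazily so this module can be imported on machines without DALI.
    from nvidia.dali import fn, pipeline_def, types

    @pipeline_def(batch_size=batch_size, num_threads=num_threads, device_id=device_id)
    def image_pipeline():
        # File reading runs on the CPU; labels come from the subfolder index.
        jpegs, labels = fn.readers.file(
            file_root=data_dir, random_shuffle=True, name="Reader"
        )
        # "mixed" decoding takes CPU input and produces GPU output.
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        # Everything from here on runs on the GPU.
        images = fn.resize(images, resize_x=224, resize_y=224)
        images = fn.crop_mirror_normalize(
            images,
            dtype=types.FLOAT,
            output_layout="CHW",
            mean=MEAN,
            std=STD,
            mirror=fn.random.coin_flip(probability=0.5),  # random horizontal flip
        )
        return images, labels.gpu()

    return image_pipeline()
```

One way to feed a PyTorch training loop from this pipeline is to wrap it in DALIGenericIterator from nvidia.dali.plugin.pytorch, for example with output_map=["data", "label"] and reader_name="Reader".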

This tutorial uses a simple convolutional classifier over sample 2D natural images from the YouTube-BB dataset.

Why DALI?

NVIDIA DALI (DAta Loading LIbrary) is an open-source, GPU-accelerated library for data loading and augmentation. It is available on GitHub as NVIDIA Data Loading Library (DALI). In this example, we primarily use it for 2D images in computer vision tasks.

DALI can run every step of the data loading and transformation process on the GPU by composing pipelines made entirely of GPU operators (DALI also supports CPU operators, but they must come before the GPU-accelerated part of the pipeline). When every step up to training runs on the GPU, the intermediate CPU-GPU transfers are avoided. In short, DALI helps build and optimize the "GPU onramp" to a deep learning model.
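The ordering rule above (CPU operators first, GPU operators after) can be pictured with a small stand-alone check. This is only a sketch for linear pipelines and is not part of DALI: check_device_order and the stage names are hypothetical, and "mixed" is used here for a stage that takes CPU input and produces GPU output.

```python
def check_device_order(stages):
    """Return True if no CPU or mixed stage follows a stage that outputs GPU data.

    `stages` is a list of (name, device) pairs for a linear pipeline, where
    device is "cpu", "mixed" (CPU input, GPU output) or "gpu".
    """
    on_gpu = False
    for name, device in stages:
        if device not in ("cpu", "mixed", "gpu"):
            raise ValueError(f"unknown device {device!r} for stage {name!r}")
        if on_gpu and device != "gpu":
            # Data is already on the GPU; the rule above forbids going back.
            return False
        if device in ("mixed", "gpu"):
            on_gpu = True
    return True


valid = [("read", "cpu"), ("decode", "mixed"), ("resize", "gpu"), ("normalize", "gpu")]
invalid = [("decode", "mixed"), ("resize", "gpu"), ("rotate", "cpu")]
print(check_device_order(valid))    # True
print(check_device_order(invalid))  # False
```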

Installation and Getting Started

Getting started with the application is straightforward with nvidia-docker.

Running from NGC container

This image contains the complete DALI Jupyter Lab environment and tutorial.

1. Download the container from NGC

docker pull nvcr.io/nvstaging/npn/npn_workshop:latest

2. Run the notebook server

docker run --gpus all --net=host -it -v $(pwd):/workspace/pwd nvcr.io/nvstaging/npn/npn_workshop:latest

Note: Depending on your Docker version, you may have to remove --gpus all or add --runtime=nvidia.
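The choice between the two GPU flags can be sketched as a small helper. This is an illustration rather than an official launcher: docker_run_command is a hypothetical name, and the sketch assumes that Docker 19.03 or later accepts --gpus while older installations use the nvidia runtime instead.

```python
import os
import shlex

IMAGE = "nvcr.io/nvstaging/npn/npn_workshop:latest"


def docker_run_command(docker_version, image=IMAGE, workdir=None):
    """Build the `docker run` argument list for a given Docker version string."""
    major, minor = (int(part) for part in docker_version.split(".")[:2])
    # Assumption: --gpus is available from Docker 19.03; older setups rely on
    # the nvidia runtime instead.
    if (major, minor) >= (19, 3):
        gpu_flag = ["--gpus", "all"]
    else:
        gpu_flag = ["--runtime=nvidia"]
    workdir = workdir or os.getcwd()
    return ["docker", "run", *gpu_flag, "--net=host", "-it",
            "-v", f"{workdir}:/workspace/pwd", image]


# Print the command lines that would be used for a new and an old Docker.
print(shlex.join(docker_run_command("24.0.7")))
print(shlex.join(docker_run_command("18.09.1")))
```

The version string can be obtained with docker version --format '{{.Server.Version}}'.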

3. Connect to notebook server

Jupyter Lab will be available on port 8888!

e.g. http://127.0.0.1:8888 if running on a local machine

If port 8888 is occupied, Jupyter uses the next available port (8889, 8890, and so on); check the command output for the actual URL.
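If the command output is no longer visible, a quick probe such as the sketch below can list which of the candidate ports accept connections. The helper name listening_ports and the probed range are assumptions. The probe only shows that something is listening, not that it is this Jupyter server, so the container output remains the authoritative source.

```python
import socket


def listening_ports(ports, host="127.0.0.1", timeout=0.5):
    """Return the ports in `ports` that accept a TCP connection on `host`."""
    found = []
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                found.append(port)
        except OSError:
            # Connection refused or timed out: nothing is listening here.
            continue
    return found


for port in listening_ports(range(8888, 8893)):
    print(f"Something is listening at http://127.0.0.1:{port}")
```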

4. Run the notebooks

The Jupyter Lab server will open to the first page of the notebook 'yt_easy'. The other tabs, 'yt_medium' and 'yt_hard', contain the remaining two sections of the tutorial. These sections cover:

  • yt_easy: example of a simple non-DALI workflow - how you might approach data loading in PyTorch

  • yt_medium: example of a simple DALI workflow, including how to use DALI with Automatic Mixed Precision and Data Parallelism enabled

  • yt_hard: example of 'sharded' DALI, where DALI runs on multiple GPUs simultaneously in a Distributed Data Parallel workload. Also demonstrates how PyTorch Distributed workloads can be launched from Jupyter.
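To give a feel for what sharding means, the sketch below splits a dataset into contiguous, non-overlapping ranges, one per GPU. DALI readers accept shard_id and num_shards arguments for this purpose. The helper shard_bounds is hypothetical, and the exact boundaries DALI uses should be treated as an assumption here rather than a specification.

```python
def shard_bounds(dataset_size, shard_id, num_shards):
    """Return the [start, end) sample range assigned to one shard."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id must be in [0, num_shards)")
    start = dataset_size * shard_id // num_shards
    end = dataset_size * (shard_id + 1) // num_shards
    return start, end


# Example: 10 samples split across 4 GPUs, one shard per GPU.
for rank in range(4):
    print(rank, shard_bounds(10, rank, 4))
# 0 (0, 2)
# 1 (2, 5)
# 2 (5, 7)
# 3 (7, 10)
```

In a Distributed Data Parallel job, each process would typically pass its rank as shard_id and the world size as num_shards, so that every GPU loads a different slice of the data.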