PyTorch

PyTorch is a GPU accelerated tensor computational framework. Functionality can be extended with common Python libraries such as NumPy and SciPy. Automatic differentiation is done with a tape-based system at the functional and neural network layer levels.

PyTorch

PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. Automatic differentiation is done with a tape-based system at both a functional and neural network layer level. This functionality brings a high level of flexibility and speed as a deep learning framework and provides accelerated NumPy-like functionality. NGC Containers are the easiest way to get started with PyTorch. The PyTorch NGC Container comes with all dependencies included, providing an easy place to start developing common applications, such as conversational AI, natural language processing (NLP), recommenders, and computer vision.
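To make "tape-based" concrete, here is a toy sketch in plain Python. It is an illustration of the general idea only, not PyTorch's implementation: the `Var` class and `backprop` function are hypothetical names invented for this example. Each operation appends a small backward step to a shared tape, and the gradient pass replays the tape in reverse.

```python
# Toy reverse-mode autodiff with an explicit tape.
# Illustration only: Var and backprop are hypothetical, not PyTorch APIs.
class Var:
    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape  # shared list of recorded backward steps

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)

        def backward():
            # d(a*b)/da = b and d(a*b)/db = a
            self.grad += other.value * out.grad
            other.grad += self.value * out.grad

        self.tape.append(backward)
        return out

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)

        def backward():
            # Addition passes the incoming gradient through unchanged.
            self.grad += out.grad
            other.grad += out.grad

        self.tape.append(backward)
        return out


def backprop(output):
    """Seed the output gradient and replay the tape in reverse order."""
    output.grad = 1.0
    for step in reversed(output.tape):
        step()


tape = []
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
backprop(z)
print(x.grad, y.grad)  # prints 5.0 3.0
```

In PyTorch itself, the corresponding workflow is to create tensors with `requires_grad=True`, call `.backward()` on a scalar result, and read the `.grad` attribute of the inputs.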

The PyTorch NGC Container is optimized for GPU acceleration, and contains a validated set of libraries that enable and optimize GPU performance. This container also contains software for accelerating ETL (DALI), Training (cuDNN, NCCL), and Inference (TensorRT) workloads.

Prerequisites

Using the PyTorch NGC Container requires the host system to have the following installed:

  • Docker Engine
  • NVIDIA GPU Drivers
  • NVIDIA Container Toolkit

For supported versions, see the Framework Containers Support Matrix and the NVIDIA Container Toolkit Documentation.

No other installation, compilation, or dependency management is required. It is not necessary to install the NVIDIA CUDA Toolkit.

The PyTorch NGC Container is optimized to run on NVIDIA DGX Foundry and NVIDIA DGX SuperPOD managed by NVIDIA Base Command Platform. Please refer to the Base Command Platform User Guide to learn more about running workloads on BCP clusters.

Running PyTorch Using Docker

To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.

If you have Docker 19.03 or later, a typical command to launch the container is:

docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:xx.xx-py3

If you have Docker 19.02 or earlier, a typical command to launch the container is:

nvidia-docker run -it --rm nvcr.io/nvidia/pytorch:xx.xx-py3

Where:

  • xx.xx is the container version. For example, 22.01.
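If you script container launches, the substitution of xx.xx can be sketched as follows. This is a minimal sketch using only the Python standard library; the function name `docker_run_command` and its `use_legacy_wrapper` parameter are hypothetical, and the two command shapes are taken from the Docker 19.03+ and nvidia-docker forms described in this section.

```python
import re
import shlex


def docker_run_command(tag, use_legacy_wrapper=False):
    """Build a launch command for a given container version tag."""
    # Versions follow the xx.xx pattern, for example 22.01.
    if not re.fullmatch(r"\d{2}\.\d{2}", tag):
        raise ValueError(f"expected a version like 22.01, got {tag!r}")
    image = f"nvcr.io/nvidia/pytorch:{tag}-py3"
    if use_legacy_wrapper:
        # Docker 19.02 or earlier: go through the nvidia-docker wrapper.
        args = ["nvidia-docker", "run", "-it", "--rm", image]
    else:
        # Docker 19.03 or later: request GPUs with --gpus all.
        args = ["docker", "run", "--gpus", "all", "-it", "--rm", image]
    return shlex.join(args)


print(docker_run_command("22.01"))
# prints: docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:22.01-py3
```

Validating the tag up front avoids a failed image pull caused by a placeholder such as xx.xx being left in the command.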

PyTorch is run by importing it as a Python module:

$ python
>>> import torch
>>> print(torch.cuda.is_available())
True
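For a slightly fuller check than the one-liner above, a script along these lines reports what PyTorch can see. It is a sketch rather than an official diagnostic: the function name `describe_torch_environment` is hypothetical, and the import is guarded so the script also runs outside the container, where PyTorch may not be installed.

```python
import importlib.util


def describe_torch_environment():
    """Collect basic facts about the PyTorch install and visible GPUs."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch  # imported lazily so the check above can fail gracefully

    info["torch_version"] = torch.__version__
    info["cuda_available"] = torch.cuda.is_available()
    info["gpu_count"] = torch.cuda.device_count()
    if info["cuda_available"]:
        info["gpu_names"] = [
            torch.cuda.get_device_name(i) for i in range(info["gpu_count"])
        ]
    return info


for key, value in describe_torch_environment().items():
    print(f"{key}: {value}")
```

Inside a container started with GPU access, `cuda_available` should be reported as True, matching the interactive check.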

See /workspace/README.md inside the container for information on getting started and customizing your PyTorch image.

You might want to pull in data and model descriptions from locations outside the container for use by PyTorch. To accomplish this, the easiest method is to mount one or more host directories as Docker bind mounts. For example:

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/pytorch:xx.xx-py3
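One detail worth handling when generating this command: Docker treats a `-v` source that is not an absolute path as a named volume rather than a host directory. The sketch below resolves the host directory first. The function name `bind_mount_command` and the paths `./data` and `/workspace/data` are hypothetical, chosen only for illustration.

```python
import shlex
from pathlib import Path


def bind_mount_command(host_dir, container_dir, tag="22.01"):
    """Build a docker run command that bind-mounts one host directory."""
    # Resolve to an absolute path so Docker sees a host directory,
    # not the name of a volume.
    source = Path(host_dir).expanduser().resolve()
    if not container_dir.startswith("/"):
        raise ValueError("container_dir must be an absolute path")
    image = f"nvcr.io/nvidia/pytorch:{tag}-py3"
    args = [
        "docker", "run", "--gpus", "all", "-it", "--rm",
        "-v", f"{source}:{container_dir}",
        image,
    ]
    return shlex.join(args)


# Hypothetical example: expose ./data inside the container.
print(bind_mount_command("./data", "/workspace/data"))
```

To mount several directories, repeat the `-v` option once per host and container path pair.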

Running PyTorch Using Base Command Platform

Jobs using the PyTorch NGC Container on Base Command Platform clusters can be launched either with the NGC CLI tool or through the Base Command Platform Web UI. To use the NGC CLI tool, first configure the Base Command Platform user, team, organization, and cluster information with the ngc config command.

An example command to launch the container on a single-GPU instance is:

ngc batch run --name "My-1-GPU-pytorch-job" --instance dgxa100.80g.1.norm --commandline "sleep infinity" --result /results --image "nvidia/pytorch:22.08-py3"

An example command to launch a two-node distributed job with a total runtime of 10 minutes (600 seconds) is:

ngc batch run --name "My-2-node-pytorch-job" --preempt RUNONCE --total-runtime 600s --instance dgxa100.80g.8.norm --commandline "sleep infinity" --result /results --array-type "PYTORCH" --replicas "2" --image "nvidia/pytorch:22.08-py3"
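The two commands above differ only in a few options, so they can be generated from one helper. The following is a sketch that uses only the flags shown in the examples; the function name `ngc_batch_command` and its parameter names are hypothetical, and the sketch does not validate instance names against a real cluster.

```python
import shlex


def ngc_batch_command(name, instance, replicas=1, total_runtime_s=None,
                      preempt=None, commandline="sleep infinity",
                      image="nvidia/pytorch:22.08-py3"):
    """Assemble an ngc batch run command from the options shown above."""
    args = ["ngc", "batch", "run", "--name", name]
    if preempt is not None:
        args += ["--preempt", preempt]
    if total_runtime_s is not None:
        # Runtime is given in seconds, for example 600s for 10 minutes.
        args += ["--total-runtime", f"{total_runtime_s}s"]
    args += ["--instance", instance,
             "--commandline", commandline,
             "--result", "/results"]
    if replicas > 1:
        # Multi-node jobs use the PYTORCH array type with a replica count.
        args += ["--array-type", "PYTORCH", "--replicas", str(replicas)]
    args += ["--image", image]
    return shlex.join(args)


print(ngc_batch_command("My-1-GPU-pytorch-job", "dgxa100.80g.1.norm"))
print(ngc_batch_command("My-2-node-pytorch-job", "dgxa100.80g.8.norm",
                        replicas=2, total_runtime_s=600, preempt="RUNONCE"))
```

`shlex.join` quotes arguments that contain spaces, such as the `sleep infinity` command line, so the output can be pasted into a shell.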

The PyTorch container includes JupyterLab, which can be started as part of the job command to give easy access to the container and its capabilities. An example command that starts JupyterLab as part of a job on a single DGX node is:

ngc batch run --name "My-1-node-pytorch-jupyterlab-job" --instance dgxa100.80g.8.norm --commandline "jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/ & sleep infinity" --result /results --image "nvidia/pytorch:22.08-py3"

What Is In This Container?

For the full list of contents, see the PyTorch Container Release Notes.

This container image contains the complete source of the included version of PyTorch in /opt/pytorch. PyTorch is pre-built and installed in the default Conda environment (/opt/conda/lib/python3.8/site-packages/torch/) in the container image. Visit pytorch.org to learn more about PyTorch.

The NVIDIA PyTorch Container is optimized for use with NVIDIA GPUs, and contains software for GPU acceleration, including DALI, cuDNN, NCCL, and TensorRT.

The software stack in this container has been validated for compatibility, and does not require any additional installation or compilation from the end user. This container can help accelerate your deep learning workflow from end to end.

Link to Open Source Code

ETL

NVIDIA Data Loading Library (DALI) is designed to accelerate data loading and preprocessing pipelines for deep learning applications by offloading them to the GPU. DALI primarily focuses on building data preprocessing pipelines for image, video, and audio data. These pipelines are typically complex and include multiple stages, leading to bottlenecks when run on CPU. Use this container to get started on accelerating data loading with DALI.

Training

NVIDIA CUDA Deep Neural Network Library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. The version of PyTorch in this container is precompiled with cuDNN support, and does not require any additional configuration.

NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives for NVIDIA GPUs and networking that take into account system and network topology. NCCL is integrated with PyTorch as a torch.distributed backend, providing implementations for broadcast, all_reduce, and other algorithms.

Inference

TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation, using the optimized graph should feel no different from running a TorchScript module.

Suggested Reading

For the latest Release Notes, see the PyTorch Release Notes.

For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.

For more information about PyTorch, including tutorials, documentation, and examples, see:

Security Common Vulnerabilities and Exposures (CVEs)

Please review the Security Scanning tab on NGC to view the latest security scan results. For certain open-source vulnerabilities listed in the scan results, NVIDIA provides a response in the form of a Vulnerability Exploitability eXchange (VEX) document. The VEX information can be reviewed and downloaded from the Security Scanning tab.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this container in accordance with our terms of service, developers should work with their internal developer team to ensure this container meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

License

By pulling and using the container, you accept the terms and conditions of this End User License Agreement and Product-Specific Terms.