Linux / arm64
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. Automatic differentiation is done with a tape-based system at both a functional and neural network layer level. This functionality brings a high level of flexibility and speed as a deep learning framework and provides accelerated NumPy-like functionality. NGC Containers are the easiest way to get started with PyTorch. The PyTorch NGC Container comes with all dependencies included, providing an easy place to start developing common applications, such as conversational AI, natural language processing (NLP), recommenders, and computer vision.
The PyTorch NGC Container is optimized for GPU acceleration, and contains a validated set of libraries that enable and optimize GPU performance. This container also contains software for accelerating ETL (DALI, RAPIDS), Training (cuDNN, NCCL), and Inference (TensorRT) workloads.
Using the PyTorch NGC Container requires the host system to have the following installed:
For supported versions, see the Framework Containers Support Matrix and the NVIDIA Container Toolkit Documentation.
No other installation, compilation, or dependency management is required. It is not necessary to install the NVIDIA CUDA Toolkit.
The PyTorch NGC Container is optimized to run on NVIDIA DGX Foundry and NVIDIA DGX SuperPOD managed by NVIDIA Base Command Platform. Please refer to the Base Command Platform User Guide to learn more about running workloads on BCP clusters.
To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.
If you have Docker 19.03 or later, a typical command to launch the container is:
docker run --gpus all -it --rm nvcr.io/nvidia/pytorch:xx.xx-py3
If you have Docker 19.02 or earlier, a typical command to launch the container is:
nvidia-docker run -it --rm -v nvcr.io/nvidia/pytorch:xx.xx-py3
Where:
xx.xx
is the container version. For example, 22.01
.PyTorch is run by importing it as a Python module:
$ python
>>> import torch
>>> print(torch.cuda.is_available())
True
See /workspace/README.md
inside the container for information on getting started and customizing your PyTorch image.
You might want to pull in data and model descriptions from locations outside the container for use by PyTorch. To accomplish this, the easiest method is to mount one or more host directories as Docker bind mounts. For example:
docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/pytorch:xx.xx-py3
Note: DIGITS uses shared memory to share data between processes. For example, if you use Torch multiprocessing for multi-threaded data loaders, the default shared memory segment size that the container runs with may not be enough. Therefore, you should increase the shared memory size by issuing either:
--ipc=host
or
--shm-size=
in the docker run
command.
Jobs using the Pytorch NGC Container on Base Command Platform clusters can be launched either by using the NGC CLI tool or by using the Base Command Platform Web UI. To use the NGC CLI tool, configure the Base Command Platform user, team, organization, and cluster information using the ngc config
command as described here.
An example command to launch the container on a single-GPU instance is:
ngc batch run --name "My-1-GPU-pytorch-job" --instance dgxa100.80g.1.norm --commandline "sleep infinity" --result /results --image "nvidia/pytorch:22.08-py3"
An example command to launch a two-node distributed job with a total runtime of 10 minutes (600 seconds) is:
ngc batch run --name "My-2-node-pytorch-job" --preempt RUNONCE --total-runtime 600s --instance dgxa100.80g.8.norm --commandline "sleep infinity" --result /results --array-type "PYTORCH" --replicas "2" --image "nvidia/pytorch:22.08-py3"
The PyTorch container includes JupyterLab in it and can be invoked as part of the job command for easy access to the container and exploring the capabilities of the container. Example to invoke JupyterLab as part of the job run on a single DGX node is:
ngc batch run --name "My-1-node-pytorch-jupyterlab-job" --instance dgxa100.80g.8.norm --commandline "jupyter lab --allow-root --ip=* --port=8888 --no-browser --NotebookApp.token='' --NotebookApp.allow_origin='*' --notebook-dir=/ & sleep infinity" --result /results --image "nvidia/pytorch:22.08-py3"
For the full list of contents, see the PyTorch Container Release Notes.
This container image contains the complete source of the version of PyTorch in /opt/pytorch. It is pre-built and installed in Conda default environment (/opt/conda/lib/python3.8/site-packages/torch/) in the container image. Visit pytorch.org to learn more about PyTorch.
The NVIDIA PyTorch Container is optimized for use with NVIDIA GPUs, and contains the following software for GPU acceleration:
The software stack in this container has been validated for compatibility, and does not require any additional installation or compilation from the end user. This container can help accelerate your deep learning workflow from end to end.
NVIDIA Data Loading Library (DALI) is designed to accelerate data loading and preprocessing pipelines for deep learning applications by offloading them to the GPU. DALI primary focuses on building data preprocessing pipelines for image, video, and audio data. These pipelines are typically complex and include multiple stages, leading to bottlenecks when run on CPU. Use this container to get started on accelerating data loading with DALI.
RAPIDS is a suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPU. RAPIDS focuses on common data preparation tasks for analytics and data science. The RAPIDS API is built to mirror commonly used data processing libraries like pandas, thus providing massive speedups with minor changes to a preexisting codebase. Use this container to get started on accelerating your data science pipelines with RAPIDS.
NVIDIA CUDA Deep Neural Network Library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. The version of PyTorch in this container is precompiled with cuDNN support, and does not require any additional configuration.
NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives for NVIDIA GPUs and networking that take into account system and network topology. NCCL is integrated with PyTorch as a torch.distributed
backend, providing implementations for broadcast
, all_reduce
, and other algorithms.
TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. Torch-TensorRT operates as a PyTorch extention and compiles modules that integrate into the JIT runtime seamlessly. After compilation using the optimized graph should feel no different than running a TorchScript module.
For the latest Release Notes, see the PyTorch Release Notes.
For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.
For more information about PyTorch, including tutorials, documentation, and examples, see:
To review known CVEs on this image, refer to the Security Scanning tab on this page.
By pulling and using the container, you accept the terms and conditions of this End User License Agreement.