

PaddlePaddle is the first independent R&D deep learning platform in China. It has been widely adopted in manufacturing, agriculture, and enterprise services, serving more than 4 million developers and 157,000 companies, and generating 476,000 models.
Latest Tag: April 1, 2024
Compressed Size: 4.9 GB
Multinode Support
Multi-Arch Support
24.03-py3 (Latest) Security Scan Results

Linux / amd64



PaddlePaddle, as the first independent R&D deep learning platform in China, has been officially open-sourced to professional communities since 2016. It is an industrial platform with advanced technologies and rich features that cover core deep learning frameworks, basic model libraries, end-to-end development kits, tools & components as well as service platforms. NGC Containers are the easiest way to get started with PaddlePaddle. The PaddlePaddle NGC Container comes with all dependencies included, providing an easy place to start developing common applications, such as computer vision and natural language processing (NLP).

The PaddlePaddle NGC Container is optimized for GPU acceleration, and contains a validated set of libraries that enable and optimize GPU performance. This container may also contain modifications to the PaddlePaddle source code in order to maximize performance and compatibility. This container also contains software for accelerating ETL (DALI, RAPIDS), Training (cuDNN, NCCL), and Inference (TensorRT) workloads.


Using the PaddlePaddle NGC Container requires the host system to have the following installed:

  • Docker Engine
  • NVIDIA GPU drivers
  • NVIDIA Container Toolkit

For supported versions, see the Framework Containers Support Matrix and the NVIDIA Container Toolkit Documentation.

No other installation, compilation, or dependency management is required. It is not necessary to install the NVIDIA CUDA Toolkit.

Running PaddlePaddle

To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers For Deep Learning Frameworks User’s Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.

If you have Docker 19.03 or later, a typical command to launch the container is:

docker run --gpus all --shm-size=1g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/paddlepaddle:xx.xx-py3

If you have a Docker version earlier than 19.03, a typical command to launch the container is:

nvidia-docker run --shm-size=1g --ulimit memlock=-1 -it --rm nvcr.io/nvidia/paddlepaddle:xx.xx-py3


  • xx.xx is the container version. For example, 22.05.
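
The choice between the two launch commands can be sketched in Python. This is a minimal illustration, not part of the NGC documentation: the helper name build_run_command is hypothetical, and the image reference nvcr.io/nvidia/paddlepaddle:xx.xx-py3 is an assumption, so substitute the image and tag you actually pull.

```python
import shlex

# Flags taken from the launch commands in this guide.
COMMON_FLAGS = ["--shm-size=1g", "--ulimit", "memlock=-1", "-it", "--rm"]


def build_run_command(docker_version, image):
    """Return the launch command as an argument list.

    docker_version: (major, minor) tuple, e.g. (19, 3) for Docker 19.03.
    image: full image reference (an assumption; supply your own).
    """
    if tuple(docker_version) >= (19, 3):
        # Docker 19.03 and later expose GPUs through the built-in --gpus flag.
        prefix = ["docker", "run", "--gpus", "all"]
    else:
        # Older Docker releases rely on the nvidia-docker wrapper.
        prefix = ["nvidia-docker", "run"]
    return prefix + COMMON_FLAGS + [image]


if __name__ == "__main__":
    # The tag below is a placeholder, not a real version.
    cmd = build_run_command((19, 3), "nvcr.io/nvidia/paddlepaddle:xx.xx-py3")
    print(shlex.join(cmd))
```

Returning a list rather than a string keeps the arguments safe to pass to subprocess.run without shell quoting.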

PaddlePaddle is run by importing it as a Python module:

$ python -c 'import paddle; paddle.utils.run_check()'
Running verify PaddlePaddle program ...
W0516 06:36:54.208734   442] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 11.7, Runtime API Version: 11.7
W0516 06:36:54.212574   442] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
W0516 06:37:12.706600   442] Find all_reduce operators: 2. To make the speed faster, some all_reduce ops are fused during training, after fusion, the number of all_reduce ops is 2.
PaddlePaddle works well on 8 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
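
A similar check can be done from a script rather than the command line. The sketch below is an illustration only: the function name check_paddle is hypothetical, and unlike paddle.utils.run_check() it does not run a test program. It only reports whether PaddlePaddle imports, whether it was built with CUDA, and how many GPUs it sees.

```python
def check_paddle():
    """Return a small status report instead of printing to the console."""
    try:
        import paddle
    except ImportError:
        # Outside the container PaddlePaddle may simply not be installed.
        return {"installed": False}

    report = {
        "installed": True,
        "version": paddle.__version__,
        "cuda_build": paddle.is_compiled_with_cuda(),
        "gpu_count": 0,
    }
    if report["cuda_build"]:
        # Only query the CUDA device count on a CUDA-enabled build.
        report["gpu_count"] = paddle.device.cuda.device_count()
    return report


if __name__ == "__main__":
    print(check_paddle())
```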

See /workspace/ inside the container for information on getting started and customizing your PaddlePaddle image.

You might want to pull in data and model descriptions from locations outside the container for use by PaddlePaddle. To accomplish this, the easiest method is to mount one or more host directories as Docker bind mounts. For example:

docker run --gpus all -it --rm -v local_dir:container_dir nvcr.io/nvidia/paddlepaddle:xx.xx-py3
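
When several directories need mounting, the -v arguments can be generated rather than typed by hand. The following is a minimal sketch: bind_mount_args is a hypothetical helper and the paths are placeholders. It resolves each host path to an absolute path, because Docker interprets a non-absolute source as a named volume rather than a bind mount.

```python
from pathlib import Path


def bind_mount_args(mounts):
    """Turn {host_dir: container_dir} into a list of docker -v arguments."""
    args = []
    for host_dir, container_dir in mounts.items():
        # Resolve to an absolute path so Docker creates a bind mount.
        host_abs = Path(host_dir).expanduser().resolve()
        args += ["-v", f"{host_abs}:{container_dir}"]
    return args


if __name__ == "__main__":
    # Both paths are hypothetical placeholders.
    print(bind_mount_args({"./data": "/workspace/data"}))
```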

Note: In order to share data between ranks, NCCL may require shared system memory for IPC and pinned (page-locked) system memory resources. The operating system's limits on these resources may need to be increased accordingly. Refer to your system's documentation for details. In particular, Docker containers default to limited shared and pinned memory resources. When using NCCL inside a container, it is recommended that you increase these resources by issuing:

--shm-size=1g --ulimit memlock=-1

in the docker run command.
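
To confirm from inside a running container that these settings took effect, the limits can be read with the Python standard library. This is a sketch under the assumption that shared memory is mounted at /dev/shm, as is typical on Linux; the function name ipc_limits is hypothetical.

```python
import os
import resource
import shutil


def ipc_limits(shm_path="/dev/shm"):
    """Return (shared memory size in bytes or None, memlock soft limit)."""
    shm_bytes = None
    if os.path.isdir(shm_path):
        # Total size of the shared memory filesystem, set by --shm-size.
        shm_bytes = shutil.disk_usage(shm_path).total
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    # resource.RLIM_INFINITY corresponds to --ulimit memlock=-1.
    memlock = "unlimited" if soft == resource.RLIM_INFINITY else soft
    return shm_bytes, memlock


if __name__ == "__main__":
    shm, memlock = ipc_limits()
    print("shared memory bytes:", shm)
    print("memlock soft limit:", memlock)
```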

What Is in This Container?

For the full list of contents, see the PaddlePaddle Container Release Notes.

This container image contains the complete source of the NVIDIA version of PaddlePaddle in /opt/paddle/paddle. It is prebuilt and installed as a system Python module. Visit the PaddlePaddle website to learn more about PaddlePaddle.

The NVIDIA PaddlePaddle Container is optimized for use with NVIDIA GPUs, and contains software for GPU acceleration, including DALI, cuDNN, NCCL, and TensorRT, each of which is described below.

The software stack in this container has been validated for compatibility, and does not require any additional installation or compilation from the end user. This container can help accelerate your deep learning workflow from end to end.


NVIDIA Data Loading Library (DALI) is designed to accelerate data loading and preprocessing pipelines for deep learning applications by offloading them to the GPU. DALI primarily focuses on building data preprocessing pipelines for image, video, and audio data. These pipelines are typically complex and include multiple stages, leading to bottlenecks when run on the CPU. Use this container to get started on accelerating data loading with DALI.
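
As an illustration of such a pipeline, the sketch below reads JPEG files, decodes them on the GPU, and resizes them. It is a minimal example rather than a recommended configuration: the function name build_image_pipeline, the batch size, thread count, and 224x224 output size are all assumptions, and the directory is expected to contain one subdirectory per class.

```python
def build_image_pipeline(data_dir, batch_size=32):
    """Build a DALI pipeline that reads, decodes, and resizes JPEG files."""
    # Imported lazily so this module still loads where DALI is absent.
    from nvidia.dali import fn, pipeline_def, types

    @pipeline_def(batch_size=batch_size, num_threads=4, device_id=0)
    def image_pipeline():
        # Read encoded files and integer labels from class subdirectories.
        jpegs, labels = fn.readers.file(file_root=data_dir, random_shuffle=True)
        # "mixed" takes CPU input and performs the decode on the GPU.
        images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
        images = fn.resize(images, resize_x=224, resize_y=224)
        return images, labels

    pipe = image_pipeline()
    pipe.build()
    return pipe
```

Calling run() on the returned pipeline produces one batch of images and labels; a GPU and an image directory are required for that step.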


NVIDIA CUDA Deep Neural Network Library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. The version of PaddlePaddle in this container is precompiled with cuDNN support, and does not require any additional configuration.

NVIDIA Collective Communications Library (NCCL) implements multi-GPU and multi-node communication primitives for NVIDIA GPUs and Networking that take into account system and network topology. NCCL is integrated with PaddlePaddle to accelerate training on multi-GPU and multi-node systems. In particular, NCCL serves as the backend for collective operations such as all-reduce during distributed training on GPUs.
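
A small all-reduce example shows the kind of collective operation involved. This is a sketch, assuming the script is started with one process per GPU (for example through python -m paddle.distributed.launch); the function names are hypothetical. Each process contributes its own rank, and after the all-reduce every process holds the sum of all ranks.

```python
def all_reduce_rank_sum():
    """Sum each process's rank across all processes with all-reduce."""
    # Imported lazily so this module still loads where PaddlePaddle is absent.
    import paddle
    import paddle.distributed as dist

    dist.init_parallel_env()  # sets up the communication environment
    value = paddle.to_tensor([float(dist.get_rank())])
    dist.all_reduce(value)  # in-place; the default reduction is a sum
    return float(value.numpy()[0])


def expected_rank_sum(world_size):
    """Reference result: 0 + 1 + ... + (world_size - 1)."""
    return world_size * (world_size - 1) / 2


if __name__ == "__main__":
    print(all_reduce_rank_sum())
```

With 8 processes, every rank should print the same value as expected_rank_sum(8).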


TensorRT is an SDK for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. PaddlePaddle integration with TensorRT (Paddle-TRT) optimizes and executes compatible subgraphs, allowing PaddlePaddle to execute the remaining graph. While you can still use PaddlePaddle's wide and flexible feature set, TensorRT parses the model and applies optimizations to portions of the graph wherever possible.
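
The subgraph behavior described above is typically switched on through the Paddle Inference configuration. The sketch below is illustrative only: make_trt_predictor is a hypothetical helper, the model and parameter file paths are supplied by the caller, and the workspace size, batch size, and min_subgraph_size values are assumptions rather than tuned settings.

```python
def make_trt_predictor(model_file, params_file):
    """Create a Paddle Inference predictor with the TensorRT subgraph engine."""
    # Imported lazily so this module still loads where PaddlePaddle is absent.
    from paddle.inference import Config, PrecisionType, create_predictor

    config = Config(model_file, params_file)
    config.enable_use_gpu(256, 0)  # initial GPU memory pool in MB, device id
    config.enable_tensorrt_engine(
        workspace_size=1 << 30,
        max_batch_size=1,
        # Subgraphs with fewer nodes than this stay in PaddlePaddle.
        min_subgraph_size=3,
        precision_mode=PrecisionType.Float32,
        use_static=False,
        use_calib_mode=False,
    )
    return create_predictor(config)
```

Operators that TensorRT cannot handle remain in the PaddlePaddle graph, so the predictor is used the same way whether or not any subgraph was converted.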

Suggested Reading

For the latest Release Notes, see the PaddlePaddle Release Notes Documentation website.

For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.

For more information about PaddlePaddle, including tutorials, documentation, and examples, see the official PaddlePaddle website and documentation.

Security CVEs

To review known CVEs on this image, refer to the Security Scanning tab on this page.


By pulling and using the container, you accept the terms and conditions of this End User License Agreement.