Triton Inference Server

Description: Triton Inference Server is open source software that lets teams deploy trained AI models from any framework, from local or cloud storage, on any GPU- or CPU-based infrastructure in the cloud, data center, or embedded devices.
Publisher: NVIDIA
Latest Tag: 25.04-pyt-python-py3
Modified: May 9, 2025
Compressed Size: 8.16 GB
Multinode Support: Yes
Multi-Arch Support: Yes

What Is The Triton Inference Server?

Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports HTTP/REST and gRPC protocols that allow remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton is available as a shared library with a C API that allows the full functionality of Triton to be included directly in an application.

The following Docker images are available:

  • The xx.yy-py3 image contains the Triton Inference Server with support for PyTorch, TensorRT, ONNX and OpenVINO models.

  • The xx.yy-py3-sdk image contains Python and C++ client libraries, client examples, GenAI-Perf, Perf Analyzer, and Model Analyzer.

  • The xx.yy-py3-min image is used as the base for creating custom Triton server containers as described in Customize Triton Container.

  • The xx.yy-pyt-python-py3 image contains the Triton Inference Server with support for PyTorch and Python backends only.

  • The xx.yy-py3-igpu image contains the Triton Inference Server with support for Jetson Orin devices. Please refer to the Frameworks Support Matrix for information regarding which iGPU hardware/software is supported by which container.

  • The xx.yy-py3-igpu-sdk image contains Python and C++ client libraries, client examples, and the Perf Analyzer.

  • The xx.yy-py3-igpu-min image is used as the base for creating custom iGPU Triton server containers.

  • The xx.yy-vllm-python-py3 image contains the Triton Inference Server with support for vLLM and Python backends only.

  • The xx.yy-trtllm-python-py3 image contains the Triton Inference Server with support for TensorRT-LLM and Python backends only.
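
The naming scheme in the list above can be sketched as a small helper that composes a full image reference from a release and a variant suffix. This is only an illustration: the function name `image_ref` and the `VARIANTS` set are hypothetical, and the repository path `nvcr.io/nvidia/tritonserver` is an assumption about where these images are published. The Tags tab remains the authoritative source for the exact pull path.

```python
# Assumption: images are published under this registry/repository path.
REGISTRY_REPO = "nvcr.io/nvidia/tritonserver"

# Variant suffixes copied from the image list above.
VARIANTS = {
    "py3",
    "py3-sdk",
    "py3-min",
    "pyt-python-py3",
    "py3-igpu",
    "py3-igpu-sdk",
    "py3-igpu-min",
    "vllm-python-py3",
    "trtllm-python-py3",
}


def image_ref(release: str, variant: str = "py3") -> str:
    """Return '<repo>:<release>-<variant>' for a release written as 'xx.yy'."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown image variant: {variant!r}")
    major, sep, minor = release.partition(".")
    # Enforce the two-digit 'xx.yy' shape used in the image names.
    if not (sep and len(major) == 2 and len(minor) == 2
            and major.isdigit() and minor.isdigit()):
        raise ValueError(f"release must look like 'xx.yy', got {release!r}")
    return f"{REGISTRY_REPO}:{release}-{variant}"


if __name__ == "__main__":
    print(image_ref("25.04", "pyt-python-py3"))
```

Rejecting unknown suffixes up front catches typos before they surface as a failed pull.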

For more information, refer to Triton Inference Server GitHub.
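
As a rough illustration of the HTTP/REST protocol mentioned earlier, the sketch below builds (but does not send) an inference request using only the Python standard library. It assumes a server listening on `localhost:8000` and exposing a `/v2/models/<model>/infer` endpoint; the model name `my_model`, the input name `INPUT0`, and the helper `build_infer_request` are hypothetical placeholders. The official client libraries in the SDK image are likely the better choice for real use.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, input_name, data, datatype="FP32"):
    """Build a POST request for <base_url>/v2/models/<model_name>/infer.

    `data` is a flat list of numbers, sent as a single batch of shape
    [1, len(data)]. Nothing is sent over the network here.
    """
    body = {
        "inputs": [
            {
                "name": input_name,
                "shape": [1, len(data)],
                "datatype": datatype,
                "data": list(data),
            }
        ]
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical model and input names; replace with those of a deployed model.
    req = build_infer_request(
        "http://localhost:8000", "my_model", "INPUT0", [1.0, 2.0, 3.0]
    )
    print(req.get_method(), req.full_url)
    # To send it to a running server:
    # with urllib.request.urlopen(req) as resp:
    #     print(json.load(resp))
```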

Need enterprise support? NVIDIA global support is available for Triton Inference Server with the NVIDIA AI Enterprise software suite. Check out NVIDIA LaunchPad for free access to a set of hands-on labs with Triton Inference Server hosted on NVIDIA infrastructure.

Running The Triton Inference Server

Before you can run an NGC deep learning framework container, your Docker environment must support NVIDIA GPUs. To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers And Frameworks User Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC User Guide.

How GPU support is enabled in Docker on your system depends on the DGX OS version installed (for DGX systems), the specific NGC Cloud Image provided by a Cloud Service Provider, or the software that you have installed in preparation for running NGC containers on TITAN PCs, Quadro PCs, or vGPUs.

Procedure

  1. Select the Tags tab and locate the container image release that you want to run.

  2. In the Pull Tag column, click the icon to copy the docker pull command.

  3. Open a command prompt and paste the pull command. The image pull begins. Ensure the pull completes successfully before proceeding to the next step.

  4. Run the container image by following the directions in the Triton Inference Server Quick Start Guide.
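
The pull and run steps above might look like the following sketch, which only assembles the `docker` command lines and prints them rather than executing anything. The image reference, the host path `/path/to/model_repository`, the published ports (8000, 8001, 8002), and the helper names are assumptions made for illustration; the Quick Start Guide remains the authoritative source for the exact command.

```python
import shlex


def pull_command(image):
    """Return the argv list for pulling `image`."""
    return ["docker", "pull", image]


def run_command(image, model_repo, gpus="all"):
    """Return an argv list that starts the server with `model_repo` mounted.

    Assumptions: the host exposes GPUs to Docker, the server is started
    with `tritonserver --model-repository=/models`, and ports 8000-8002
    are the ones to publish.
    """
    return [
        "docker", "run", "--rm", f"--gpus={gpus}",
        "-p", "8000:8000",
        "-p", "8001:8001",
        "-p", "8002:8002",
        "-v", f"{model_repo}:/models",
        image,
        "tritonserver", "--model-repository=/models",
    ]


if __name__ == "__main__":
    # Assumed image reference; copy the real one from the Tags tab.
    image = "nvcr.io/nvidia/tritonserver:25.04-pyt-python-py3"
    print(shlex.join(pull_command(image)))
    print(shlex.join(run_command(image, "/path/to/model_repository")))
```

Building the commands as lists keeps them safe to pass to `subprocess.run` later without shell quoting issues.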

Suggested Reading

For the latest Release Notes, see the Triton Inference Server Release Notes.

For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.

Link to Open Source Code

For more information about the Triton Inference Server, see:

  • Triton Inference Server User Guide

License

By pulling and using the container, you accept the terms and conditions of this End User License Agreement and Product-Specific Terms.