Supported architectures: Linux / amd64, Linux / arm64
Triton Inference Server provides a cloud and edge inferencing solution optimized for both CPUs and GPUs. Triton supports HTTP/REST and gRPC protocols that allow remote clients to request inferencing for any model being managed by the server. For edge deployments, Triton is available as a shared library with a C API that allows the full functionality of Triton to be included directly in an application. For more information, refer to the v2.37.0 release of Triton Inference Server on GitHub.
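As an illustration of the HTTP/REST protocol, here is a minimal sketch of what a remote client can do once a server is running (assuming the default HTTP port 8000; the model name is a placeholder):

```
# Check that the server is live and ready to serve requests.
curl -v localhost:8000/v2/health/ready

# Query metadata for a specific model served by this instance.
curl localhost:8000/v2/models/<model_name>
```

gRPC clients use port 8001 by default, and Prometheus metrics are exposed on port 8002.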
Before you can run an NGC deep learning framework container, your Docker environment must support NVIDIA GPUs. To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers And Frameworks User Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC User Guide.
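As a quick sanity check that your Docker environment can see the GPU (a minimal sketch, assuming the NVIDIA Container Toolkit is installed and a generic base image such as ubuntu is available):

```
# The toolkit injects the driver utilities, so nvidia-smi should list your GPUs.
docker run --rm --gpus all ubuntu nvidia-smi
```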
The method for enabling GPU support depends on the DGX OS version installed (for DGX systems), the specific NGC Cloud Image provided by a cloud service provider, or the software that you have installed in preparation for running NGC containers on TITAN PCs, Quadro PCs, or vGPUs.
Procedure
1. Select the Tags tab and locate the container image release that you want to run.
2. In the Pull Tag column, click the icon to copy the docker pull command.
3. Open a command prompt and paste the pull command. The container image download begins; ensure the pull completes successfully before proceeding to the next step.
4. Run the container image by following the directions in the Triton Inference Server Quick Start Guide (an example pull-and-run sequence is sketched after this procedure).
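As a rough sketch of steps 3 and 4 (the exact tag comes from the Pull Tag column; <xx.yy> and the model repository path below are placeholders):

```
# Pull the release you selected on the Tags tab.
docker pull nvcr.io/nvidia/tritonserver:<xx.yy>-py3

# Start the server, mounting your model repository into the container.
docker run --gpus=1 --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /full/path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
  tritonserver --model-repository=/models
```

Ports 8000, 8001, and 8002 are Triton's default HTTP, gRPC, and metrics ports; refer to the Quick Start Guide for the authoritative commands.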
By pulling and using the container, you accept the terms and conditions of this End User License Agreement.
Please review the Security Scanning tab to view the latest security scan results.
For certain open-source vulnerabilities listed in the scan results, NVIDIA provides a response in the form of a Vulnerability Exploitability eXchange (VEX) document. The VEX information can be reviewed and downloaded from the Security Scanning tab.
CVE-2024-53880: Contains a vulnerability in the model loading API, where a user could cause an integer overflow or wraparound error by loading a model with an extra-large file size that overflows an internal variable. To mitigate the impact of this vulnerability, we recommend using explicit model loading only in secure settings with additional controls on clients. Refer to the Secure Deployment Guide for more information about best practices for deploying Triton Inference Server. For more information about this vulnerability, refer to the Security Bulletin.
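As a minimal sketch of explicit model loading (the repository path and model name are placeholders): the server starts with nothing loaded, and a trusted client loads only known, vetted models by name.

```
# Start Triton in explicit model-control mode: no models are loaded at startup.
tritonserver --model-repository=/models --model-control-mode=explicit

# From a trusted client, load a specific, vetted model via the repository API.
curl -X POST localhost:8000/v2/repository/models/<model_name>/load
```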
CVE-2024-0100: Contains a vulnerability in the tracing API where a user can corrupt system files. To mitigate the impact of this vulnerability, run Triton Inference Server behind an API gateway that does not allow access to the tracing API. For more information about this vulnerability, refer to the Security Bulletin.