TensorRT Inference Server

Description

TensorRT Inference Server provides a data center inference solution optimized for NVIDIA GPUs. It maximizes inference utilization and performance on GPUs via an HTTP or gRPC endpoint, allowing remote clients to request inference for any model that is being managed by the server, as well as providing real-time metrics on latency and requests.

Publisher

NVIDIA

Latest Tag

20.02-py3-clientsdk

Modified

October 5, 2021

Compressed Size

2.99 GB

Multinode Support

No

Multi-Arch Support

No

NOTE: TensorRT Inference Server is now called Triton Inference Server.

Please see link

What Is The TensorRT Inference Server?

The TensorRT Inference Server provides a cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any model that is being managed by the server.
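As a rough illustration of what talking to the server's HTTP endpoint might look like, the sketch below builds a URL and probes a readiness path using only the Python standard library. The host, port 8000, and the `api/health/ready` path are assumptions, not values stated on this page; confirm them against the TensorRT Inference Server Quick Start Guide for your release. The helper names `endpoint_url` and `is_ready` are hypothetical.

```python
import urllib.request

# Assumed defaults (not stated on this page): HTTP service on port 8000
# and a readiness path of "api/health/ready".
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8000
DEFAULT_READY_PATH = "api/health/ready"


def endpoint_url(host, port, path):
    """Build an HTTP URL for a path on the inference server."""
    return f"http://{host}:{port}/{path.lstrip('/')}"


def is_ready(host=DEFAULT_HOST, port=DEFAULT_PORT,
             path=DEFAULT_READY_PATH, timeout=2.0):
    """Return True if the server answers the readiness path with HTTP 200.

    Any connection failure, timeout, or HTTP error is reported as False.
    """
    url = endpoint_url(host, port, path)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, and socket timeouts all derive from OSError.
        return False
```

A remote client could call `is_ready()` before sending inference requests. A real client would more likely use the client libraries shipped in the client container.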

Two containers are included: one container provides the TensorRT Inference Server itself and the other container provides client libraries and examples that can be used with the inference server. For more information, refer to TensorRT Inference Server GitHub.

Running The TensorRT Inference Server

Before you can run an NGC deep learning framework container, your Docker environment must support NVIDIA GPUs. To run a container, issue the appropriate command as explained in the Running A Container chapter in the NVIDIA Containers And Frameworks User Guide and specify the registry, repository, and tags. For more information about using NGC, refer to the NGC Container User Guide.

The method implemented in your system depends on the DGX OS version installed (for DGX systems), the specific NGC Cloud Image provided by a Cloud Service Provider, or the software that you have installed in preparation for running NGC containers on TITAN PCs, Quadro PCs, or vGPUs.

Procedure

  1. Select the Tags tab and locate the container image release that you want to run.

  2. In the Pull Tag column, click the icon to copy the docker pull command.

  3. Open a command prompt and paste the pull command. The container image begins downloading. Ensure the pull completes successfully before proceeding to the next step.

  4. Run the container image by following the directions in the TensorRT Inference Server Quick Start Guide.
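The pull step in the procedure above can also be scripted. The following is a minimal sketch that assembles a `docker pull` command from a registry, repository, and tag. The registry host `nvcr.io` and repository `nvidia/tensorrtserver` are assumptions, since this page only shows the tag `20.02-py3-clientsdk`; copy the exact pull command from the Tags tab to be sure. The function names are hypothetical.

```python
import shlex
import subprocess

# Assumed values: the registry and repository are not given on this page.
# Only the tag below appears in the listing.
REGISTRY = "nvcr.io"
REPOSITORY = "nvidia/tensorrtserver"
TAG = "20.02-py3-clientsdk"


def image_reference(registry, repository, tag):
    """Combine registry, repository, and tag into one image reference."""
    return f"{registry}/{repository}:{tag}"


def pull_command(registry, repository, tag):
    """Return the docker pull command as an argument list."""
    return ["docker", "pull", image_reference(registry, repository, tag)]


def pull_image(registry, repository, tag):
    """Run docker pull; raises CalledProcessError if the pull fails."""
    subprocess.run(pull_command(registry, repository, tag), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(pull_command(REGISTRY, REPOSITORY, TAG)))
```

Calling `pull_image` with `check=True` makes a failed pull raise an exception, which matches the instruction to confirm the pull completes before moving on to the next step.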

Suggested Reading

For the latest Release Notes, see the TensorRT Inference Server Release Notes.

For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.

For more information about the TensorRT Inference Server, see: