ResNet v1.5 Triton deployment for PyTorch

Deploying high-performance inference for the ResNet model using NVIDIA Triton Inference Server.
Latest Version: April 4, 2023

This resource is a subproject of resnet_50_v1_5_for_pytorch. Visit the parent project to download the code and get more information about the setup.


The NVIDIA Triton Inference Server provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any number of GPU or CPU models being managed by the server.
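
As a rough illustration of the HTTP endpoint, the sketch below builds an inference request in the KServe v2 JSON format accepted by Triton's HTTP API. The model name `resnet50`, the tensor name `input`, the address `localhost:8000`, and the tiny tensor shape are assumptions made for illustration; take the real values from your own model configuration.

```python
import json
import urllib.request


def build_infer_request(base_url, model_name, input_name, shape, data):
    """Return (url, body) for a Triton HTTP inference request."""
    url = f"{base_url}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {
                "name": input_name,
                "shape": list(shape),
                "datatype": "FP32",
                "data": list(data),  # flattened, row-major tensor values
            }
        ]
    }
    return url, json.dumps(payload).encode("utf-8")


def send_infer_request(url, body):
    """POST the request and return the decoded JSON response."""
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical names and a toy tensor: adjust to the deployed model.
url, body = build_infer_request(
    "http://localhost:8000", "resnet50", "input", (1, 2, 2), [0.0, 0.1, 0.2, 0.3]
)
```

Calling `send_infer_request(url, body)` only works against a running server that actually serves a model with these names, so the example stops at building the request.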

This README provides step-by-step deployment instructions for models generated during training (as described in the model README). Additionally, this README provides the corresponding deployment scripts that ensure optimal GPU utilization during inferencing on Triton Inference Server.

Deployment process

The deployment process consists of two steps:

  1. Conversion. The purpose of conversion is to find the best-performing model format supported by Triton Inference Server. Triton Inference Server uses a number of runtime backends, such as TensorRT, LibTorch, and ONNX Runtime, to support various model types. Refer to the Triton documentation for a list of available backends.
  2. Configuration. The converted model is configured for Triton Inference Server, and the necessary configuration files are generated.
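
To make the configuration step more concrete, here is a minimal sketch that renders a `config.pbtxt` file of the kind Triton reads from a model repository. It is not the deployment scripts' own implementation. The helper `render_model_config`, the model name `resnet50`, the tensor names `input` and `output`, and the batch size are all hypothetical; the ONNX Runtime platform is chosen only as an example of one possible conversion result.

```python
def render_model_config(name, platform, max_batch_size, inputs, outputs,
                        dynamic_batching=False):
    """Render a minimal Triton config.pbtxt as text.

    inputs/outputs are lists of (tensor_name, data_type, dims) tuples.
    """
    def tensor_block(kind, tensors):
        entries = []
        for tensor_name, data_type, dims in tensors:
            dims_text = ", ".join(str(d) for d in dims)
            entries.append(
                "  {\n"
                f'    name: "{tensor_name}"\n'
                f"    data_type: {data_type}\n"
                f"    dims: [ {dims_text} ]\n"
                "  }"
            )
        return f"{kind} [\n" + ",\n".join(entries) + "\n]"

    lines = [
        f'name: "{name}"',
        f'platform: "{platform}"',
        f"max_batch_size: {max_batch_size}",
        tensor_block("input", inputs),
        tensor_block("output", outputs),
    ]
    if dynamic_batching:
        # An empty block enables dynamic batching with default settings.
        lines.append("dynamic_batching { }")
    return "\n".join(lines) + "\n"


# Hypothetical values for illustration only.
config_text = render_model_config(
    name="resnet50",
    platform="onnxruntime_onnx",
    max_batch_size=64,
    inputs=[("input", "TYPE_FP32", (3, 224, 224))],
    outputs=[("output", "TYPE_FP32", (1000,))],
    dynamic_batching=True,
)
print(config_text)
```

Because `max_batch_size` is greater than zero, the `dims` entries omit the batch dimension; Triton adds it implicitly.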

To run benchmarks measuring the model performance in inference, perform the following steps:

  1. Start the Triton Inference Server.

    Triton Inference Server is started in one (possibly remote) container, and the ports for the gRPC or REST API are exposed.

  2. Run accuracy tests.

    Produce results which are tested against the given accuracy thresholds. Refer to step 9 in the Quick Start Guide.

  3. Run performance tests.

    Produce latency and throughput results for offline (static batching) and online (dynamic batching) scenarios. Refer to step 11 in the Quick Start Guide.
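
The latency and throughput figures from the performance tests can be derived from per-request timings. The sketch below shows one plausible way to summarize such a run; it is not the benchmark tooling used by the Quick Start Guide. The function names, the nearest-rank percentile method, and the sample measurements are assumptions made for illustration.

```python
import math


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an ascending list (fraction in (0, 1])."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies_ms, batch_size, elapsed_s):
    """Summarize one benchmark run.

    latencies_ms: per-request latencies in milliseconds
    batch_size:   samples per request
    elapsed_s:    wall-clock duration of the whole run in seconds
    """
    ordered = sorted(latencies_ms)
    return {
        "throughput_infer_per_s": len(ordered) * batch_size / elapsed_s,
        "avg_latency_ms": sum(ordered) / len(ordered),
        "p50_latency_ms": percentile(ordered, 0.50),
        "p95_latency_ms": percentile(ordered, 0.95),
        "p99_latency_ms": percentile(ordered, 0.99),
    }


# Made-up measurements, for illustration only.
report = summarize([10.0, 12.0, 11.0, 30.0], batch_size=8, elapsed_s=0.5)
```

In an offline scenario the client picks `batch_size` itself, so throughput scales directly with it. In an online scenario requests are typically sent with a small batch size and the server groups them, which makes the tail latencies (p95, p99) the more informative numbers.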