
FastPitch Triton deployment for PyTorch

Description: Deploying high-performance inference for FastPitch using NVIDIA Triton Inference Server.
Publisher: NVIDIA
Latest Version: -
Modified: April 4, 2023
Compressed Size: 0 B

This resource is a subproject of fastpitch_for_pytorch. Visit the parent project to download the code and get more information about the setup.

Introduction

The NVIDIA Triton Inference Server provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any number of GPU or CPU models being managed by the server.
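As a concrete illustration, a remote client can issue such a request with the tritonclient Python package. The snippet below is a minimal sketch, not part of this resource: the model name "fastpitch", the tensor names "text" and "mel", the token-ID range, and the input shape are assumptions that must match the deployed model's configuration.

```python
import numpy as np
import tritonclient.http as httpclient

# Connect to Triton's HTTP endpoint (default port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical FastPitch input: a batch of one padded token-ID sequence.
text = np.random.randint(0, 148, size=(1, 128)).astype(np.int64)
inp = httpclient.InferInput("text", list(text.shape), "INT64")
inp.set_data_from_numpy(text)

# Request inference and read back the (hypothetical) mel-spectrogram output.
result = client.infer(model_name="fastpitch", inputs=[inp])
mel = result.as_numpy("mel")
print(mel.shape)
```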

This README provides step-by-step deployment instructions for models generated during training (as described in the model README), along with the corresponding deployment scripts that ensure optimal GPU utilization during inference on Triton Inference Server.

Deployment process

The deployment process consists of two steps:

  1. Conversion. The purpose of conversion is to find the best-performing model format supported by Triton Inference Server. Triton Inference Server uses a number of runtime backends, such as TensorRT, LibTorch, and ONNX Runtime, to support various model types. Refer to the Triton documentation for the list of available backends; a conversion sketch follows this list.
  2. Configuration. The converted model is configured for Triton Inference Server, which means generating the necessary configuration files.
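As a rough illustration of the conversion step, a PyTorch model can be saved as TorchScript for the LibTorch backend or exported to ONNX for the ONNX Runtime and TensorRT backends. The sketch below uses a dummy module as a stand-in for a trained FastPitch checkpoint; the tensor names, shapes, and opset version are assumptions.

```python
import torch
import torch.nn as nn

# Stand-in for a trained FastPitch checkpoint (hypothetical; load your own).
class DummyFastPitch(nn.Module):
    def forward(self, text):
        # FastPitch maps token IDs to a mel-spectrogram; this stub only
        # mimics an output of shape (batch, mel_bins, frames).
        return text.unsqueeze(1).float().repeat(1, 80, 1)

model = DummyFastPitch().eval()
example_text = torch.randint(0, 148, (1, 128), dtype=torch.int64)

# TorchScript for the LibTorch backend: trace with an example input.
torch.jit.trace(model, example_text).save("model.pt")

# ONNX for the ONNX Runtime and TensorRT backends: export with dynamic
# batch and sequence axes so the server can batch requests.
torch.onnx.export(
    model,
    example_text,
    "model.onnx",
    input_names=["text"],
    output_names=["mel"],
    dynamic_axes={"text": {0: "batch", 1: "seq"}, "mel": {0: "batch"}},
    opset_version=13,
)
```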

To run benchmarks that measure the model's inference performance, perform the following steps:

  1. Start the Triton Inference Server.

    The Triton Inference Server runs in its own (possibly remote) container, with its ports for the gRPC and REST APIs exposed. A readiness check is sketched in the first example after these steps.

  2. Run accuracy tests.

    These tests produce results that are checked against the given accuracy thresholds; a minimal comparison is sketched in the second example after these steps. Refer to step 8 in the Quick Start Guide.

  3. Run performance tests.

    These tests produce latency and throughput results for the offline (static batching) and online (dynamic batching) scenarios; a naive probe is sketched in the third example after these steps. Refer to step 10 in the Quick Start Guide.
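The sketches below illustrate these steps from Python with the tritonclient package. They are illustrative only, not part of this resource: the model name "fastpitch", the tensor names "text" and "mel", the token-ID range, and the input shapes are all assumptions that must be adapted to the actual model repository.

First, a minimal readiness check for step 1, assuming the server container exposes gRPC on the default port 8001:

```python
import tritonclient.grpc as grpcclient

# Assumes the Triton container exposes gRPC on the default port 8001 and
# that the model is registered as "fastpitch" (hypothetical name).
client = grpcclient.InferenceServerClient(url="localhost:8001")

assert client.is_server_live(), "server process is not live"
assert client.is_server_ready(), "server is not ready to serve requests"
assert client.is_model_ready("fastpitch"), "model failed to load"
```

For step 2, accuracy tests amount to comparing outputs served by Triton against reference outputs computed by the original PyTorch checkpoint. A minimal comparison, with a hypothetical relative-error threshold:

```python
import numpy as np

def within_threshold(served, reference, threshold=0.05):
    # Mean relative error between Triton outputs and the reference outputs
    # produced by the original PyTorch model on the same inputs.
    rel_err = np.abs(served - reference) / (np.abs(reference) + 1e-6)
    return float(rel_err.mean()) <= threshold
```

For step 3, a naive offline (static batching) probe that reports latency percentiles and throughput. Triton also ships a dedicated perf_analyzer tool for exactly this kind of measurement:

```python
import time
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical input: a static batch of 8 padded token-ID sequences.
text = np.random.randint(0, 148, size=(8, 128)).astype(np.int64)
inp = httpclient.InferInput("text", list(text.shape), "INT64")
inp.set_data_from_numpy(text)

# Time repeated synchronous requests against the deployed model.
latencies = []
for _ in range(100):
    start = time.perf_counter()
    client.infer(model_name="fastpitch", inputs=[inp])
    latencies.append(time.perf_counter() - start)

lat = np.array(latencies)
print(f"p50 latency: {np.percentile(lat, 50) * 1e3:.1f} ms")
print(f"p99 latency: {np.percentile(lat, 99) * 1e3:.1f} ms")
print(f"throughput:  {text.shape[0] / lat.mean():.1f} samples/s")
```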