DLRM Triton deployment for PyTorch

Description: Deploying high-performance inference for the DLRM model using NVIDIA Triton Inference Server.
Publisher: NVIDIA
Latest Version: -
Modified: April 4, 2023
Compressed Size: 0 B

This resource is a subproject of dlrm_for_pytorch. Visit the parent project to download the code and get more information about the setup.

The NVIDIA Triton Inference Server provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any number of GPU or CPU models being managed by the server.
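For reference, below is a minimal sketch of how a remote client could request inference from the server over HTTP using the tritonclient Python library. The model name ("dlrm"), the input/output tensor names, shapes, and data types are assumptions used for illustration only; the actual values are defined by the deployed model's configuration (config.pbtxt) in the parent project.

```python
# Minimal Triton HTTP client sketch. All model-specific names below
# (model name "dlrm", tensors "input__0"/"input__1"/"output__0",
# feature sizes) are placeholders -- consult the deployed model's
# config.pbtxt for the real values.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch_size = 16
# Hypothetical Criteo-style layout: 13 dense features, 26 categorical indices.
dense = np.random.rand(batch_size, 13).astype(np.float32)
categorical = np.random.randint(0, 100, size=(batch_size, 26)).astype(np.int64)

inputs = [
    httpclient.InferInput("input__0", list(dense.shape), "FP32"),
    httpclient.InferInput("input__1", list(categorical.shape), "INT64"),
]
inputs[0].set_data_from_numpy(dense)
inputs[1].set_data_from_numpy(categorical)

outputs = [httpclient.InferRequestedOutput("output__0")]

# Send the request and read back the predicted click-through probabilities.
response = client.infer(model_name="dlrm", inputs=inputs, outputs=outputs)
predictions = response.as_numpy("output__0")
print(predictions.shape)
```

The same request can be issued over gRPC by swapping in the tritonclient.grpc module and pointing the client at the server's gRPC port instead of the HTTP port.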