_**This resource is a subproject of [dlrm_for_pytorch](https://ngc.nvidia.com/catalog/resources/nvidia:dlrm_for_pytorch). Visit the parent project to download the code and get more information about the setup.**_


The [NVIDIA Triton Inference Server](https://github.com/NVIDIA/triton-inference-server) provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any number of GPU or CPU models being managed by the server.
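As a rough illustration of what a remote client request to the HTTP endpoint might look like, the sketch below only builds the URL and JSON body of an inference request; it does not contact a server. It assumes the server speaks the KServe v2 HTTP inference protocol (`/v2/models/<name>/infer`), which may not match the protocol version used by the parent project. The server address, the model name `dlrm`, and the tensor name `input__0` are hypothetical placeholders, not values taken from this resource.

```python
import json


def build_infer_request(server_url, model_name, inputs):
    """Build the URL and JSON body for one HTTP inference request.

    `inputs` is a list of (name, shape, datatype, data) tuples, where
    `data` is the tensor content flattened in row-major order.
    """
    url = f"{server_url.rstrip('/')}/v2/models/{model_name}/infer"
    payload = {
        "inputs": [
            {"name": name, "shape": shape, "datatype": datatype, "data": data}
            for name, shape, datatype, data in inputs
        ]
    }
    return url, json.dumps(payload).encode("utf-8")


# Hypothetical values: adjust the address, model name, and tensor name
# to match the actual deployment.
url, body = build_infer_request(
    "http://localhost:8000",
    "dlrm",
    [("input__0", [1, 3], "FP32", [0.1, 0.2, 0.3])],
)
print(url)
```

The returned URL and body could then be sent with any HTTP client as a `POST` request with a JSON content type. In practice, the client libraries distributed with Triton are likely a better fit than hand-built requests.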

dlrm_for_triton_from_pytorch

Deploying high-performance inference for the DLRM model using NVIDIA Triton Inference Server.

DLRM Triton deployment for PyTorch