
BERT Triton deployment for PyTorch



Deploying high-performance inference for BERT model using NVIDIA Triton Inference Server.



Latest Version: November 12, 2021

This resource is a subproject of bert_for_pytorch. Visit the parent project to download the code and get more information about the setup.

The NVIDIA Triton Inference Server provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server exposes an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inference for any number of GPU or CPU models managed by the server. This folder contains a detailed performance analysis as well as scripts to run SQuAD fine-tuning on the BERT model using the Triton Inference Server.
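As a rough illustration of the HTTP endpoint mentioned above, the sketch below builds a request body in Triton's HTTP/REST (KServe v2) inference format. The tensor names (`input__0`, `input__1`), the datatype, and the model name in the comment are assumptions for illustration; the real names come from the deployed model's configuration, not from this resource.

```python
import json


def build_infer_payload(input_ids, attention_mask):
    """Build a Triton HTTP/REST (KServe v2) inference request body.

    The tensor names "input__0"/"input__1" and the INT32 datatype are
    assumptions; the actual names are defined by the deployed model's
    config.pbtxt.
    """
    return {
        "inputs": [
            {
                "name": "input__0",  # token ids (assumed tensor name)
                "shape": [1, len(input_ids)],
                "datatype": "INT32",
                "data": input_ids,
            },
            {
                "name": "input__1",  # attention mask (assumed tensor name)
                "shape": [1, len(attention_mask)],
                "datatype": "INT32",
                "data": attention_mask,
            },
        ]
    }


payload = build_infer_payload([101, 2054, 102], [1, 1, 1])
body = json.dumps(payload)
# A client would POST `body` to
# http://<server>:8000/v2/models/<model_name>/infer
# where <model_name> matches the model repository entry.
```

In practice a client library such as `tritonclient` handles this serialization, but the JSON shape above is what travels over the wire to the HTTP endpoint.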