The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to [NVIDIA Data Center Deep Learning Product Performance](https://developer.nvidia.com/deep-learning-performance-training-inference). ### Offline scenario This table lists the common variable parameters for all performance measurements: | Parameter Name | Parameter Value | |:-----------------------------|:------------------| | Max Batch Size | 128.0 | | Number of model instances | 2.0 | | Triton Max Queue Delay | 1.0 | | Triton Preferred Batch Sizes | 64 128 | #### Offline: NVIDIA A40, TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA A40 * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel |![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_3l.svg)|![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_3r.svg)| |-----|-----|

Full tabular data

| Precision | Backend Accelerator | Client Batch Size | Inferences/second | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |:------------|:---------------------|--------------------:|--------------------:|--------------:|--------------:|--------------:|--------------:| | FP16 | TensorRT | 1 | 329.5 | 3.23 | 3.43 | 3.973 | 3.031 | | FP16 | TensorRT | 2 | 513.8 | 4.292 | 4.412 | 4.625 | 3.888 | | FP16 | TensorRT | 4 | 720.8 | 6.122 | 6.264 | 6.5 | 5.543 | | FP16 | TensorRT | 8 | 919.2 | 9.145 | 9.664 | 10.3 | 8.701 | | FP16 | TensorRT | 16 | 1000 | 17.522 | 17.979 | 19.098 | 16.01 | | FP16 | TensorRT | 32 | 889.6 | 37.49 | 38.481 | 40.316 | 35.946 | | FP16 | TensorRT | 64 | 992 | 66.837 | 67.923 | 70.324 | 64.645 | | FP16 | TensorRT | 128 | 896 | 148.461 | 149.854 | 150.05 | 143.684 |

#### Offline: NVIDIA DGX A100 (1x A100 80GB), TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA DGX A100 (1x A100 80GB) * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel |![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_7l.svg)|![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_7r.svg)| |-----|-----|

Full tabular data

| Precision | Backend Accelerator | Client Batch Size | Inferences/second | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |:------------|:---------------------|--------------------:|--------------------:|--------------:|--------------:|--------------:|--------------:| | FP16 | TensorRT | 1 | 387.9 | 2.626 | 2.784 | 2.875 | 2.574 | | FP16 | TensorRT | 2 | 637.2 | 3.454 | 3.506 | 3.547 | 3.135 | | FP16 | TensorRT | 4 | 982.4 | 4.328 | 4.454 | 4.627 | 4.07 | | FP16 | TensorRT | 8 | 1181.6 | 7.012 | 7.074 | 7.133 | 6.765 | | FP16 | TensorRT | 16 | 1446.4 | 11.162 | 11.431 | 11.941 | 11.061 | | FP16 | TensorRT | 32 | 1353.6 | 24.392 | 24.914 | 25.178 | 23.603 | | FP16 | TensorRT | 64 | 1478.4 | 45.539 | 46.096 | 47.546 | 43.401 | | FP16 | TensorRT | 128 | 1331.2 | 97.504 | 100.611 | 101.896 | 96.198 |

#### Offline: NVIDIA DGX-1 (1x V100 32GB), TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA DGX A100 (1x A100 80GB) * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel |![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_11l.svg)|![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_11r.svg)| |-----|-----|

Full tabular data

| Precision | Backend Accelerator | Client Batch Size | Inferences/second | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |:------------|:---------------------|--------------------:|--------------------:|--------------:|--------------:|--------------:|--------------:| | FP16 | TensorRT | 1 | 255.6 | 4.032 | 4.061 | 4.141 | 3.909 | | FP16 | TensorRT | 2 | 419.2 | 4.892 | 4.94 | 5.133 | 4.766 | | FP16 | TensorRT | 4 | 633.6 | 6.603 | 6.912 | 7.18 | 6.306 | | FP16 | TensorRT | 8 | 865.6 | 9.657 | 9.73 | 9.834 | 9.236 | | FP16 | TensorRT | 16 | 950.4 | 18.396 | 20.748 | 23.873 | 16.824 | | FP16 | TensorRT | 32 | 854.4 | 37.965 | 38.599 | 40.34 | 37.432 | | FP16 | TensorRT | 64 | 825.6 | 80.118 | 80.758 | 87.374 | 77.596 | | FP16 | TensorRT | 128 | 704 | 189.198 | 189.87 | 191.259 | 183.205 |

#### Offline: NVIDIA T4, TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA T4 * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel |![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_15l.svg)|![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_offline_15r.svg)| |-----|-----|

Full tabular data

| Precision | Backend Accelerator | Client Batch Size | Inferences/second | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |:------------|:---------------------|--------------------:|--------------------:|--------------:|--------------:|--------------:|--------------:| | FP16 | TensorRT | 1 | 211.7 | 4.89 | 4.926 | 4.965 | 4.717 | | FP16 | TensorRT | 2 | 327.8 | 6.258 | 6.309 | 6.436 | 6.094 | | FP16 | TensorRT | 4 | 468.4 | 8.996 | 9.085 | 9.239 | 8.531 | | FP16 | TensorRT | 8 | 544.8 | 15.654 | 15.978 | 16.324 | 14.673 | | FP16 | TensorRT | 16 | 544 | 30.626 | 30.788 | 31.311 | 29.477 | | FP16 | TensorRT | 32 | 524.8 | 64.527 | 65.35 | 66.13 | 60.943 | | FP16 | TensorRT | 64 | 556.8 | 115.455 | 115.717 | 116.02 | 113.802 | | FP16 | TensorRT | 128 | 537.6 | 242.501 | 244.599 | 246.16 | 238.384 |

### Online scenario This table lists the common variable parameters for all performance measurements: | Parameter Name | Parameter Value | |:-----------------------------|:------------------| | Max Batch Size | 128.0 | | Number of model instances | 2.0 | | Triton Max Queue Delay | 1.0 | | Triton Preferred Batch Sizes | 64 128 | #### Online: NVIDIA A40, TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA A40 * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel ![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_online_6.svg)

Full tabular data

| Concurrent client requests | Inferences/second | Client Send | Network+server Send/recv | Server Queue | Server Compute Input | Server Compute Infer | Server Compute Output | Client Recv | P50 Latency | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |-----------------------------:|--------------------:|--------------:|---------------------------:|---------------:|-----------------------:|-----------------------:|------------------------:|--------------:|--------------:|--------------:|--------------:|--------------:|--------------:| | 16 | 1421.3 | 0.109 | 4.875 | 1.126 | 0.895 | 4.188 | 0.053 | 0 | 11.046 | 17.34 | 17.851 | 19.013 | 11.246 | | 32 | 1920 | 0.118 | 8.402 | 1.47 | 1.323 | 5.277 | 0.09 | 0 | 16.328 | 28.052 | 29.871 | 31.932 | 16.68 | | 48 | 2270.4 | 0.12 | 11.505 | 1.856 | 1.582 | 5.953 | 0.113 | 0 | 22.172 | 31.87 | 35.395 | 41.256 | 21.129 | | 64 | 2401.9 | 0.12 | 14.443 | 2.299 | 2.358 | 7.285 | 0.149 | 0 | 26.69 | 37.388 | 40.73 | 47.503 | 26.654 | | 80 | 2823 | 0.126 | 14.917 | 2.71 | 2.406 | 7.977 | 0.174 | 0 | 29.113 | 39.932 | 43.789 | 51.24 | 28.31 | | 96 | 2903.8 | 0.133 | 18.824 | 2.929 | 2.595 | 8.364 | 0.18 | 0 | 33.951 | 46.785 | 51.878 | 60.37 | 33.025 | | 112 | 3096.6 | 0.135 | 20.018 | 3.362 | 2.97 | 9.434 | 0.209 | 0 | 37.927 | 50.587 | 55.169 | 63.141 | 36.128 | | 128 | 3252 | 0.138 | 21.092 | 3.912 | 3.445 | 10.505 | 0.245 | 0 | 41.241 | 53.912 | 58.961 | 68.864 | 39.337 | | 144 | 3352.4 | 0.137 | 21.407 | 4.527 | 4.237 | 12.363 | 0.293 | 0 | 44.211 | 59.876 | 65.971 | 79.335 | 42.964 | | 160 | 3387.4 | 0.137 | 22.947 | 5.179 | 4.847 | 13.805 | 0.326 | 0 | 48.423 | 65.393 | 69.568 | 81.288 | 47.241 | | 176 | 3409.1 | 0.142 | 24.989 | 5.623 | 5.539 | 14.956 | 0.357 | 0 | 52.714 | 71.332 | 78.478 | 99.086 | 51.606 | | 192 | 3481.8 | 0.143 | 25.661 | 6.079 | 6.666 | 16.442 | 0.372 | 0 | 55.383 | 79.276 | 95.479 | 122.295 | 55.363 | | 208 | 3523.8 | 0.147 | 27.042 | 6.376 | 7.526 | 17.413 | 0.4 | 0 | 58.823 | 86.375 | 104.134 | 123.278 | 58.904 | | 224 | 3587.2 | 0.148 | 29.648 | 6.776 | 7.659 | 17.85 | 0.411 | 0 | 61.973 | 91.804 | 107.987 | 130.413 | 62.492 | | 240 | 3507.4 | 0.153 | 31.079 | 7.987 | 9.246 | 19.342 | 0.426 | 0 | 65.697 | 106.035 | 121.914 | 137.572 | 68.233 | | 256 | 3504.4 | 0.16 | 34.664 | 8.252 | 9.886 | 19.567 | 0.461 | 0 | 70.708 | 115.965 | 127.808 | 147.327 | 72.99 |

#### Online: NVIDIA DGX A100 (1x A100 80GB), TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA DGX A100 (1x A100 80GB) * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel ![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_online_14.svg)

Full tabular data

| Concurrent client requests | Inferences/second | Client Send | Network+server Send/recv | Server Queue | Server Compute Input | Server Compute Infer | Server Compute Output | Client Recv | P50 Latency | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |-----------------------------:|--------------------:|--------------:|---------------------------:|---------------:|-----------------------:|-----------------------:|------------------------:|--------------:|--------------:|--------------:|--------------:|--------------:|--------------:| | 16 | 1736.5 | 0.11 | 2.754 | 1.272 | 0.954 | 4.08 | 0.036 | 0 | 9.037 | 12.856 | 13.371 | 15.174 | 9.206 | | 32 | 2418.9 | 0.114 | 5.15 | 1.494 | 1.361 | 5.031 | 0.072 | 0 | 13.234 | 20.638 | 21.717 | 23.352 | 13.222 | | 48 | 2891.3 | 0.112 | 7.389 | 1.721 | 1.586 | 5.688 | 0.096 | 0 | 17.089 | 25.946 | 27.611 | 29.784 | 16.592 | | 64 | 3432.6 | 0.11 | 7.866 | 2.11 | 2.126 | 6.301 | 0.131 | 0 | 19.322 | 25.971 | 28.845 | 34.024 | 18.644 | | 80 | 3644.6 | 0.116 | 9.665 | 2.33 | 2.493 | 7.185 | 0.146 | 0 | 22.834 | 29.061 | 32.281 | 37.224 | 21.935 | | 96 | 3902.2 | 0.116 | 11.138 | 2.676 | 2.828 | 7.684 | 0.166 | 0 | 25.589 | 32.572 | 35.307 | 40.123 | 24.608 | | 112 | 3960.6 | 0.124 | 13.321 | 2.964 | 3.209 | 8.438 | 0.186 | 0 | 29.537 | 37.388 | 40.602 | 46.193 | 28.242 | | 128 | 4137.7 | 0.124 | 14.325 | 3.372 | 3.646 | 9.244 | 0.219 | 0 | 31.587 | 41.968 | 44.993 | 51.38 | 30.93 | | 144 | 4139.6 | 0.136 | 15.919 | 3.803 | 4.451 | 10.274 | 0.233 | 0 | 35.696 | 48.301 | 51.345 | 57.414 | 34.816 | | 160 | 4300.5 | 0.134 | 16.453 | 4.341 | 4.934 | 10.979 | 0.274 | 0 | 38.495 | 50.566 | 53.943 | 61.406 | 37.115 | | 176 | 4166.6 | 0.143 | 18.436 | 4.959 | 6.081 | 12.321 | 0.309 | 0 | 43.451 | 60.739 | 69.51 | 84.959 | 42.249 | | 192 | 4281.3 | 0.138 | 19.585 | 5.201 | 6.571 | 13.042 | 0.313 | 0 | 46.175 | 62.718 | 69.46 | 83.032 | 44.85 | | 208 | 4314.8 | 0.15 | 20.046 | 5.805 | 7.752 | 14.062 | 0.335 | 0 | 47.957 | 73.848 | 84.644 | 96.408 | 48.15 | | 224 | 4388.2 | 0.141 | 21.393 | 6.105 | 8.236 | 14.85 | 0.343 | 0 | 50.449 | 77.534 | 88.553 | 100.727 | 51.068 | | 240 | 4371.8 | 0.143 | 22.342 | 6.711 | 9.423 | 15.78 | 0.377 | 0 | 53.216 | 85.983 | 97.756 | 112.48 | 54.776 | | 256 | 4617.3 | 0.144 | 23.392 | 6.595 | 9.466 | 15.568 | 0.367 | 0 | 54.703 | 86.054 | 93.95 | 105.917 | 55.532 |

#### Online: NVIDIA DGX-1 (1x V100 32GB), TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA DGX-1 (1x V100 32GB) * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel ![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_online_22.svg)

Full tabular data

| Concurrent client requests | Inferences/second | Client Send | Network+server Send/recv | Server Queue | Server Compute Input | Server Compute Infer | Server Compute Output | Client Recv | P50 Latency | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |-----------------------------:|--------------------:|--------------:|---------------------------:|---------------:|-----------------------:|-----------------------:|------------------------:|--------------:|--------------:|--------------:|--------------:|--------------:|--------------:| | 16 | 1259.7 | 0.121 | 3.735 | 1.999 | 0.803 | 5.998 | 0.034 | 0 | 13.623 | 17.271 | 17.506 | 18.938 | 12.69 | | 32 | 1686.4 | 0.17 | 6.9 | 2.33 | 2.212 | 7.303 | 0.07 | 0 | 18.836 | 28.302 | 30.423 | 32.916 | 18.985 | | 48 | 1888.3 | 0.183 | 9.068 | 3.372 | 3.65 | 9.058 | 0.108 | 0.001 | 26.571 | 36.583 | 40.84 | 50.402 | 25.44 | | 64 | 2103.9 | 0.204 | 12.416 | 3.146 | 4.304 | 10.127 | 0.145 | 0.001 | 32.401 | 37.121 | 41.252 | 49.094 | 30.343 | | 80 | 2255.2 | 0.211 | 13.753 | 4.074 | 5.455 | 11.776 | 0.192 | 0.001 | 38.298 | 47.082 | 54.476 | 65.412 | 35.462 | | 96 | 2376.6 | 0.214 | 16.22 | 4.873 | 5.972 | 12.911 | 0.208 | 0.001 | 43.008 | 52.947 | 57.126 | 69.778 | 40.399 | | 112 | 2445.6 | 0.243 | 18.495 | 5.461 | 7.012 | 14.365 | 0.248 | 0.001 | 48.081 | 62.414 | 68.274 | 85.766 | 45.825 | | 128 | 2534.2 | 0.261 | 19.294 | 6.486 | 7.925 | 16.312 | 0.282 | 0.001 | 52.894 | 68.475 | 74.852 | 89.979 | 50.561 | | 144 | 2483.9 | 0.27 | 20.771 | 7.744 | 9.993 | 18.865 | 0.414 | 0.001 | 64.866 | 70.434 | 80.279 | 99.177 | 58.058 | | 160 | 2512.8 | 0.302 | 24.205 | 7.838 | 11.217 | 19.689 | 0.373 | 0.001 | 69.085 | 85.576 | 95.016 | 109.455 | 63.625 | | 176 | 2541 | 0.311 | 26.206 | 8.556 | 12.439 | 21.393 | 0.418 | 0.001 | 76.666 | 92.266 | 106.889 | 127.055 | 69.324 | | 192 | 2623.4 | 0.33 | 27.783 | 9.058 | 13.198 | 22.181 | 0.433 | 0.001 | 79.724 | 97.736 | 111.44 | 142.418 | 72.984 | | 208 | 2616.2 | 0.353 | 29.667 | 9.759 | 15.693 | 23.567 | 0.444 | 0.001 | 80.571 | 125.202 | 140.527 | 175.331 | 79.484 | | 224 | 2693.9 | 0.369 | 32.283 | 9.941 | 15.769 | 24.304 | 0.439 | 0.001 | 78.743 | 137.09 | 151.955 | 183.397 | 83.106 | | 240 | 2700.4 | 0.447 | 32.287 | 11.128 | 18.204 | 26.578 | 0.456 | 0.001 | 82.561 | 155.011 | 177.925 | 191.51 | 89.101 | | 256 | 2743.8 | 0.481 | 34.688 | 11.834 | 19.087 | 26.597 | 0.459 | 0.001 | 89.387 | 153.866 | 177.805 | 204.319 | 93.147 |

#### Online: NVIDIA T4, TF-TRT with FP16 Our results were obtained using the following configuration: * **GPU:** NVIDIA T4 * **Backend:** TensorFlow * **Model binding:** TF-TRT * **Precision:** FP16 * **Model format:** TensorFlow SavedModel ![](https://github.com/NVIDIA/DeepLearningExamples/raw/master/TensorFlow/Classification/ConvNets/triton/plots/graph_performance_online_30.svg)

Full tabular data

| Concurrent client requests | Inferences/second | Client Send | Network+server Send/recv | Server Queue | Server Compute Input | Server Compute Infer | Server Compute Output | Client Recv | P50 Latency | P90 Latency | P95 Latency | P99 Latency | Avg Latency | |-----------------------------:|--------------------:|--------------:|---------------------------:|---------------:|-----------------------:|-----------------------:|------------------------:|--------------:|--------------:|--------------:|--------------:|--------------:|--------------:| | 16 | 731.4 | 0.271 | 6.9 | 3.745 | 2.073 | 8.802 | 0.081 | 0.001 | 25.064 | 28.863 | 29.7 | 32.01 | 21.873 | | 32 | 935 | 0.273 | 12.023 | 3.48 | 4.375 | 13.885 | 0.141 | 0.001 | 31.339 | 50.564 | 52.684 | 55.823 | 34.178 | | 48 | 1253 | 0.298 | 12.331 | 5.313 | 4.623 | 15.634 | 0.178 | 0.001 | 38.099 | 60.665 | 64.537 | 72.38 | 38.378 | | 64 | 1368.3 | 0.303 | 15.3 | 6.926 | 4.9 | 19.118 | 0.2 | 0.001 | 48.758 | 66.391 | 73.271 | 81.537 | 46.748 | | 80 | 1410.7 | 0.296 | 15.525 | 11.06 | 6.934 | 22.476 | 0.286 | 0.001 | 60.346 | 65.664 | 76.055 | 84.643 | 56.578 | | 96 | 1473.1 | 0.309 | 18.846 | 11.746 | 7.825 | 26.165 | 0.319 | 0.001 | 69.785 | 77.337 | 91.586 | 100.918 | 65.211 | | 112 | 1475.5 | 0.316 | 23.275 | 12.412 | 8.954 | 30.724 | 0.338 | 0.001 | 79.904 | 106.324 | 111.382 | 126.559 | 76.02 | | 128 | 1535.9 | 0.328 | 23.486 | 14.64 | 10.057 | 34.534 | 0.352 | 0.001 | 89.451 | 110.789 | 121.814 | 140.139 | 83.398 | | 144 | 1512.3 | 0.336 | 25.79 | 18.7 | 12.205 | 37.909 | 0.435 | 0.001 | 103.388 | 108.917 | 114.44 | 136.469 | 95.376 | | 160 | 1533.6 | 0.406 | 29.825 | 17.67 | 13.751 | 42.259 | 0.44 | 0.001 | 111.899 | 140.67 | 154.76 | 191.391 | 104.352 | | 176 | 1515.1 | 0.438 | 34.286 | 17.867 | 16.42 | 46.792 | 0.461 | 0.001 | 120.503 | 187.317 | 205.71 | 223.391 | 116.265 | | 192 | 1532.2 | 0.476 | 34.796 | 18.86 | 19.071 | 51.446 | 0.483 | 0.001 | 124.044 | 211.466 | 226.921 | 237.664 | 125.133 | | 208 | 1616.7 | 0.697 | 32.363 | 21.465 | 18.315 | 55.539 | 0.516 | 0.001 | 127.891 | 200.478 | 221.404 | 250.348 | 128.896 | | 224 | 1541.5 | 0.702 | 35.932 | 22.786 | 22.138 | 62.657 | 0.527 | 0.001 | 141.32 | 248.069 | 263.661 | 276.579 | 144.743 | | 240 | 1631.7 | 0.79 | 37.581 | 22.791 | 21.651 | 64.278 | 0.549 | 0.001 | 141.393 | 250.354 | 272.17 | 289.926 | 147.641 | | 256 | 1607.4 | 0.801 | 39.342 | 29.09 | 23.416 | 66.866 | 0.593 | 0.001 | 157.87 | 262.818 | 280.921 | 310.504 | 160.109 |