EfficientNet For Tensorflow2

Description

EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.

Publisher

NVIDIA

Use Case

Classification

Framework

TensorFlow2

Latest Version

21.02.4

Modified

March 2, 2022

Compressed Size

267.21 KB

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to NVIDIA Data Center Deep Learning Product Performance.

Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.

Training performance benchmark

Training benchmark for EfficientNet-B0 was run on NVIDIA DGX A100 80GB and NVIDIA DGX-1 V100 16GB.

To benchmark training performance with other parameters, run:

bash ./scripts/B0/training/{AMP, FP32, TF32}/train_benchmark_8x{A100-80G, V100-16G}.sh

Training benchmark for EfficientNet-B4 was run on NVIDIA DGX A100 80GB and NVIDIA DGX-1 V100 32GB.

To benchmark training performance with other parameters, run:

bash ./scripts/B4/training/{AMP, FP32, TF32}/train_benchmark_8x{A100-80G, V100-16G}.sh
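The brace notation in the two commands above is shorthand for a choice of precision and system, not literal shell syntax. A minimal Python sketch that lists the paths the notation can expand to is shown below. The function name `benchmark_script_paths` is hypothetical, and not every combination necessarily exists in the repository: the results below report TF32 only on DGX A100 and FP32 only on DGX-1 V100.

```python
from itertools import product

# Values copied from the brace lists in the commands above.
PRECISIONS = ["AMP", "FP32", "TF32"]
SYSTEMS = ["A100-80G", "V100-16G"]


def benchmark_script_paths(model):
    """Return every path the brace notation can expand to for one model.

    Hypothetical helper for illustration; some combinations may have
    no matching script in the repository.
    """
    return [
        f"./scripts/{model}/training/{precision}/train_benchmark_8x{system}.sh"
        for precision, system in product(PRECISIONS, SYSTEMS)
    ]


if __name__ == "__main__":
    for model in ("B0", "B4"):
        for path in benchmark_script_paths(model):
            print(path)
```

Pick the one path that matches your precision and hardware and pass it to `bash`.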

Inference performance benchmark

Inference benchmark for EfficientNet-B0 was run on NVIDIA DGX A100 80GB and NVIDIA DGX-1 V100 16GB.

Inference benchmark for EfficientNet-B4 was run on NVIDIA DGX A100 80GB and NVIDIA DGX-1 V100 32GB.

Results

The following sections provide details on how we achieved our performance and accuracy in training and inference.

Training accuracy results for EfficientNet-B0

Training accuracy: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the training scripts in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX A100 (8x A100 80GB) GPUs.

| GPUs | Accuracy - TF32 | Accuracy - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision) |
|---|---|---|---|---|---|
| 8 | 77.38 | 77.43 | 19 | 10.5 | 1.8 |
| 16 | 77.46 | 77.62 | 10 | 5.5 | 1.81 |

Training accuracy: NVIDIA DGX-1 (8x V100 16GB)

Our results were obtained by running the training scripts in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs.

| GPUs | Accuracy - FP32 | Accuracy - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision) |
|---|---|---|---|---|---|
| 8 | 77.54 | 77.51 | 48 | 44 | 1.09 |
| 32 | 77.38 | 77.62 | 11.48 | 11.44 | 1.003 |

Training accuracy results for EfficientNet-B4

Training accuracy: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the training scripts in the tensorflow:21.02-tf2-py3 NGC container on multi-node NVIDIA DGX A100 (8x A100 80GB) GPUs.

| GPUs | Accuracy - TF32 | Accuracy - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision) |
|---|---|---|---|---|---|
| 32 | 82.69 | 82.69 | 38 | 17.5 | 2.17 |
| 64 | 82.75 | 82.78 | 18 | 8.5 | 2.11 |

Training accuracy: NVIDIA DGX-1 (8x V100 32GB)

Our results were obtained by running the training scripts in the tensorflow:21.02-tf2-py3 NGC container on multi-node NVIDIA DGX-1 (8x V100 32GB) GPUs.

| GPUs | Accuracy - FP32 | Accuracy - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision) |
|---|---|---|---|---|---|
| 32 | 82.78 | 82.78 | 95 | 39.5 | 2.40 |
| 64 | 82.74 | 82.74 | 53 | 19 | 2.78 |
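
The speedup column in the training accuracy tables above appears to be the baseline (TF32 or FP32) time to train divided by the mixed precision time to train. The sketch below recomputes it for two rows copied from those tables. The function name is hypothetical, and some published values differ from the recomputed ratio in the last digit, so treat this as a cross-check rather than the exact method used to build the tables.

```python
def time_to_train_speedup(baseline_time, mixed_precision_time):
    """Ratio of baseline (TF32/FP32) time to mixed precision time."""
    return baseline_time / mixed_precision_time


# (baseline time, mixed precision time) copied from the tables above.
ROWS = {
    "EfficientNet-B4, 32x A100, TF32 to mixed precision": (38, 17.5),
    "EfficientNet-B0, 8x V100, FP32 to mixed precision": (48, 44),
}

if __name__ == "__main__":
    for label, (baseline, mixed) in ROWS.items():
        print(f"{label}: {time_to_train_speedup(baseline, mixed):.2f}x")
```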

Training performance results for EfficientNet-B0

Training performance: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the training benchmark script in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX A100 (8x A100 80GB) GPUs. Performance numbers (in images per second) were averaged over 5 entire training epochs.

| GPUs | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 to mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision |
|---|---|---|---|---|---|
| 1 | 1206 | 2549 | 2.11 | 1 | 1 |
| 8 | 9365 | 16336 | 1.74 | 7.76 | 6.41 |
| 16 | 18361 | 33000 | 1.79 | 15.223 | 12.95 |

To achieve these same results, follow the steps in the Quick Start Guide.
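
The derived columns of the DGX A100 table above can be recomputed from the raw throughput values. The sketch below assumes that throughput speedup is mixed precision throughput divided by TF32 throughput, and that weak scaling is the throughput on N GPUs divided by the single-GPU throughput at the same precision. Both definitions are inferred from the numbers, not stated by the source, and the function names are hypothetical.

```python
# Throughput (images/s) copied from the EfficientNet-B0 DGX A100 table above.
THROUGHPUT = {
    1: {"tf32": 1206, "amp": 2549},
    8: {"tf32": 9365, "amp": 16336},
    16: {"tf32": 18361, "amp": 33000},
}


def precision_speedup(gpus):
    """Mixed precision throughput relative to TF32 on the same GPU count."""
    row = THROUGHPUT[gpus]
    return row["amp"] / row["tf32"]


def weak_scaling(gpus, precision):
    """Throughput on `gpus` GPUs relative to one GPU at the same precision."""
    return THROUGHPUT[gpus][precision] / THROUGHPUT[1][precision]


if __name__ == "__main__":
    for gpus in THROUGHPUT:
        print(
            f"{gpus:>2} GPUs: speedup {precision_speedup(gpus):.2f}, "
            f"weak scaling TF32 {weak_scaling(gpus, 'tf32'):.2f}, "
            f"mixed precision {weak_scaling(gpus, 'amp'):.2f}"
        )
```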

Training performance: NVIDIA DGX-1 (8x V100 16GB)

Our results were obtained by running the training benchmark script in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs. Performance numbers (in items/images per second) were averaged over an entire training epoch.

| GPUs | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |
|---|---|---|---|---|---|
| 1 | 629 | 712 | 1.13 | 1 | 1 |
| 8 | 4012 | 4065 | 1.01 | 6.38 | 5.71 |

To achieve these same results, follow the steps in the Quick Start Guide.

Training performance results for EfficientNet-B4

Training performance: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the training benchmark script in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX A100 (8x A100 80GB) GPUs. Performance numbers (in images per second) were averaged over 5 entire training epochs.

| GPUs | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 to mixed precision) | Weak scaling - TF32 | Weak scaling - mixed precision |
|---|---|---|---|---|---|
| 1 | 167 | 394 | 2.34 | 1 | 1 |
| 8 | 1280 | 2984 | 2.33 | 7.66 | 7.57 |
| 32 | 5023 | 11034 | 2.19 | 30.07 | 28.01 |
| 64 | 9838 | 21844 | 2.22 | 58.91 | 55.44 |

To achieve these same results, follow the steps in the Quick Start Guide.
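
One way to read the weak scaling columns is to divide them by the GPU count, which gives the fraction of ideal linear scaling that was reached. This "scaling efficiency" is a derived metric that does not appear in the source. The sketch below applies it to the TF32 weak scaling values of the EfficientNet-B4 DGX A100 table above, with a hypothetical function name.

```python
# Weak scaling (TF32) copied from the EfficientNet-B4 DGX A100 table above.
WEAK_SCALING_TF32 = {8: 7.66, 32: 30.07, 64: 58.91}


def scaling_efficiency(gpus, weak_scaling):
    """Fraction of ideal linear scaling; 1.0 means perfect scaling."""
    return weak_scaling / gpus


if __name__ == "__main__":
    for gpus, scaling in WEAK_SCALING_TF32.items():
        print(f"{gpus:>2} GPUs: {scaling_efficiency(gpus, scaling):.1%} of linear")
```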

Training performance: NVIDIA DGX-1 (8x V100 32GB)

Our results were obtained by running the training benchmark script in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX-1 (8x V100 32GB) GPUs. Performance numbers (in images per second) were averaged over an entire training epoch.

| GPUs | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | Weak scaling - FP32 | Weak scaling - mixed precision |
|---|---|---|---|---|---|
| 1 | 89 | 193 | 2.16 | 1 | 1 |
| 8 | 643 | 1298 | 2.00 | 7.28 | 6.73 |
| 32 | 2095 | 4892 | 2.33 | 23.54 | 25.35 |
| 64 | 4109 | 9666 | 2.35 | 46.17 | 50.08 |

To achieve these same results, follow the steps in the Quick Start Guide.

Inference performance results for EfficientNet-B0

Inference performance: NVIDIA DGX A100 (1x A100 80GB)

Our results were obtained by running the inferencing benchmarking script in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX A100 (1x A100 80GB) GPU.

FP16 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 224x224 | 111 | 8.97 | 8.88 | 8.92 | 8.96 |
| 2 | 224x224 | 233 | 8.56 | 8.44 | 8.5 | 8.54 |
| 4 | 224x224 | 432 | 9.24 | 9.12 | 9.16 | 9.2 |
| 8 | 224x224 | 771 | 10.32 | 10.16 | 10.24 | 10.24 |
| 1024 | 224x224 | 10269 | 102.4 | 102.4 | 102.4 | 102.4 |

TF32 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 224x224 | 101 | 9.87 | 9.78 | 9.82 | 9.86 |
| 2 | 224x224 | 204 | 9.78 | 9.66 | 9.7 | 9.76 |
| 4 | 224x224 | 381 | 10.48 | 10.36 | 10.4 | 10.44 |
| 8 | 224x224 | 584 | 13.68 | 13.52 | 13.6 | 13.68 |
| 512 | 224x224 | 5480 | 92.16 | 92.16 | 92.16 | 92.16 |

To achieve these same results, follow the steps in the Quick Start Guide.
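
As a sanity check on the inference tables, throughput and average latency should be roughly related by throughput ≈ batch size / latency when batches are processed one after another. This relationship is a general assumption, not something the source states, and the reported values only approximately follow it. The sketch below compares the estimate with three rows of the FP16 table for DGX A100 above, using a hypothetical function name.

```python
def estimated_throughput(batch_size, avg_latency_ms):
    """Images per second implied by one batch completing every avg_latency_ms."""
    return batch_size / (avg_latency_ms / 1000.0)


# (batch size, average latency in ms, reported throughput) from the
# EfficientNet-B0 FP16 table for DGX A100 above.
ROWS = [
    (1, 8.97, 111),
    (8, 10.32, 771),
    (1024, 102.4, 10269),
]

if __name__ == "__main__":
    for batch_size, latency_ms, reported in ROWS:
        estimate = estimated_throughput(batch_size, latency_ms)
        print(f"batch {batch_size:>4}: estimated {estimate:8.1f}, reported {reported}")
```

A large gap between the estimate and the reported throughput would suggest a transcription error in a row or a different measurement method.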

Inference performance: NVIDIA DGX-1 (1x V100 16GB)

Our results were obtained by running the inferencing benchmarking script in the TensorFlow NGC container on NVIDIA DGX-1 (1x V100 16GB) GPU.

FP16 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 224x224 | 98.8 | 10.12 | 10.03 | 10.06 | 10.10 |
| 2 | 224x224 | 199.3 | 10.02 | 9.9 | 9.94 | 10.0 |
| 4 | 224x224 | 382.5 | 10.44 | 10.28 | 10.36 | 10.4 |
| 8 | 224x224 | 681.2 | 11.68 | 11.52 | 11.6 | 11.68 |
| 256 | 224x224 | 5271 | 48.64 | 46.08 | 46.08 | 48.64 |

FP32 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 224x224 | 68.39 | 14.62 | 14.45 | 14.51 | 14.56 |
| 2 | 224x224 | 125.62 | 15.92 | 15.78 | 15.82 | 15.82 |
| 4 | 224x224 | 216.41 | 18.48 | 18.24 | 18.4 | 18.44 |
| 8 | 224x224 | 401.60 | 19.92 | 19.6 | 19.76 | 19.84 |
| 128 | 224x224 | 2713 | 47.36 | 46.08 | 46.08 | 47.36 |

To achieve these same results, follow the steps in the Quick Start Guide.

Inference performance results for EfficientNet-B4

Inference performance: NVIDIA DGX A100 (1x A100 80GB)

Our results were obtained by running the inferencing benchmarking script in the tensorflow:21.02-tf2-py3 NGC container on NVIDIA DGX A100 (1x A100 80GB) GPU.

FP16 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 380x380 | 57.54 | 17.37 | 17.24 | 17.30 | 17.35 |
| 2 | 380x380 | 112.06 | 17.84 | 17.7 | 17.76 | 17.82 |
| 4 | 380x380 | 219.71 | 18.2 | 18.08 | 18.12 | 18.16 |
| 8 | 380x380 | 383.39 | 20.8 | 20.64 | 20.72 | 20.8 |
| 128 | 380x380 | 1470 | 87.04 | 85.76 | 85.76 | 87.04 |

TF32 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 380x380 | 52.68 | 18.98 | 18.86 | 18.91 | 18.96 |
| 2 | 380x380 | 95.32 | 20.98 | 20.84 | 20.9 | 20.96 |
| 4 | 380x380 | 182.14 | 21.96 | 21.84 | 21.88 | 21.92 |
| 8 | 380x380 | 325.72 | 24.56 | 24.4 | 24.4 | 24.48 |
| 64 | 380x380 | 694 | 91.52 | 90.88 | 91.52 | 91.52 |

To achieve these same results, follow the steps in the Quick Start Guide.

Inference performance: NVIDIA DGX-1 (1x V100 32GB)

Our results were obtained by running the inferencing benchmarking script in the TensorFlow NGC container on NVIDIA DGX-1 (1x V100 32GB) GPU.

FP16 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 380x380 | 54.27 | 18.35 | 18.20 | 18.25 | 18.32 |
| 2 | 380x380 | 104.27 | 19.18 | 19.02 | 19.08 | 19.16 |
| 4 | 380x380 | 182.61 | 21.88 | 21.64 | 21.72 | 21.84 |
| 8 | 380x380 | 234.06 | 34.16 | 33.92 | 34.0 | 34.08 |
| 64 | 380x380 | 782.47 | 81.92 | 80.0 | 80.64 | 81.28 |

FP32 Inference Latency

| Batch size | Resolution | Throughput Avg | Latency Avg (ms) | Latency 90% (ms) | Latency 95% (ms) | Latency 99% (ms) |
|---|---|---|---|---|---|---|
| 1 | 380x380 | 30.48 | 32.80 | 32.86 | 31.83 | 32.60 |
| 2 | 380x380 | 58.59 | 34.12 | 31.92 | 33.02 | 33.9 |
| 4 | 380x380 | 111.35 | 35.92 | 35.0 | 35.12 | 35.68 |
| 8 | 380x380 | 199.00 | 40.24 | 38.72 | 39.04 | 40.0 |
| 32 | 380x380 | 307.04 | 104.0 | 104.0 | 104.0 | 104.0 |

To achieve these same results, follow the steps in the Quick Start Guide.