NVIDIA Deep Learning Examples
EfficientNet for PyTorch

EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.

The performance measurements in this document were conducted at the time of publication and may not reflect the performance achieved from NVIDIA's latest software release. For the most up-to-date performance measurements, go to NVIDIA Data Center Deep Learning Product Performance.

Benchmarking

The following section shows how to run benchmarks measuring the model performance in training and inference modes.

Training performance benchmark

To benchmark training, run:

  • For 1 GPU

    • FP32 (V100 GPUs only)
      python ./launch.py --model efficientnet-<version> --precision FP32 --mode benchmark_training --platform DGX1V <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100
    • TF32 (A100 GPUs only)
      python ./launch.py --model efficientnet-<version> --precision TF32 --mode benchmark_training --platform DGXA100 <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100
    • AMP
      python ./launch.py --model efficientnet-<version> --precision AMP --mode benchmark_training --platform <DGX1V|DGXA100> <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100
  • For multiple GPUs

    • FP32 (V100 GPUs only)
      python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-<version> --precision FP32 --mode benchmark_training --platform DGX1V <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100
    • TF32 (A100 GPUs only)
      python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-<version> --precision TF32 --mode benchmark_training --platform DGXA100 <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100
    • AMP
      python ./multiproc.py --nproc_per_node 8 ./launch.py --model efficientnet-<version> --precision AMP --mode benchmark_training --platform <DGX1V|DGXA100> <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100

Each of these scripts will run 100 iterations and save results in the benchmark.json file.
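The commands above differ only in precision, platform, mode, and GPU count, so they can be assembled programmatically. The following is a minimal sketch, not part of the repository: the helper name `build_benchmark_command` and the path `/data/imagenet` are hypothetical, while the flags and the precision/platform pairings are copied from the commands listed above.

```python
import shlex

# Precision/platform pairings as stated in the lists above.
ALLOWED_PLATFORMS = {
    "FP32": {"DGX1V"},
    "TF32": {"DGXA100"},
    "AMP": {"DGX1V", "DGXA100"},
}


def build_benchmark_command(version, precision, mode, platform, data_path,
                            num_gpus=1, report_file="benchmark.json"):
    """Return the benchmark command as an argument list (hypothetical helper)."""
    if mode not in ("benchmark_training", "benchmark_inference"):
        raise ValueError(f"unsupported mode: {mode}")
    if platform not in ALLOWED_PLATFORMS.get(precision, set()):
        raise ValueError(f"{precision} is not listed for platform {platform}")

    launch = [
        "./launch.py",
        "--model", f"efficientnet-{version}",
        "--precision", precision,
        "--mode", mode,
        "--platform", platform,
        data_path,
        "--raport-file", report_file,  # flag spelling as used by launch.py
        "--epochs", "1",
        "--prof", "100",
    ]
    if num_gpus > 1:
        # Multi-GPU runs go through multiproc.py, as in the list above.
        return ["python", "./multiproc.py",
                "--nproc_per_node", str(num_gpus)] + launch
    return ["python"] + launch


cmd = build_benchmark_command("b0", "AMP", "benchmark_training",
                              "DGXA100", "/data/imagenet", num_gpus=8)
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` from the directory that contains `launch.py`.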

Inference performance benchmark

To benchmark inference, run:

  • FP32 (V100 GPUs only)
    python ./launch.py --model efficientnet-<version> --precision FP32 --mode benchmark_inference --platform DGX1V <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100
  • TF32 (A100 GPUs only)
    python ./launch.py --model efficientnet-<version> --precision TF32 --mode benchmark_inference --platform DGXA100 <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100
  • AMP
    python ./launch.py --model efficientnet-<version> --precision AMP --mode benchmark_inference --platform <DGX1V|DGXA100> <path to imagenet> --raport-file benchmark.json --epochs 1 --prof 100

Each of these scripts will run 100 iterations and save results in the benchmark.json file.
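Once a run has finished, the report file can be inspected from Python. The layout of benchmark.json is not described in this document, so the sketch below is deliberately defensive. It assumes the file is either a single JSON document or a stream of one JSON record per line, optionally prefixed with `DLLL ` (the prefix is an assumption about the logger's output). The helper name and the metric key `example.throughput` are made up for illustration; check your own file for the real keys.

```python
import json

LINE_PREFIX = "DLLL "  # assumed per-line prefix; harmless if absent


def load_report_records(text):
    """Parse report text into a list of JSON records (hypothetical helper)."""
    # Case 1: the whole file is one JSON document.
    try:
        parsed = json.loads(text)
        return parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        pass

    # Case 2: one JSON record per line, possibly prefixed.
    records = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(LINE_PREFIX):
            line = line[len(LINE_PREFIX):]
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
    return records


# Synthetic sample; real records will carry different keys.
sample = (
    'DLLL {"step": "PARAMETER", "data": {"batch_size": 256}}\n'
    'DLLL {"step": [], "data": {"example.throughput": 1234.5}}\n'
)
records = load_report_records(sample)
for record in records:
    print(record["data"])

# With a real report:
#   with open("benchmark.json") as f:
#       records = load_report_records(f.read())
```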

Results

Our results were obtained by running the applicable training script in the pytorch-21.03 NGC container.

To achieve these same results, follow the steps in the Quick Start Guide.

Training accuracy results

Training accuracy: NVIDIA DGX A100 (8x A100 80GB)

Our results were obtained by running the applicable efficientnet/training/<AMP|TF32>/*.sh training script in the PyTorch 20.12 NGC container on NVIDIA DGX A100 (8x A100 80GB) GPUs.

| Model | Epochs | GPUs | Top1 accuracy - TF32 | Top1 accuracy - mixed precision | Time to train - TF32 | Time to train - mixed precision | Time to train speedup (TF32 to mixed precision) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 400 | 8 | 77.16 +/- 0.07 | 77.42 +/- 0.11 | 19 | 11 | 1.727 |
| efficientnet-b4 | 400 | 8 | 82.82 +/- 0.04 | 82.85 +/- 0.09 | 126 | 66 | 1.909 |
| efficientnet-widese-b0 | 400 | 8 | 77.84 +/- 0.08 | 77.84 +/- 0.02 | 19 | 10 | 1.900 |
| efficientnet-widese-b4 | 400 | 8 | 83.13 +/- 0.11 | 83.1 +/- 0.09 | 126 | 66 | 1.909 |

Training accuracy: NVIDIA DGX-1 (8x V100 16GB)

Our results were obtained by running the applicable efficientnet/training/<AMP|FP32>/*.sh training script in the PyTorch 20.12 NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs.

| Model | Epochs | GPUs | Top1 accuracy - FP32 | Top1 accuracy - mixed precision | Time to train - FP32 | Time to train - mixed precision | Time to train speedup (FP32 to mixed precision) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 400 | 8 | 77.02 +/- 0.04 | 77.17 +/- 0.08 | 34 | 24 | 1.417 |
| efficientnet-widese-b0 | 400 | 8 | 77.59 +/- 0.16 | 77.69 +/- 0.12 | 35 | 24 | 1.458 |

Example plots

The following images show an A100 run.

[Figures: validation loss, validation top-1 accuracy, validation top-5 accuracy]

Training performance results

Training performance: NVIDIA A100 (8x A100 80GB)

Our results were obtained by running the applicable efficientnet/training/<AMP|TF32>/*.sh training script in the PyTorch 21.03 NGC container on NVIDIA DGX A100 (8x A100 80GB) GPUs.

| Model | GPUs | Throughput - TF32 | Throughput - mixed precision | Throughput speedup (TF32 to mixed precision) | TF32 Strong Scaling | Mixed Precision Strong Scaling |
| --- | --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 1 | 1078 img/s | 2489 img/s | 2.3 x | 1.0 x | 1.0 x |
| efficientnet-b0 | 8 | 8193 img/s | 16652 img/s | 2.03 x | 7.59 x | 6.68 x |
| efficientnet-b0 | 16 | 16137 img/s | 29332 img/s | 1.81 x | 14.96 x | 11.78 x |
| efficientnet-b4 | 1 | 157 img/s | 331 img/s | 2.1 x | 1.0 x | 1.0 x |
| efficientnet-b4 | 8 | 1223 img/s | 2570 img/s | 2.1 x | 7.76 x | 7.75 x |
| efficientnet-b4 | 16 | 2417 img/s | 4813 img/s | 1.99 x | 15.34 x | 14.51 x |
| efficientnet-b4 | 32 | 4813 img/s | 9425 img/s | 1.95 x | 30.55 x | 28.42 x |
| efficientnet-b4 | 64 | 9146 img/s | 18900 img/s | 2.06 x | 58.05 x | 57.0 x |
| efficientnet-widese-b0 | 1 | 1078 img/s | 2512 img/s | 2.32 x | 1.0 x | 1.0 x |
| efficientnet-widese-b0 | 8 | 8244 img/s | 16368 img/s | 1.98 x | 7.64 x | 6.51 x |
| efficientnet-widese-b0 | 16 | 16062 img/s | 29798 img/s | 1.85 x | 14.89 x | 11.86 x |
| efficientnet-widese-b4 | 1 | 157 img/s | 331 img/s | 2.1 x | 1.0 x | 1.0 x |
| efficientnet-widese-b4 | 8 | 1223 img/s | 2585 img/s | 2.11 x | 7.77 x | 7.8 x |
| efficientnet-widese-b4 | 16 | 2399 img/s | 5041 img/s | 2.1 x | 15.24 x | 15.21 x |
| efficientnet-widese-b4 | 32 | 4616 img/s | 9379 img/s | 2.03 x | 29.32 x | 28.3 x |
| efficientnet-widese-b4 | 64 | 9140 img/s | 18516 img/s | 2.02 x | 58.07 x | 55.88 x |

Training performance: NVIDIA DGX-1 (8x V100 16GB)

Our results were obtained by running the applicable efficientnet/training/<AMP|FP32>/*.sh training script in the PyTorch 21.03 NGC container on NVIDIA DGX-1 (8x V100 16GB) GPUs.

| Model | GPUs | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | FP32 Strong Scaling | Mixed Precision Strong Scaling |
| --- | --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 1 | 655 img/s | 1301 img/s | 1.98 x | 1.0 x | 1.0 x |
| efficientnet-b0 | 8 | 4672 img/s | 7789 img/s | 1.66 x | 7.12 x | 5.98 x |
| efficientnet-b4 | 1 | 83 img/s | 204 img/s | 2.46 x | 1.0 x | 1.0 x |
| efficientnet-b4 | 8 | 616 img/s | 1366 img/s | 2.21 x | 7.41 x | 6.67 x |
| efficientnet-widese-b0 | 1 | 655 img/s | 1299 img/s | 1.98 x | 1.0 x | 1.0 x |
| efficientnet-widese-b0 | 8 | 4592 img/s | 7875 img/s | 1.71 x | 7.0 x | 6.05 x |
| efficientnet-widese-b4 | 1 | 83 img/s | 204 img/s | 2.45 x | 1.0 x | 1.0 x |
| efficientnet-widese-b4 | 8 | 612 img/s | 1356 img/s | 2.21 x | 7.34 x | 6.63 x |

Training performance: NVIDIA DGX-1 (8x V100 32GB)

Our results were obtained by running the applicable efficientnet/training/<AMP|FP32>/*.sh training script in the PyTorch 21.03 NGC container on NVIDIA DGX-1 (8x V100 32GB) GPUs.

| Model | GPUs | Throughput - FP32 | Throughput - mixed precision | Throughput speedup (FP32 to mixed precision) | FP32 Strong Scaling | Mixed Precision Strong Scaling |
| --- | --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 1 | 646 img/s | 1401 img/s | 2.16 x | 1.0 x | 1.0 x |
| efficientnet-b0 | 8 | 4937 img/s | 8615 img/s | 1.74 x | 7.63 x | 6.14 x |
| efficientnet-b4 | 1 | 36 img/s | 89 img/s | 2.44 x | 1.0 x | 1.0 x |
| efficientnet-b4 | 8 | 641 img/s | 1565 img/s | 2.44 x | 17.6 x | 17.57 x |
| efficientnet-widese-b0 | 1 | 281 img/s | 603 img/s | 2.14 x | 1.0 x | 1.0 x |
| efficientnet-widese-b0 | 8 | 4924 img/s | 8870 img/s | 1.8 x | 17.49 x | 14.7 x |
| efficientnet-widese-b4 | 1 | 36 img/s | 89 img/s | 2.45 x | 1.0 x | 1.0 x |
| efficientnet-widese-b4 | 8 | 639 img/s | 1556 img/s | 2.43 x | 17.61 x | 17.44 x |
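
The derived columns in the training performance tables above can be reproduced from the raw throughputs. The sketch below assumes, based on the published numbers rather than on any stated definition, that the throughput speedup is mixed-precision throughput divided by TF32/FP32 throughput, and that strong scaling is N-GPU throughput divided by 1-GPU throughput. The function names are hypothetical, and `scaling_efficiency` is an extra illustration that does not appear in the tables.

```python
def throughput_speedup(baseline_img_s, mixed_precision_img_s):
    """Ratio of mixed-precision throughput to the TF32/FP32 baseline."""
    return mixed_precision_img_s / baseline_img_s


def strong_scaling(single_gpu_img_s, multi_gpu_img_s):
    """Ratio of multi-GPU throughput to single-GPU throughput."""
    return multi_gpu_img_s / single_gpu_img_s


def scaling_efficiency(single_gpu_img_s, multi_gpu_img_s, num_gpus):
    """Strong scaling as a fraction of the ideal linear factor (extra metric)."""
    return strong_scaling(single_gpu_img_s, multi_gpu_img_s) / num_gpus


# efficientnet-b0 on A100: 1-GPU and 8-GPU rows from the table above.
tf32_1gpu, tf32_8gpu = 1078, 8193
amp_1gpu, amp_8gpu = 2489, 16652

print(f"speedup at 8 GPUs:   {throughput_speedup(tf32_8gpu, amp_8gpu):.2f} x")
print(f"TF32 strong scaling: {strong_scaling(tf32_1gpu, tf32_8gpu):.2f} x")
print(f"AMP strong scaling:  {strong_scaling(amp_1gpu, amp_8gpu):.2f} x")
print(f"TF32 efficiency:     {scaling_efficiency(tf32_1gpu, tf32_8gpu, 8):.0%}")
```

The recomputed scaling factors land within about 0.01 of the published 7.59 x and 6.68 x. The small difference is presumably because the table was derived from unrounded throughputs.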

Inference performance results

Inference performance: NVIDIA A100 (1x A100 80GB)

Our results were obtained by running the applicable efficientnet/inference/<AMP|TF32>/*.sh inference script in the PyTorch 21.03 NGC container on NVIDIA DGX A100 (1x A100 80GB) GPU.

TF32 Inference Latency

| Model | Batch Size | Throughput Avg | Latency Avg | Latency 95% | Latency 99% |
| --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 1 | 130 img/s | 9.33 ms | 7.95 ms | 9.0 ms |
| efficientnet-b0 | 2 | 262 img/s | 9.39 ms | 8.51 ms | 9.5 ms |
| efficientnet-b0 | 4 | 503 img/s | 9.68 ms | 9.53 ms | 10.78 ms |
| efficientnet-b0 | 8 | 1004 img/s | 9.85 ms | 9.89 ms | 11.49 ms |
| efficientnet-b0 | 16 | 1880 img/s | 10.27 ms | 10.34 ms | 11.19 ms |
| efficientnet-b0 | 32 | 3401 img/s | 11.46 ms | 12.51 ms | 14.39 ms |
| efficientnet-b0 | 64 | 4656 img/s | 19.58 ms | 14.52 ms | 16.63 ms |
| efficientnet-b0 | 128 | 5001 img/s | 31.03 ms | 25.72 ms | 28.34 ms |
| efficientnet-b0 | 256 | 5154 img/s | 60.71 ms | 49.44 ms | 54.99 ms |
| efficientnet-b4 | 1 | 69 img/s | 16.22 ms | 14.87 ms | 15.34 ms |
| efficientnet-b4 | 2 | 133 img/s | 16.84 ms | 16.49 ms | 17.72 ms |
| efficientnet-b4 | 4 | 259 img/s | 17.33 ms | 16.39 ms | 19.67 ms |
| efficientnet-b4 | 8 | 491 img/s | 18.22 ms | 18.09 ms | 19.51 ms |
| efficientnet-b4 | 16 | 606 img/s | 28.28 ms | 26.55 ms | 26.84 ms |
| efficientnet-b4 | 32 | 651 img/s | 51.08 ms | 49.39 ms | 49.61 ms |
| efficientnet-b4 | 64 | 684 img/s | 96.23 ms | 93.54 ms | 93.78 ms |
| efficientnet-b4 | 128 | 700 img/s | 195.22 ms | 182.17 ms | 182.42 ms |
| efficientnet-b4 | 256 | 702 img/s | 380.01 ms | 361.81 ms | 371.64 ms |
| efficientnet-widese-b0 | 1 | 130 img/s | 9.49 ms | 8.76 ms | 9.68 ms |
| efficientnet-widese-b0 | 2 | 265 img/s | 9.25 ms | 8.51 ms | 9.75 ms |
| efficientnet-widese-b0 | 4 | 520 img/s | 9.42 ms | 8.67 ms | 9.97 ms |
| efficientnet-widese-b0 | 8 | 996 img/s | 12.27 ms | 9.69 ms | 11.31 ms |
| efficientnet-widese-b0 | 16 | 1916 img/s | 10.2 ms | 10.29 ms | 11.3 ms |
| efficientnet-widese-b0 | 32 | 3293 img/s | 11.71 ms | 13.0 ms | 14.57 ms |
| efficientnet-widese-b0 | 64 | 4639 img/s | 16.21 ms | 14.61 ms | 16.29 ms |
| efficientnet-widese-b0 | 128 | 4997 img/s | 30.81 ms | 25.76 ms | 26.02 ms |
| efficientnet-widese-b0 | 256 | 5166 img/s | 73.68 ms | 49.39 ms | 55.74 ms |
| efficientnet-widese-b4 | 1 | 68 img/s | 16.41 ms | 15.14 ms | 16.59 ms |
| efficientnet-widese-b4 | 2 | 135 img/s | 16.65 ms | 15.52 ms | 17.93 ms |
| efficientnet-widese-b4 | 4 | 251 img/s | 17.74 ms | 17.29 ms | 20.47 ms |
| efficientnet-widese-b4 | 8 | 501 img/s | 17.75 ms | 17.12 ms | 18.01 ms |
| efficientnet-widese-b4 | 16 | 590 img/s | 28.94 ms | 27.29 ms | 27.81 ms |
| efficientnet-widese-b4 | 32 | 651 img/s | 50.96 ms | 49.34 ms | 49.55 ms |
| efficientnet-widese-b4 | 64 | 683 img/s | 99.28 ms | 93.65 ms | 93.88 ms |
| efficientnet-widese-b4 | 128 | 700 img/s | 189.81 ms | 182.3 ms | 182.58 ms |
| efficientnet-widese-b4 | 256 | 702 img/s | 379.36 ms | 361.84 ms | 366.05 ms |

Mixed Precision Inference Latency

| Model | Batch Size | Throughput Avg | Latency Avg | Latency 95% | Latency 99% |
| --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 1 | 105 img/s | 11.21 ms | 9.9 ms | 12.55 ms |
| efficientnet-b0 | 2 | 214 img/s | 11.01 ms | 10.06 ms | 11.89 ms |
| efficientnet-b0 | 4 | 412 img/s | 11.45 ms | 11.73 ms | 13.0 ms |
| efficientnet-b0 | 8 | 803 img/s | 11.78 ms | 11.59 ms | 14.2 ms |
| efficientnet-b0 | 16 | 1584 img/s | 11.89 ms | 11.9 ms | 13.63 ms |
| efficientnet-b0 | 32 | 2915 img/s | 13.03 ms | 14.79 ms | 17.35 ms |
| efficientnet-b0 | 64 | 6315 img/s | 12.71 ms | 13.59 ms | 15.27 ms |
| efficientnet-b0 | 128 | 9311 img/s | 18.78 ms | 15.34 ms | 17.99 ms |
| efficientnet-b0 | 256 | 10239 img/s | 39.05 ms | 24.97 ms | 29.24 ms |
| efficientnet-b4 | 1 | 53 img/s | 20.45 ms | 19.06 ms | 20.36 ms |
| efficientnet-b4 | 2 | 109 img/s | 20.01 ms | 19.74 ms | 21.5 ms |
| efficientnet-b4 | 4 | 212 img/s | 20.6 ms | 19.88 ms | 22.37 ms |
| efficientnet-b4 | 8 | 416 img/s | 21.02 ms | 21.46 ms | 24.82 ms |
| efficientnet-b4 | 16 | 816 img/s | 21.53 ms | 22.91 ms | 26.06 ms |
| efficientnet-b4 | 32 | 1208 img/s | 28.4 ms | 26.77 ms | 28.3 ms |
| efficientnet-b4 | 64 | 1332 img/s | 50.55 ms | 48.23 ms | 48.49 ms |
| efficientnet-b4 | 128 | 1418 img/s | 95.84 ms | 90.12 ms | 95.76 ms |
| efficientnet-b4 | 256 | 1442 img/s | 191.48 ms | 176.19 ms | 189.04 ms |
| efficientnet-widese-b0 | 1 | 104 img/s | 11.28 ms | 10.0 ms | 12.72 ms |
| efficientnet-widese-b0 | 2 | 206 img/s | 11.41 ms | 10.65 ms | 12.72 ms |
| efficientnet-widese-b0 | 4 | 426 img/s | 11.15 ms | 10.23 ms | 11.03 ms |
| efficientnet-widese-b0 | 8 | 794 img/s | 11.9 ms | 12.68 ms | 14.17 ms |
| efficientnet-widese-b0 | 16 | 1536 img/s | 12.32 ms | 13.22 ms | 14.57 ms |
| efficientnet-widese-b0 | 32 | 2876 img/s | 14.12 ms | 14.45 ms | 16.23 ms |
| efficientnet-widese-b0 | 64 | 6183 img/s | 13.02 ms | 14.19 ms | 16.68 ms |
| efficientnet-widese-b0 | 128 | 9310 img/s | 20.06 ms | 15.24 ms | 17.84 ms |
| efficientnet-widese-b0 | 256 | 10193 img/s | 36.07 ms | 25.13 ms | 34.22 ms |
| efficientnet-widese-b4 | 1 | 53 img/s | 20.24 ms | 19.05 ms | 19.91 ms |
| efficientnet-widese-b4 | 2 | 109 img/s | 20.98 ms | 19.24 ms | 22.58 ms |
| efficientnet-widese-b4 | 4 | 213 img/s | 20.48 ms | 20.48 ms | 23.64 ms |
| efficientnet-widese-b4 | 8 | 425 img/s | 20.57 ms | 20.26 ms | 22.44 ms |
| efficientnet-widese-b4 | 16 | 800 img/s | 21.93 ms | 23.15 ms | 26.51 ms |
| efficientnet-widese-b4 | 32 | 1201 img/s | 28.51 ms | 26.89 ms | 28.13 ms |
| efficientnet-widese-b4 | 64 | 1322 img/s | 50.96 ms | 48.58 ms | 48.77 ms |
| efficientnet-widese-b4 | 128 | 1417 img/s | 96.45 ms | 90.17 ms | 90.43 ms |
| efficientnet-widese-b4 | 256 | 1439 img/s | 190.06 ms | 176.59 ms | 188.51 ms |

Inference performance: NVIDIA V100 (1x V100 16GB)

Our results were obtained by running the applicable efficientnet/inference/<AMP|FP32>/*.sh inference script in the PyTorch 21.03 NGC container on NVIDIA DGX-1 (1x V100 16GB) GPU.

FP32 Inference Latency

| Model | Batch Size | Throughput Avg | Latency Avg | Latency 95% | Latency 99% |
| --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 1 | 83 img/s | 13.15 ms | 13.23 ms | 14.11 ms |
| efficientnet-b0 | 2 | 167 img/s | 13.17 ms | 13.46 ms | 14.39 ms |
| efficientnet-b0 | 4 | 332 img/s | 13.25 ms | 13.29 ms | 14.85 ms |
| efficientnet-b0 | 8 | 657 img/s | 13.42 ms | 13.86 ms | 15.77 ms |
| efficientnet-b0 | 16 | 1289 img/s | 13.78 ms | 15.02 ms | 16.99 ms |
| efficientnet-b0 | 32 | 2140 img/s | 16.46 ms | 18.92 ms | 22.2 ms |
| efficientnet-b0 | 64 | 2743 img/s | 25.14 ms | 23.44 ms | 23.79 ms |
| efficientnet-b0 | 128 | 2908 img/s | 48.03 ms | 43.98 ms | 45.36 ms |
| efficientnet-b0 | 256 | 2968 img/s | 94.86 ms | 85.62 ms | 91.01 ms |
| efficientnet-b4 | 1 | 45 img/s | 23.31 ms | 23.3 ms | 24.9 ms |
| efficientnet-b4 | 2 | 87 img/s | 24.07 ms | 23.81 ms | 25.14 ms |
| efficientnet-b4 | 4 | 160 img/s | 26.29 ms | 26.78 ms | 30.85 ms |
| efficientnet-b4 | 8 | 316 img/s | 26.65 ms | 26.44 ms | 28.61 ms |
| efficientnet-b4 | 16 | 341 img/s | 48.18 ms | 46.9 ms | 47.13 ms |
| efficientnet-b4 | 32 | 365 img/s | 89.07 ms | 87.83 ms | 88.02 ms |
| efficientnet-b4 | 64 | 374 img/s | 173.2 ms | 171.61 ms | 172.27 ms |
| efficientnet-b4 | 128 | 376 img/s | 346.32 ms | 339.74 ms | 340.37 ms |
| efficientnet-widese-b0 | 1 | 82 img/s | 13.37 ms | 12.95 ms | 13.89 ms |
| efficientnet-widese-b0 | 2 | 168 img/s | 13.11 ms | 12.45 ms | 13.94 ms |
| efficientnet-widese-b0 | 4 | 346 img/s | 12.73 ms | 12.22 ms | 12.95 ms |
| efficientnet-widese-b0 | 8 | 674 img/s | 13.07 ms | 12.75 ms | 14.93 ms |
| efficientnet-widese-b0 | 16 | 1235 img/s | 14.3 ms | 15.05 ms | 16.53 ms |
| efficientnet-widese-b0 | 32 | 2194 img/s | 15.99 ms | 17.37 ms | 19.01 ms |
| efficientnet-widese-b0 | 64 | 2747 img/s | 25.05 ms | 23.38 ms | 23.71 ms |
| efficientnet-widese-b0 | 128 | 2906 img/s | 48.05 ms | 44.0 ms | 44.59 ms |
| efficientnet-widese-b0 | 256 | 2962 img/s | 95.14 ms | 85.86 ms | 86.25 ms |
| efficientnet-widese-b4 | 1 | 43 img/s | 24.28 ms | 25.24 ms | 27.36 ms |
| efficientnet-widese-b4 | 2 | 87 img/s | 24.04 ms | 24.38 ms | 26.01 ms |
| efficientnet-widese-b4 | 4 | 169 img/s | 24.96 ms | 25.8 ms | 27.14 ms |
| efficientnet-widese-b4 | 8 | 307 img/s | 27.39 ms | 28.4 ms | 30.7 ms |
| efficientnet-widese-b4 | 16 | 342 img/s | 48.05 ms | 46.74 ms | 46.9 ms |
| efficientnet-widese-b4 | 32 | 363 img/s | 89.44 ms | 88.23 ms | 88.39 ms |
| efficientnet-widese-b4 | 64 | 373 img/s | 173.47 ms | 172.01 ms | 172.36 ms |
| efficientnet-widese-b4 | 128 | 376 img/s | 347.18 ms | 340.09 ms | 340.45 ms |

Mixed Precision Inference Latency

| Model | Batch Size | Throughput Avg | Latency Avg | Latency 95% | Latency 99% |
| --- | --- | --- | --- | --- | --- |
| efficientnet-b0 | 1 | 62 img/s | 17.19 ms | 18.01 ms | 18.63 ms |
| efficientnet-b0 | 2 | 119 img/s | 17.96 ms | 18.3 ms | 19.95 ms |
| efficientnet-b0 | 4 | 238 img/s | 17.9 ms | 17.8 ms | 19.13 ms |
| efficientnet-b0 | 8 | 495 img/s | 17.38 ms | 18.34 ms | 19.29 ms |
| efficientnet-b0 | 16 | 945 img/s | 18.23 ms | 19.42 ms | 21.58 ms |
| efficientnet-b0 | 32 | 1784 img/s | 19.29 ms | 20.71 ms | 22.51 ms |
| efficientnet-b0 | 64 | 3480 img/s | 20.34 ms | 22.22 ms | 24.62 ms |
| efficientnet-b0 | 128 | 5759 img/s | 26.11 ms | 22.61 ms | 24.06 ms |
| efficientnet-b0 | 256 | 6176 img/s | 49.36 ms | 41.18 ms | 43.5 ms |
| efficientnet-b4 | 1 | 34 img/s | 30.28 ms | 30.2 ms | 32.24 ms |
| efficientnet-b4 | 2 | 69 img/s | 30.12 ms | 30.02 ms | 31.92 ms |
| efficientnet-b4 | 4 | 129 img/s | 32.08 ms | 33.29 ms | 34.74 ms |
| efficientnet-b4 | 8 | 242 img/s | 34.43 ms | 37.34 ms | 41.08 ms |
| efficientnet-b4 | 16 | 488 img/s | 34.12 ms | 36.13 ms | 39.39 ms |
| efficientnet-b4 | 32 | 738 img/s | 44.67 ms | 44.85 ms | 47.86 ms |
| efficientnet-b4 | 64 | 809 img/s | 80.93 ms | 79.19 ms | 79.42 ms |
| efficientnet-b4 | 128 | 843 img/s | 156.42 ms | 152.17 ms | 152.76 ms |
| efficientnet-b4 | 256 | 847 img/s | 311.03 ms | 301.44 ms | 302.48 ms |
| efficientnet-widese-b0 | 1 | 64 img/s | 16.71 ms | 17.59 ms | 19.23 ms |
| efficientnet-widese-b0 | 2 | 129 img/s | 16.63 ms | 16.1 ms | 17.34 ms |
| efficientnet-widese-b0 | 4 | 238 img/s | 17.92 ms | 17.52 ms | 18.82 ms |
| efficientnet-widese-b0 | 8 | 445 img/s | 19.24 ms | 19.53 ms | 20.4 ms |
| efficientnet-widese-b0 | 16 | 936 img/s | 18.64 ms | 19.55 ms | 21.1 ms |
| efficientnet-widese-b0 | 32 | 1818 img/s | 18.97 ms | 20.62 ms | 23.06 ms |
| efficientnet-widese-b0 | 64 | 3572 img/s | 19.81 ms | 21.14 ms | 23.29 ms |
| efficientnet-widese-b0 | 128 | 5748 img/s | 26.18 ms | 23.72 ms | 26.1 ms |
| efficientnet-widese-b0 | 256 | 6187 img/s | 49.11 ms | 41.11 ms | 41.59 ms |
| efficientnet-widese-b4 | 1 | 32 img/s | 32.1 ms | 31.6 ms | 34.69 ms |
| efficientnet-widese-b4 | 2 | 68 img/s | 30.4 ms | 30.9 ms | 32.67 ms |
| efficientnet-widese-b4 | 4 | 123 img/s | 33.81 ms | 39.0 ms | 40.76 ms |
| efficientnet-widese-b4 | 8 | 257 img/s | 32.34 ms | 33.39 ms | 34.93 ms |
| efficientnet-widese-b4 | 16 | 497 img/s | 33.51 ms | 34.92 ms | 37.24 ms |
| efficientnet-widese-b4 | 32 | 739 img/s | 44.63 ms | 43.62 ms | 46.39 ms |
| efficientnet-widese-b4 | 64 | 808 img/s | 81.08 ms | 79.43 ms | 79.59 ms |
| efficientnet-widese-b4 | 128 | 840 img/s | 157.11 ms | 152.87 ms | 153.26 ms |
| efficientnet-widese-b4 | 256 | 846 img/s | 310.73 ms | 301.68 ms | 302.9 ms |
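
The inference tables above report an average latency together with 95th and 99th percentile latencies. As an illustration of how such summary statistics can be derived from per-batch timings, here is a minimal sketch using the nearest-rank percentile definition. It is not necessarily how the benchmark script computes its numbers; the function names and the sample latencies are hypothetical.

```python
import math
import statistics


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize_latencies(latencies_ms):
    """Average, 95th and 99th percentile of per-batch latencies in ms."""
    return {
        "avg": statistics.mean(latencies_ms),
        "p95": percentile_nearest_rank(latencies_ms, 95),
        "p99": percentile_nearest_rank(latencies_ms, 99),
    }


# Made-up timings: 98 fast batches plus two slow outliers.
latencies_ms = [10.0] * 98 + [20.0, 30.0]
summary = summarize_latencies(latencies_ms)
print(summary)
```

In this made-up sample the average (10.3 ms) is higher than the 95th percentile (10.0 ms) because two outliers pull the mean up. That is one possible explanation for rows in the tables where Latency Avg exceeds Latency 95%.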

Quantization results

QAT Training performance: NVIDIA DGX-1 (8x V100 32GB)

| Model | GPUs | Calibration | QAT model | FP32 | QAT ratio |
| --- | --- | --- | --- | --- | --- |
| efficientnet-quant-b0 | 8 | 14.71 img/s | 2644.62 img/s | 3798 img/s | 0.696 x |
| efficientnet-quant-b4 | 8 | 1.85 img/s | 310.41 img/s | 666 img/s | 0.466 x |

Quant Inference accuracy

The best checkpoints generated during training were used as a base for the QAT.

| Model | QAT Epochs | QAT Top1 | Gap between FP32 Top1 and QAT Top1 |
| --- | --- | --- | --- |
| efficientnet-quant-b0 | 10 | 77.12 | 0.51 |
| efficientnet-quant-b4 | 2 | 82.54 | 0.44 |
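
The QAT ratio column in the QAT training performance table above appears to be the QAT training throughput divided by the FP32 training throughput. That reading is an assumption inferred from the numbers, not a definition given in this document. The short sketch below recomputes it; the function name is hypothetical.

```python
def qat_ratio(qat_img_s, fp32_img_s):
    """QAT throughput as a fraction of FP32 throughput (assumed definition)."""
    return qat_img_s / fp32_img_s


# (QAT model throughput, FP32 throughput) in img/s, from the table above.
throughputs = {
    "efficientnet-quant-b0": (2644.62, 3798.0),
    "efficientnet-quant-b4": (310.41, 666.0),
}

ratios = {model: qat_ratio(qat, fp32)
          for model, (qat, fp32) in throughputs.items()}
for model, ratio in ratios.items():
    print(f"{model}: {ratio:.3f} x")
```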
