
EfficientNet v2-S checkpoint (TensorFlow2, AMP, Imagenet)



EfficientNet v2-S TensorFlow2 SavedModel, trained on ImageNet using one DGX-1 with V100 GPUs (batch size 1024 = 8 × 128)


NVIDIA Deep Learning Examples

September 22, 2022


329.16 MB

Model Overview

EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.

Model Architecture

EfficientNet v2 is developed based on AutoML and compound scaling, but with a particular emphasis on faster training. For this purpose, the authors proposed three major changes compared to v1:

1) The objective function of the AutoML search is revised so that training time is substituted for the number of FLOPs, because FLOPs is not an accurate surrogate of actual training time.

2) A multi-stage (progressive) training scheme is proposed, in which the early stages of training use low-resolution images and weak regularization, while the subsequent stages use larger images and stronger regularization.

3) An additional block, Fused-MBConv, is added to the AutoML search space; it replaces the 1x1 expansion convolution and 3x3 depthwise convolution of MBConv with a single regular 3x3 convolution.
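To make point 3 concrete, the sketch below compares weight counts of an MBConv block and a Fused-MBConv block at the same width and expansion ratio. This is a minimal back-of-the-envelope calculation: biases, batch normalization, and squeeze-and-excitation are ignored, and the channel sizes are illustrative rather than taken from the actual model.

```python
# Rough weight counts for the two block types used in EfficientNet v2.
# Illustrative only: biases, BN, and squeeze-and-excitation are omitted.

def mbconv_params(c_in: int, expand: int, kernel: int = 3) -> int:
    """MBConv: 1x1 expansion -> kxk depthwise -> 1x1 projection."""
    c_mid = c_in * expand
    expand_conv = c_in * c_mid            # 1x1 expansion convolution
    depthwise = kernel * kernel * c_mid   # kxk depthwise (one filter per channel)
    project = c_mid * c_in                # 1x1 projection convolution
    return expand_conv + depthwise + project

def fused_mbconv_params(c_in: int, expand: int, kernel: int = 3) -> int:
    """Fused-MBConv: one regular kxk conv replaces expansion + depthwise."""
    c_mid = c_in * expand
    fused_conv = kernel * kernel * c_in * c_mid  # regular kxk convolution
    project = c_mid * c_in                       # 1x1 projection convolution
    return fused_conv + project

if __name__ == "__main__":
    c, e = 24, 4  # illustrative early-stage width and expansion ratio
    print("MBConv weights:      ", mbconv_params(c, e))        # 5472
    print("Fused-MBConv weights:", fused_mbconv_params(c, e))  # 23040
```

Note that Fused-MBConv actually has *more* parameters and FLOPs at the same width; its advantage is that dense 3x3 convolutions are typically far better utilized on GPUs and TPUs than depthwise convolutions, which is why v2 favors Fused-MBConv in the early, low-channel stages of the network.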

EfficientNet v2-S

The EfficientNet v2 base model is scaled up using a non-uniform compound scaling scheme, through which the depth and width of blocks are scaled depending on where they are located in the base architecture. With this approach, the authors identified the base "small" model, EfficientNet v2-S, and then scaled it up to obtain EfficientNet v2-M, L, and XL. EfficientNet v2-S is the variant reproduced in this repository.
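The mechanics of non-uniform compound scaling can be sketched as follows. The per-stage multipliers and base stage sizes below are hypothetical placeholders chosen for illustration, not the values used to derive v2-M/L/XL; the rounding rules (ceil for depth, snap-to-multiple-of-8 for width) are common conventions for keeping tensors hardware-friendly.

```python
import math

def scale_stage(base_layers: int, base_channels: int,
                depth_mult: float, width_mult: float,
                channel_divisor: int = 8) -> tuple[int, int]:
    """Scale one stage's depth (layer count) and width (channel count).

    Hypothetical helper: depth is rounded up, width is rounded to the
    nearest multiple of `channel_divisor`.
    """
    layers = math.ceil(base_layers * depth_mult)
    channels = int(base_channels * width_mult)
    channels = max(channel_divisor,
                   (channels + channel_divisor // 2)
                   // channel_divisor * channel_divisor)
    return layers, channels

# Non-uniform scaling: later stages receive larger multipliers than
# earlier ones. All numbers below are illustrative.
base = [(2, 24), (4, 48), (4, 64), (6, 128)]  # (layers, channels) per stage
depth_mults = [1.0, 1.0, 1.1, 1.2]            # hypothetical per-stage factors
width_mults = [1.0, 1.1, 1.1, 1.2]

scaled = [scale_stage(l, c, d, w)
          for (l, c), d, w in zip(base, depth_mults, width_mults)]
```

With uniform compound scaling (as in v1) every stage would share the same pair of multipliers; the non-uniform variant lets later, more capacity-hungry stages grow faster than early stages.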


This model was trained using the training scripts available on NGC and in the NVIDIA Deep Learning Examples GitHub repository.


The following datasets were used to train this model:

  • ImageNet - an image database organized according to the WordNet hierarchy, in which each noun is depicted by hundreds or thousands of images.


Performance numbers for this model are available on NGC.



This model was trained using open-source software available in the NVIDIA Deep Learning Examples repository. For terms of use, please refer to the licenses of the training scripts and of the datasets from which the model was derived.