EfficientNets are a family of image classification models that achieve state-of-the-art accuracy while being an order of magnitude smaller and faster than previous models.
EfficientNet v2 is likewise developed using AutoML and compound scaling, but with a particular emphasis on faster training. To this end, the authors propose three major changes compared to v1: 1) the AutoML objective is revised so that the number of FLOPs is replaced by training time, because FLOPs are not an accurate proxy for actual training time; 2) a multi-stage (progressive) training scheme is proposed, in which the early stages of training use low-resolution images and weak regularization, while the subsequent stages use larger images and stronger regularization; 3) an additional block called Fused-MBConv is added to the AutoML search space, which replaces the 1x1 expansion convolution and the 3x3 depth-wise convolution of MBConv with a single regular 3x3 convolution.
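The multi-stage training idea in point 2 can be sketched as a schedule that raises image size and regularization strength together from stage to stage. The snippet below is only an illustration under stated assumptions: the names `StageConfig` and `progressive_schedule` are hypothetical, linear interpolation is assumed, dropout stands in for "regularization", and the numeric values are placeholders rather than the configuration used by the training scripts in this repository.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StageConfig:
    """Settings for one training stage (hypothetical container)."""
    stage: int
    image_size: int
    dropout_rate: float


def progressive_schedule(num_stages: int,
                         min_size: int, max_size: int,
                         min_dropout: float, max_dropout: float) -> List[StageConfig]:
    """Linearly interpolate image size and dropout across training stages.

    Early stages get small images and weak regularization; later stages
    get larger images and stronger regularization.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    configs = []
    for i in range(num_stages):
        # Fraction of the way through the schedule, in [0, 1].
        frac = i / (num_stages - 1) if num_stages > 1 else 1.0
        size = int(round(min_size + frac * (max_size - min_size)))
        dropout = min_dropout + frac * (max_dropout - min_dropout)
        configs.append(StageConfig(stage=i, image_size=size, dropout_rate=dropout))
    return configs


# Illustrative values only (assumed, not read from this repository).
schedule = progressive_schedule(num_stages=4,
                                min_size=128, max_size=300,
                                min_dropout=0.1, max_dropout=0.3)
for cfg in schedule:
    print(cfg)
```

In a real training loop, each stage would rebuild the input pipeline at `image_size` and update the model's regularization settings before continuing from the previous stage's weights.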
The EfficientNet v2 base model is scaled up using a non-uniform compound scaling scheme, in which the depth and width of blocks are scaled depending on where they are located in the base architecture. With this approach, the authors identified the base "small" model, EfficientNet v2-S, and then scaled it up to obtain EfficientNet v2-M, L, and XL. Below is a detailed overview of EfficientNet v2-S, which is reproduced in this repository.
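To make "non-uniform" scaling concrete, the sketch below applies a separate depth and width multiplier to each stage, instead of one global pair of multipliers. Everything in it is an assumption for illustration: the helper names, the stage table, and the multipliers are hypothetical placeholders and do not reproduce the published M/L/XL configurations. Rounding channel counts to a multiple of 8 follows a convention common in EfficientNet-style implementations.

```python
import math
from typing import List, Tuple


def round_channels(channels: int, multiplier: float, divisor: int = 8) -> int:
    """Scale a channel count and round it to the nearest multiple of `divisor`."""
    scaled = channels * multiplier
    rounded = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    # Avoid rounding down by more than 10% of the scaled value.
    if rounded < 0.9 * scaled:
        rounded += divisor
    return rounded


def round_repeats(repeats: int, multiplier: float) -> int:
    """Scale the number of block repeats in a stage, rounding up."""
    return int(math.ceil(repeats * multiplier))


def scale_stages(stages: List[Tuple[int, int]],
                 depth_mults: List[float],
                 width_mults: List[float]) -> List[Tuple[int, int]]:
    """Apply per-stage (non-uniform) depth and width multipliers.

    `stages` is a list of (repeats, channels) pairs, one per stage.
    """
    if not (len(stages) == len(depth_mults) == len(width_mults)):
        raise ValueError("one depth and one width multiplier is needed per stage")
    return [
        (round_repeats(repeats, d), round_channels(channels, w))
        for (repeats, channels), d, w in zip(stages, depth_mults, width_mults)
    ]


# Placeholder stage table and multipliers (hypothetical values).
base_stages = [(2, 24), (4, 48), (6, 128)]
scaled = scale_stages(base_stages,
                      depth_mults=[1.0, 1.5, 2.0],   # later stages grow deeper
                      width_mults=[1.0, 1.0, 1.25])  # only the last stage widens
print(scaled)  # [(2, 24), (6, 48), (12, 160)]
```

The design choice illustrated here is that later stages receive larger multipliers than early ones, so added capacity goes where the scheme judges it most useful rather than being spread evenly.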
This model was trained using the scripts available on NGC and in the GitHub repository.
The following datasets were used to train this model:
Performance numbers for this model are available in NGC.