ConvNextv2 is a model that can be used as a backbone for most popular computer vision tasks, such as classification, segmentation, and detection.
ConvNextv2 is a modern convolutional network architecture co-designed with a fully convolutional masked autoencoder framework. It has shown improved performance over pure ConvNets on various recognition benchmarks, including classification, detection, and segmentation.
This model is ready for commercial use.
Use of this model is governed by the NVIDIA Community Models License.
Global
The primary use case for these models is feature extraction for downstream tasks like classification, object detection and segmentation.
NGC [06/13/2025]
Architecture Type: Convolutional Neural Network (CNN)
Network Architecture: ConvNextv2-nano, ConvNextv2-tiny, ConvNextv2-large.
Output Type(s): Embedding - Float tensor
Output Format: 2D Vector
Output Parameters: Two-Dimensional (2D)
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
This model must be used with NVIDIA hardware and software. It can run on any NVIDIA GPU with sufficient memory (more than 12 GB), and it can only be used with the TAO Toolkit.
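The memory requirement above can be checked before launching a job. The following is a minimal sketch, assuming `nvidia-smi` is on the `PATH` and that the threshold is read as 12 GiB (12 × 1024 MiB); the helper names are hypothetical and are not part of TAO Toolkit.

```python
import shutil
import subprocess

# Assumption: the ">12G" requirement is interpreted as more than 12 GiB.
MIN_MEMORY_MIB = 12 * 1024


def parse_memory_mib(smi_output):
    """Parse one integer MiB value per non-empty line of nvidia-smi CSV output."""
    return [int(line.strip()) for line in smi_output.splitlines() if line.strip()]


def query_gpu_memory_mib():
    """Return total memory per visible GPU in MiB, or [] if nvidia-smi is missing."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_mib(result.stdout)


def gpus_with_enough_memory(memory_mib, minimum=MIN_MEMORY_MIB):
    """Return the indices of GPUs whose total memory exceeds the minimum."""
    return [index for index, mib in enumerate(memory_mib) if mib > minimum]


# Offline example with made-up values: a 16 GiB GPU and an 8 GiB GPU.
sample_output = "16384\n8192\n"
print(gpus_with_enough_memory(parse_memory_mib(sample_output)))  # prints [0]
```

On a real machine, `gpus_with_enough_memory(query_gpu_memory_mib())` would return the indices of the GPUs that pass the check.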
The primary use case for these models is feature extraction.
It is intended for training and fine-tuning using the Train Adapt Optimize (TAO) Toolkit. High-fidelity models can be trained for new use cases. A Jupyter notebook is available as part of the TAO container and can be used for re-training.
To use these models as pretrained weights for transfer learning, use the following snippet as a template for the model and train components of the experiment spec file to train a ConvNextv2 model. For more information on the experiment spec file, see the TAO Toolkit User Guide.
train:
  stage: "finetune"
  batch_size: 64
  pretrained_model_path: /path/to/convnextv2_checkpoint.pth
  precision: 'bf16-mixed'
  num_gpus: 8
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 100
  smoothing: 0.1
model:
  arch: convnextv2_large
  num_classes: 1000
  drop_path_rate: 0.1
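When adapting the template to a new dataset, it can help to generate the spec programmatically so that only known keys are overridden. The following is a minimal sketch using only the Python standard library; `make_spec` and `to_yaml` are hypothetical helpers, not TAO Toolkit APIs, and the override values (10 classes, one GPU, batch size 16) are illustrative assumptions.

```python
from copy import deepcopy

# Values copied from the template spec above.
BASE_SPEC = {
    "train": {
        "stage": "finetune",
        "batch_size": 64,
        "pretrained_model_path": "/path/to/convnextv2_checkpoint.pth",
        "precision": "bf16-mixed",
        "num_gpus": 8,
        "checkpoint_interval": 10,
        "validation_interval": 10,
        "num_epochs": 100,
        "smoothing": 0.1,
    },
    "model": {
        "arch": "convnextv2_large",
        "num_classes": 1000,
        "drop_path_rate": 0.1,
    },
}


def make_spec(overrides):
    """Return a copy of BASE_SPEC with per-section overrides applied."""
    spec = deepcopy(BASE_SPEC)
    for section, values in overrides.items():
        if section not in spec:
            raise KeyError(f"unknown section: {section}")
        for key, value in values.items():
            # Reject keys that are not in the template to catch typos early.
            if key not in spec[section]:
                raise KeyError(f"unknown key: {section}.{key}")
            spec[section][key] = value
    return spec


def to_yaml(spec):
    """Render the two-level spec as YAML text; string values are double-quoted."""
    lines = []
    for section, values in spec.items():
        lines.append(f"{section}:")
        for key, value in values.items():
            rendered = f'"{value}"' if isinstance(value, str) else str(value)
            lines.append(f"  {key}: {rendered}")
    return "\n".join(lines) + "\n"


# Hypothetical fine-tuning run: 10 classes on a single GPU.
custom = make_spec({
    "train": {"num_gpus": 1, "batch_size": 16},
    "model": {"num_classes": 10},
})
print(to_yaml(custom))
```

The rendered text can be written to a file and passed to the training command described in the TAO Toolkit User Guide. The renderer handles only the flat two-level layout shown here; a full YAML library would be a better choice for anything more complex.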
Runtime Engine:
Supported Hardware Microarchitecture Compatibility:
[Preferred/Supported] Operating System(s):
Data Collection Method by dataset:
Labeling Method by dataset:
Properties:
Dataset | No. of Images |
---|---|
NV Internal Data | 5M |
Data Collection Method by dataset:
Labeling Method by dataset:
Properties:
Dataset | No. of Images |
---|---|
NV Internal Data | 50,000 |
Link: https://www.image-net.org/
Data Collection Method by dataset:
Labeling Method by dataset:
Properties:
50,000 validation images from ImageNet dataset
We tested the ConvNextv2 models on the ImageNet 1k validation dataset.
The KPIs for the evaluation data are reported below.
Model | Precision | Zero-shot KNN |
---|---|---|
ConvNextv2-nano | FP32 | 0.69 |
ConvNextv2-tiny | FP32 | 0.70 |
ConvNextv2-large | FP32 | 0.70 |
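For readers unfamiliar with k-nearest-neighbor (KNN) evaluation of a feature extractor, the idea can be sketched as follows: embeddings of labeled reference images form a bank, and each query embedding takes the majority label of its k most similar bank entries. The source does not state the value of k or the similarity metric used for the numbers above, so cosine similarity and k=3 here are assumptions, and the 2D vectors are toy data rather than real model outputs.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, bank, labels, k=3):
    """Predict a label by majority vote among the k most similar bank embeddings."""
    scored = sorted(
        zip(bank, labels),
        key=lambda pair: cosine_similarity(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(queries, query_labels, bank, labels, k=3):
    """Fraction of queries whose KNN prediction matches the true label."""
    correct = sum(
        knn_predict(query, bank, labels, k) == label
        for query, label in zip(queries, query_labels)
    )
    return correct / len(queries)


# Toy embeddings standing in for backbone features (not real model outputs).
bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
bank_labels = ["cat", "cat", "dog", "dog"]
queries = [[0.8, 0.2], [0.2, 0.8]]
query_labels = ["cat", "dog"]

print(knn_accuracy(queries, query_labels, bank, bank_labels, k=3))  # prints 1.0
```

No classifier head is trained in this setup, which is why KNN accuracy is often used to judge the quality of the embeddings themselves.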
Engine: TensorRT
Test Hardware:
Inference is run on the provided unpruned model at FP16 precision. Inference performance is measured using trtexec on Jetson AGX Xavier, Xavier NX, Orin, Orin NX and NVIDIA T4, and Ampere GPUs. The Jetson devices are running at Max-N configuration for maximum GPU frequency. The performance shown here is inference-only performance. The end-to-end performance with streaming video data might vary depending on other bottlenecks in the hardware and software.
Model | Platform | Batch Size (BS) | FPS |
---|---|---|---|
ConvNextv2-nano | AGX Orin 64GB | 16 | 834 |
ConvNextv2-nano | Jetson Orin 16GB | 16 | 317 |
ConvNextv2-nano | Jetson Nano 8GB | 8 | 212 |
ConvNextv2-tiny | AGX Orin 64GB | 16 | 533 |
ConvNextv2-tiny | Jetson Orin 16GB | 16 | 197 |
ConvNextv2-tiny | Jetson Nano 8GB | 8 | 135 |
ConvNextv2-large | AGX Orin 64GB | 16 | 139 |
ConvNextv2-large | Jetson Orin 16GB | 16 | 43 |
ConvNextv2-large | Jetson Nano 8GB | 16 | 35 |
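The throughput figures can be converted into approximate latencies when sizing a deployment. The following is a minimal sketch, assuming FPS means images per second aggregated over the batch (the source does not define it); under that assumption the time per image is 1000 / FPS milliseconds and the time per batch is BS × 1000 / FPS milliseconds.

```python
# (model, platform, batch size, FPS) rows copied from the table above.
RESULTS = [
    ("ConvNextv2-nano", "AGX Orin 64GB", 16, 834),
    ("ConvNextv2-nano", "Jetson Orin 16GB", 16, 317),
    ("ConvNextv2-nano", "Jetson Nano 8GB", 8, 212),
    ("ConvNextv2-tiny", "AGX Orin 64GB", 16, 533),
    ("ConvNextv2-tiny", "Jetson Orin 16GB", 16, 197),
    ("ConvNextv2-tiny", "Jetson Nano 8GB", 8, 135),
    ("ConvNextv2-large", "AGX Orin 64GB", 16, 139),
    ("ConvNextv2-large", "Jetson Orin 16GB", 16, 43),
    ("ConvNextv2-large", "Jetson Nano 8GB", 16, 35),
]


def per_image_latency_ms(fps):
    """Average time per image in milliseconds, assuming FPS counts images."""
    return 1000.0 / fps


def per_batch_latency_ms(batch_size, fps):
    """Approximate time to process one full batch in milliseconds."""
    return batch_size * 1000.0 / fps


for model, platform, batch_size, fps in RESULTS:
    print(
        f"{model:<18}{platform:<18}"
        f"{per_image_latency_ms(fps):8.2f} ms/image"
        f"{per_batch_latency_ms(batch_size, fps):9.2f} ms/batch"
    )
```

These are averages derived from throughput, not measured latencies; pre-processing, post-processing, and data transfer are excluded, consistent with the inference-only note above.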
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.