Pretrained ConvNeXtV2

Description
Pretrained ConvNextv2 backbone models to facilitate transfer learning for commercially viable models.
Publisher
NVIDIA
Latest Version
convnextv2_large_v1.0
Modified
July 17, 2025
Size
2.21 GB

TAO Commercial Pretrained ConvNextv2

Model Overview

Description

ConvNextv2 is a model that can be used as a backbone for most of the popular computer vision tasks such as classification, segmentation and detection.

ConvNextv2 is a modern convolutional network architecture co-designed with a fully convolutional masked autoencoder framework. It improves on the performance of earlier pure ConvNets across various recognition benchmarks, including classification, detection, and segmentation.

This model is ready for commercial use.

License/Terms of Use

Use of this model is governed by the NVIDIA Community Models License.

Deployment Geography

Global

Use Case

The primary use case for these models is feature extraction for downstream tasks like classification, object detection and segmentation.

Release Date

NGC [06/13/2025]

Reference

  • S. Woo, S. Debnath, R. Hu, X. Chen, Z. Liu, I. S. Kweon, S. Xie: ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders

Model Architecture

  • Architecture Type: Convolutional Neural Network (CNN)
  • Network Architecture: ConvNextv2-nano, ConvNextv2-tiny, ConvNextv2-large

Input

  • Input Type: Image
  • Input Formats: Red, Green, Blue (RGB)
  • Input Parameters: Two-Dimensional (2D)
  • Other Properties Related to Input: Image of dimensions: 224 X 224 X 3 (H x W x C); no alpha channel or bits

Output

  • Output Type(s): Embedding - Float tensor
  • Output Format: 2D Vector
  • Output Parameters: Two-Dimensional (2D)
  • Other Properties Related to Output: Batch size X 1000
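
The input and output shapes listed above can be sanity-checked with a small helper. This is only a sketch: `expected_output_shape` and the (N, H, W, C) batch layout are illustrative assumptions, since the card does not state which memory layout the TAO data loader uses.

```python
# Shapes taken from the model card: 224 x 224 x 3 (H x W x C) in, 1000 values out.
INPUT_HWC = (224, 224, 3)
OUTPUT_DIM = 1000


def expected_output_shape(batch_shape):
    """Return the (batch, 1000) output shape for a batch of H x W x C images.

    `batch_shape` is a hypothetical (N, H, W, C) tuple used for illustration.
    """
    if len(batch_shape) != 4:
        raise ValueError(f"expected (N, H, W, C), got {batch_shape}")
    n, *hwc = batch_shape
    if n < 1:
        raise ValueError("batch size must be at least 1")
    if tuple(hwc) != INPUT_HWC:
        # Catches, for example, RGBA images that still carry an alpha channel.
        raise ValueError(f"expected per-image shape {INPUT_HWC}, got {tuple(hwc)}")
    return (n, OUTPUT_DIM)


if __name__ == "__main__":
    print(expected_output_shape((64, 224, 224, 3)))
```

A batch of 64 RGB images maps to a 64 x 1000 output, while a 4-channel image is rejected before it reaches the model.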

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

How to Use This Model

This model must be used with NVIDIA hardware and software. It can run on any NVIDIA GPU with sufficient memory (more than 12 GB) and can only be used with TAO Toolkit.

The primary use case for these models is feature extraction.

These models are intended for training and fine-tuning with the Train Adapt Optimize (TAO) Toolkit. High-fidelity models can be trained for new use cases. A Jupyter notebook is available as part of the TAO container and can be used for re-training.
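
The GPU memory requirement mentioned in this section can be checked before launching a training job. The sketch below parses the output of `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`, which reports one value per GPU in MiB. The helper names are illustrative, and reading the card's threshold as 12 GiB (12288 MiB) is an assumption.

```python
import subprocess

# Assumption: the card's memory threshold is read as 12 GiB.
MIN_MEMORY_MIB = 12 * 1024


def parse_memory_totals(smi_output):
    """Parse one integer (MiB) per non-empty line of nvidia-smi output."""
    return [int(line) for line in smi_output.splitlines() if line.strip()]


def gpus_with_enough_memory(totals_mib, minimum_mib=MIN_MEMORY_MIB):
    """Return the indices of GPUs whose total memory exceeds the minimum."""
    return [i for i, total in enumerate(totals_mib) if total > minimum_mib]


def query_memory_totals():
    """Ask nvidia-smi for the total memory of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_memory_totals(result.stdout)


if __name__ == "__main__":
    # Sample output for two GPUs; call query_memory_totals() on a real system.
    sample = "81559\n8192\n"
    print(gpus_with_enough_memory(parse_memory_totals(sample)))
```

Some devices report a non-numeric value for this field, in which case `parse_memory_totals` raises a `ValueError` rather than guessing.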

Instructions to Use Pretrained Models with TAO

To use these models as pretrained weights for transfer learning, use the following snippet as a template for the model and train components of the experiment spec file to train a ConvNextv2 model. For more information on the experiment spec file, see the TAO Toolkit User Guide.

train:
  stage: "finetune"
  batch_size: 64
  pretrained_model_path: /path/to/convnextv2_checkpoint.pth
  precision: 'bf16-mixed'
  num_gpus: 8
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 100
  smoothing: 0.1
model:
  arch: convnextv2_large
  num_classes: 1000
  drop_path_rate: 0.1

Software Integration

Runtime Engine:

  • TAO 5.5.0

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Hopper
  • NVIDIA Lovelace
  • NVIDIA Pascal
  • NVIDIA Turing
  • NVIDIA Volta

Supported Operating System(s):

  • Linux

Model Versions

  • convnextv2_nano_trainable_v1.0 - Pre-trained ConvNextv2-nano model for finetuning.
  • convnextv2_tiny_trainable_v1.0 - Pre-trained ConvNextv2-tiny model for finetuning.
  • convnextv2_large_trainable_v1.0 - Pre-trained ConvNextv2-large model for finetuning.

Training, Testing and Evaluation Datasets

Training Datasets

Data Collection Method by dataset:

  • Automated

Labeling Method by dataset:

  • Automated

Properties:

Dataset | No. of Images
NV Internal Data | 5M

Testing Datasets

Data Collection Method by dataset:

  • Automated

Labeling Method by dataset:

  • Automated

Properties:

Dataset | No. of Images
NV Internal Data | 50,000

Evaluation Datasets

Link: https://www.image-net.org/

Data Collection Method by dataset:

  • Hybrid: Automated, Human

Labeling Method by dataset:

  • Hybrid: Automated, Human

Properties:
50,000 validation images from ImageNet dataset

Performance

Evaluation Data

We tested the ConvNextv2 models on the ImageNet 1k validation dataset.

Methodology and KPI

The KPIs for the evaluation data are reported below.

Model | Precision | Zero-shot KNN
ConvNextv2-nano | FP32 | 0.69
ConvNextv2-tiny | FP32 | 0.70
ConvNextv2-large | FP32 | 0.70
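
For readers unfamiliar with KNN evaluation of a backbone, the idea is to classify each query embedding by a vote among its nearest labeled embeddings, without training a classifier head. The sketch below illustrates that idea on toy 2D vectors. The card does not state the value of k, the distance metric, or the labeled bank used for the numbers above, so cosine similarity, `k=3`, and all function names are assumptions.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn_predict(query, bank, bank_labels, k=3):
    """Label a query embedding by majority vote of its k most similar neighbours."""
    ranked = sorted(range(len(bank)),
                    key=lambda i: cosine_similarity(query, bank[i]),
                    reverse=True)
    votes = Counter(bank_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def knn_accuracy(queries, query_labels, bank, bank_labels, k=3):
    """Fraction of queries whose KNN prediction matches the true label."""
    hits = sum(knn_predict(q, bank, bank_labels, k) == y
               for q, y in zip(queries, query_labels))
    return hits / len(queries)


if __name__ == "__main__":
    # Toy embeddings standing in for backbone features.
    bank = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    bank_labels = ["cat", "cat", "dog", "dog"]
    queries = [[0.8, 0.2], [0.2, 0.8]]
    print(knn_accuracy(queries, ["cat", "dog"], bank, bank_labels))
```

In a real evaluation the bank and queries would be embeddings produced by the backbone, and the reported score would be the accuracy over the whole validation set.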

Inference

Engine: TensorRT
Test Hardware:

  • AGX Orin 64GB
  • Orin Nano 8GB
  • Orin NX 16GB

Inference is run on the provided unpruned model at FP16 precision. Inference performance is measured using trtexec on Jetson AGX Xavier, Xavier NX, Orin, Orin NX, NVIDIA T4, and Ampere GPUs. The Jetson devices run in the Max-N configuration for maximum GPU frequency. The numbers shown here reflect inference-only performance; end-to-end performance with streaming video data might vary depending on other bottlenecks in the hardware and software.

Model | Platform | Batch Size | FPS
ConvNextv2-nano | AGX Orin 64GB | 16 | 834
ConvNextv2-nano | Orin NX 16GB | 16 | 317
ConvNextv2-nano | Orin Nano 8GB | 8 | 212
ConvNextv2-tiny | AGX Orin 64GB | 16 | 533
ConvNextv2-tiny | Orin NX 16GB | 16 | 197
ConvNextv2-tiny | Orin Nano 8GB | 8 | 135
ConvNextv2-large | AGX Orin 64GB | 16 | 139
ConvNextv2-large | Orin NX 16GB | 16 | 43
ConvNextv2-large | Orin Nano 8GB | 16 | 35
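
The throughput figures above can be converted into approximate latencies, which are often easier to compare against an application's time budget. This is a back-of-the-envelope sketch with illustrative function names: it assumes batches are processed back to back, so the time per batch is simply the batch size divided by the throughput, and it ignores any pre- or post-processing cost.

```python
def batch_latency_ms(fps, batch_size):
    """Approximate time to process one batch, in milliseconds."""
    return 1000.0 * batch_size / fps


def per_image_ms(fps):
    """Amortized time per image, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    # (model, batch size, FPS) for the AGX Orin 64GB rows of the table.
    rows = [
        ("ConvNextv2-nano", 16, 834),
        ("ConvNextv2-tiny", 16, 533),
        ("ConvNextv2-large", 16, 139),
    ]
    for model, batch_size, fps in rows:
        print(f"{model}: {batch_latency_ms(fps, batch_size):.2f} ms per batch, "
              f"{per_image_ms(fps):.2f} ms per image")
```

For example, 834 FPS at batch size 16 corresponds to roughly 19.2 ms per batch, or about 1.2 ms per image when amortized over the batch.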

Using TAO Pre-trained Models

  • Get TAO Container
  • Get other purpose-built models from the NGC model registry:
    • TrafficCamNet
    • PeopleNet
    • PeopleNet-Transformer
    • DashCamNet
    • FaceDetectIR
    • VehicleMakeNet
    • VehicleTypeNet
    • PeopleSegNet
    • PeopleSemSegNet
    • License Plate Detection
    • License Plate Recognition
    • Gaze Estimation
    • Facial Landmark
    • Heart Rate Estimation
    • Gesture Recognition
    • Emotion Recognition
    • FaceDetect
    • 2D Body Pose Estimation
    • ActionRecognitionNet
    • PoseClassificationNet
    • People ReIdentification
    • PointPillarNet
    • CitySegFormer
    • Retail Object Detection
    • Retail Object Embedding
    • Optical Inspection
    • Optical Character Detection
    • Optical Character Recognition
    • PCB Classification
    • PeopleSemSegFormer
    • LPDNet

Technical Blogs

  • Train like a ‘pro’ without being an AI expert using TAO AutoML
  • Create Custom AI models using NVIDIA TAO Toolkit with Azure Machine Learning
  • Developing and Deploying AI-powered Robots with NVIDIA Isaac Sim and NVIDIA TAO
  • Learn endless ways to adapt and supercharge your AI workflows with TAO - Whitepaper
  • Customize Action Recognition with TAO and deploy with DeepStream
  • Read the two-part blog on training and optimizing 2D body pose estimation model with TAO - Part 1 | Part 2
  • Learn how to train a real-time License plate detection and recognition app with TAO and DeepStream.
  • Model accuracy is extremely important; learn how you can achieve state of the art accuracy for classification and object detection models using TAO.

Suggested Reading

  • More information on TAO Toolkit and pre-trained models can be found at the NVIDIA Developer Zone
  • Refer to the TAO documentation
  • Read the TAO Toolkit Quick Start Guide and release notes.
  • If you have any questions or feedback, please refer to the discussions on the TAO Toolkit Developer Forums
  • Deploy your models for video analytics application using the DeepStream SDK.
  • Deploy your models in Riva for ConvAI use case.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.