NvDepthAnythingV2
Description: Monocular relative depth estimation model.
Publisher: NVIDIA
Latest Version: deployable_relative_depthanythingv2_large_v1.0
Modified: August 5, 2025
Size: 1.25 GB

NvDepthAnythingV2 Overview

Description:

DepthAnythingV2 is a state-of-the-art monocular depth estimation model that generalizes well to unseen (zero-shot) images. The TAO NvDepthAnythingV2 relative depth model is a pretrained commercial model that produces a relative depth map from a single image. In addition, NvDepthAnythingV2 can be used as initialization when fine-tuning the MetricDepthAnything model for better accuracy.

This model is ready for commercial use.

License/Terms of Use

Use of this model is governed by the NVIDIA Community Model License. Additional Information: Apache 2.0.

Deployment Geography:

Global

Use Case:

This model is intended for developers working on industrial, robotics, and smart space applications to estimate the depth from monocular image input.

Release Date:

NGC: 07/25/2025

Reference(s):

Depth Anything V2 paper (Yang et al., 2024, arXiv:2406.09414)

Model Architecture:

Architecture Type: Transformer-based Network Architecture

Network Architecture: ViT-Large
Number of model parameters: 3.6 × 10^8 (360M)
This model was developed based on DINOv2 ViT-Large.

Computational Load

Cumulative Compute: 2.0952 × 10^16
Estimated Energy and Emissions for Model Training:

  • Energy Consumption in kWh: 1075.2 kWh
  • Emissions: 0.3472896 tCO2e
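As a consistency check on the two figures above (an inference from the stated numbers, not a factor published by NVIDIA), the emissions value corresponds to an implied emission factor of about 0.323 kg CO2e per kWh:

```latex
1075.2\,\mathrm{kWh} \times 0.323\,\mathrm{kg\,CO_2e/kWh}
  = 347.2896\,\mathrm{kg\,CO_2e}
  = 0.3472896\,\mathrm{tCO_2e}
```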

Input:

Input Type(s): RGB image
Input Format: Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: B X 3 X H X W (Batch Size x Channel x Height x Width)

Output:

Output Type(s): Image
Output Format: Depth Map Image
Output Parameters: Two-Dimensional (2D)
Other Properties Related to Output: B X 3 X H X W (Batch Size x Channel x Height x Width)

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
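To make the input/output contract above concrete, here is a minimal sketch of running the deployable ONNX model with ONNX Runtime. The file name model.onnx, the ImageNet normalization constants, and reading the first graph input are assumptions for illustration, not details confirmed by this model card; consult the TAO Toolkit documentation for the exact preprocessing pipeline.

```python
# Minimal sketch: run the relative-depth ONNX model on one RGB image.
# Assumptions (not stated in this model card): file name "model.onnx",
# ImageNet mean/std normalization, and the first graph input being the image.
import numpy as np
import onnxruntime as ort
from PIL import Image

H, W = 518, 914  # input resolution quoted in the Inference section below

def preprocess(path: str) -> np.ndarray:
    """Load an RGB image and shape it as B x 3 x H x W, float32."""
    img = Image.open(path).convert("RGB").resize((W, H))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - np.array([0.485, 0.456, 0.406], dtype=np.float32)) \
        / np.array([0.229, 0.224, 0.225], dtype=np.float32)
    return x.transpose(2, 0, 1)[None]  # HWC -> 1 x 3 x H x W

sess = ort.InferenceSession(
    "model.onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = sess.get_inputs()[0].name
depth = sess.run(None, {input_name: preprocess("example.jpg")})[0]
print(depth.shape)  # relative (inverse) depth map for the batch
```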

Software Integration:

Runtime Engines

  • TAO 6.2.0

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Jetson
  • NVIDIA Hopper
  • NVIDIA Lovelace
  • NVIDIA Pascal
  • NVIDIA Turing
  • NVIDIA Volta

Preferred/Supported Operating System(s):

  • Linux
  • Linux 4 Tegra
  • QNX
  • Windows

Model Version(s):

  • deployable_relative_depthanythingv2_large_v1.0: decrypted ONNX files. Inference supported in TAO Toolkit.

Training, Testing, and Evaluation Datasets:

  • Total size (in number of data points): 6.02M images
  • Total number of datasets: 4
  • Dataset partition: training 5.99M images, validation 29K images

Training Dataset:

Data Collection Method by dataset:

  • Hybrid: Automatic/Sensors, Synthetic

Labeling Method by dataset:

  • Hybrid: Automatic/Sensors, Synthetic

Properties:

Dataset                     | No. of Images | Sensor
NV Internal Real Image Data | 3.6M          | Automated
NV Internal Real Image Data | 94K           | RGB Camera
NV Internal Synthetic Data  | 2M            | Synthetic
Crestereo Synthetic Data    | 196K          | Synthetic

Evaluation Dataset:

Link:

  • NYUDv2 Dataset Page

Benchmark Score:

Accuracy was determined using the following metrics:

  • AbsRel: (scale-shift invariant) absolute relative error
  • D-1: (scale-shift invariant) the percentage of predicted pixels whose depth is within a factor of 1.25 of the ground truth

The results are from zero-shot evaluation on the NYUDv2 dataset; relative depth (inverse depth) with scale-shift-invariant alignment is used to compute the metrics, as sketched after the table below.

Method            | AbsRel | D-1
NvDepthAnythingV2 | 0.047  | 0.979
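For readers unfamiliar with scale-shift-invariant evaluation, the sketch below illustrates the standard protocol: fit a scale and shift that align the predicted inverse depth to the ground truth by least squares, then compute AbsRel and D-1 on the aligned depths. This is a generic reconstruction of the common protocol, not NVIDIA's evaluation code, and the variable names are illustrative.

```python
# Generic sketch of scale-shift-invariant depth metrics (not NVIDIA's code).
# pred: predicted relative (inverse) depth; gt: ground-truth depth, both H x W.
import numpy as np

def scale_shift_invariant_metrics(pred: np.ndarray, gt: np.ndarray):
    valid = gt > 0
    target = 1.0 / gt[valid]            # evaluate in inverse-depth space
    p = pred[valid]
    # Least-squares fit of scale s and shift t so that s * p + t ~ target.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, target, rcond=None)
    aligned = np.clip(s * p + t, 1e-6, None)
    # Convert back to depth for the usual metric definitions.
    d_pred, d_gt = 1.0 / aligned, gt[valid]
    abs_rel = float(np.mean(np.abs(d_pred - d_gt) / d_gt))
    delta1 = float(np.mean(np.maximum(d_pred / d_gt, d_gt / d_pred) < 1.25))
    return abs_rel, delta1
```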

Data Collection Method by dataset:

  • Undisclosed

Labeling Method by dataset:

  • Automatic/Sensors

Properties: 654 images and ground truth depth were used for evaluation.

Inference

Acceleration Engine: ONNX

Test Hardware:

  • Jetson AGX Orin
  • Jetson AGX Thor
  • L4
  • L40S
  • A100
  • RTX PRO 6000 Blackwell
  • H200
  • H100
  • B200
  • GB200

The inference performance of the provided NvDepthAnythingV2 relative depth model was evaluated at FP16 precision. The model's input resolution is 3 x 518 x 914 pixels. The performance assessment was conducted using trtexec on a range of devices; a sketch of the invocation follows the table.

Platform               | Batch Size | FPS
Jetson AGX Orin        | 32         | 3.705
Jetson AGX Thor        | 32         | 3.339
L4                     | 32         | 17.891
L40S                   | 32         | 62.520
A100                   | 32         | 73.994
RTX PRO 6000 Blackwell | 32         | 112.972
H200                   | 32         | 166.607
H100                   | 32         | 172.782
B200                   | 32         | 320.405
GB200                  | 32         | 350.405
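As a rough guide to reproducing this kind of measurement, a trtexec invocation along the following lines builds and benchmarks an FP16 engine at the quoted batch size and resolution. The file name model.onnx and the input tensor name "input" are placeholders; check the actual graph input name (for example with Netron) before running.

```
# Hedged sketch: FP16 TensorRT benchmark at batch size 32, 3x518x914 input.
trtexec --onnx=model.onnx --fp16 --shapes=input:32x3x518x914
```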

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.