NGC Catalog

CLASSIC

Welcome Guest

For downloads and more information, please view on a desktop device.

Description

DLPP is a family of lightweight deep-learning networks trained for video post-processing. The model's introduction can be found at https://blogs.nvidia.com/blog/rtx-video-super-resolution/. To the public, it is also known as RTX VSR.

Publisher

NVIDIA

Latest Version

1.5

Modified

May 29, 2025

Size

60.55 MB

Model Overview

Description:

DLPP (Deep-Learning Post-Processing) is a family of lightweight deep-learning networks trained for video post-processing. The model's introduction can be found at https://blogs.nvidia.com/blog/rtx-video-super-resolution/. To the public, it is also known as RTX VSR.

Post-processing techniques are used to improve the video quality for the end user. It is usually out of the video encoder and decoder loop so that the engineers can develop and improve it rapidly. Post-processing includes but is not limited to removing encoding artifacts, increasing the resolution, and increasing the frame rate. It has been proven that deep learning models can provide better video quality compared to traditional algorithms.

License/Terms of Use:

DLPP model license in NVIDIA GPU driver: https://www.nvidia.com/en-us/drivers/geforce-license/.
DLPP model license in RTX video SDK: https://developer.nvidia.com/downloads/rtx/sdk/rtx_video_sdk_v1.1.0.zip.

Deployment Geography:

Global.

Use Case:

Individual users who own an NVIDIA GPU can use it to improve their video quality.

Release Date:

01/30/2023

References:

Model Architecture:

Architecture Type: Convolution Neural Network (CNN)

Network Architecture: U-Net-based CNN architecture

Input:

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: 2D
Other Properties Related to Input: The input consists of a single RGB image. The model is designed to work primarily with 8-bit and 10-bit input formats. As a pre-processing step, the image must be normalized to the range [0,1]. The supported input resolution is 640x360 to 3840x2160. The input does not include an alpha channel (no transparency) and is represented in [R, G, B] format.

Output:

Output Type(s): Image
Output Format: Red, Green, Blue (RGB)
Output Parameters: 2D
Other Properties Related to Output: The output consists of a single RGB image, ranging from 0 to 1. The output does not include an alpha channel (no transparency) and is represented in [R, G, B] format.

Software Integration:

Runtime Engine(s): CASK SDK.

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Lovelace
NVIDIA Blackwell
NVIDIA Turing

[Preferred/Supported] Operating System(s):

Windows

Model Version(s):

Version 1.0: First official release, which does super-resolution for videos played in Chrome and Edge browsers.
Version 1.5: Added more training data and updated the training flow to improve output image quality.
Version 2.0: Added support for INT8 CUDA kernels and 10-bit HDR input.

Training, Testing, and Evaluation Datasets:

Training Dataset:

We used internally captured camera footage and gaming recordings to train the model. The captured footage consists of diverse visual scenarios and various popular games.

Data Collection Method by dataset:

Human

Properties:

We captured high-quality camera footage in the wild. The capture resolution is either 4K or over 4K, and does not contain any faces of passersby or anyone close-up. We have also collected high-quality, 4K recordings of popular games in the market. No personal data or confidential data is involved.

Testing Dataset:

Same as the training dataset.

Properties:

The two datasets are divided into a training set and a test set, with 20% allocated to the test set, to evaluate the model's performance and ensure a reliable assessment of its generalization ability.

Evaluation Dataset:

The DLPP Natural Video 3rd Party dataset is used as our evaluation dataset.

Data Collection Method by dataset

Human

Properties:

A few dozen popular YouTube videos are used for evaluation.

Evaluation Result:

The DLPP model is evaluated mainly based on subjective quality assessment, in which engineers are invited to rate the output video quality. Their ratings are collected and averaged to compute the Mean Opinion Score (MOS). A Non-reference video quality evaluation metric NIQE is also employed. The evaluation results on the DLPP Natural Video 3rd Party dataset are presented below.

Model	NIQE score on DLPP Natural Video 3rd Party dataset
Bicubic	6.4278
DLPP ultra	5.4197
DLPP high	5.7380
DLPP medium	5.5502
DLPP low	5.3699

Inference:

Engine: CASK SDK
Test Hardware: RTX 4090

Realtime Inference Latency:

Model	Runtime on 1920x1080 input image on RTX4090(ms)
DLPP ultra	2.59
DLPP high	1.90
DLPP medium	0.78
DLPP low	0.61

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.

Please report security vulnerabilities or NVIDIA AI Concerns here.