Description

Polyp Detection RT-DETR is an RT-DETR model designed to detect polyps in colonoscopy images. This model is ready for commercial use.

Publisher: NVIDIA
Latest Version: 20250304
Modified: June 12, 2025
Size: 161.01 MB

License/Terms of Use:

GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License. Additional Information: Apache License Version 2.0. You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Reference(s):

The model is trained on the REAL-Colon dataset [1], utilizing the RT-DETR v2 architecture [2] and the ResNet50 backbone [3]. The backbone is pretrained on the NVImageNet dataset.

[1] Biffi, Carlo, et al. "REAL-Colon: A dataset for developing real-world AI applications in colonoscopy." Scientific Data 11.1 (2024): 539.

[2] Lv, Wenyu, et al. "RT-DETRv2: Improved baseline with bag-of-freebies for real-time detection transformer." arXiv preprint arXiv:2407.17140 (2024).

[3] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

Model Architecture:

Architecture Type: Convolutional Neural Network (CNN), Transformer
Network Architecture: RT-DETR v2 with ResNet50 backbone

Input:

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Image size 640 x 640 x 3 (height x width x channels); pixel values must be in the range [0, 255].
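
As an illustration of the input specification above, the following minimal sketch resizes an RGB frame to 640 x 640 x 3 while leaving pixel values in [0, 255]. It uses nested Python lists and nearest-neighbour sampling only to stay dependency-free. The function name `resize_to_model_input` is hypothetical, and the use of a plain resize (rather than, for example, letterboxing) is an assumption that this card does not confirm.

```python
MODEL_HEIGHT = 640
MODEL_WIDTH = 640


def resize_to_model_input(frame, out_h=MODEL_HEIGHT, out_w=MODEL_WIDTH):
    """Nearest-neighbour resize of an H x W x 3 RGB frame (nested lists).

    Pixel values are validated against and kept in the [0, 255] range.
    Assumption: a plain resize is an acceptable way to reach 640 x 640.
    """
    in_h = len(frame)
    in_w = len(frame[0])
    resized = []
    for y in range(out_h):
        # Map each output row back to the nearest source row.
        src_row = frame[y * in_h // out_h]
        row = []
        for x in range(out_w):
            r, g, b = src_row[x * in_w // out_w]
            for value in (r, g, b):
                if not 0 <= value <= 255:
                    raise ValueError("pixel values must lie in [0, 255]")
            row.append((r, g, b))
        resized.append(row)
    return resized
```

In a real pipeline an image library would normally perform the resize; the sketch only makes the expected shape and value range concrete.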

Output:

Output Type(s): A dictionary which contains two keys: "pred_logits" and "pred_boxes"
Output Format: The value of key "pred_logits" is a tensor with shape [300, 1]. The value of the key "pred_boxes" is a tensor with shape [300, 4]
Output Parameters: Two-Dimensional (2D)
Other Properties Related to Output: The values of the key "pred_boxes" are in range [0, 1], which represents normalized coordinates in format [center_x, center_y, width, height] relative to the input image size.
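
The output description above can be turned into pixel-space detections with a small post-processing step. The sketch below is a hedged illustration, not the publisher's implementation: it assumes the values under "pred_logits" are raw scores that become confidences after a sigmoid, and the function name `decode_detections` and the default `score_threshold` of 0.5 are hypothetical choices.

```python
import math


def _sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_detections(pred_logits, pred_boxes, image_width, image_height,
                      score_threshold=0.5):
    """Convert model outputs into pixel-space corner boxes.

    pred_logits: rows of [logit] (assumed to be pre-sigmoid scores).
    pred_boxes:  rows of [center_x, center_y, width, height] in [0, 1].
    Returns (x_min, y_min, x_max, y_max, score) tuples, best score first.
    """
    detections = []
    for (logit,), (cx, cy, w, h) in zip(pred_logits, pred_boxes):
        score = _sigmoid(logit)
        if score < score_threshold:
            continue
        # Center/size format to corner format, then scale to pixels.
        x_min = (cx - w / 2.0) * image_width
        y_min = (cy - h / 2.0) * image_height
        x_max = (cx + w / 2.0) * image_width
        y_max = (cy + h / 2.0) * image_height
        detections.append((x_min, y_min, x_max, y_max, score))
    detections.sort(key=lambda det: det[4], reverse=True)
    return detections
```

Because the box coordinates are normalized, passing the width and height of the original video frame instead of 640 yields boxes in the original frame, provided the frame was resized without padding.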

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s): Holoscan SDK 2.9.0

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Hopper
NVIDIA Lovelace
NVIDIA Volta

Supported Operating System(s): Linux

Model Version(s):

rtdetrv2_timm_r50_nvimagenet_pretrained_neg_finetune_bhwc

Training, Testing, and Evaluation Datasets:

Total size: 60 recordings
Total number of datasets: 1
Dataset partition: Training 66%, testing 14%, validation 20%

Training Dataset:

Link: https://plus.figshare.com/articles/media/REAL-colon_dataset/22202866
Data Collection Method by dataset: Human
Labeling Method by dataset: Human
Properties: 40 recordings of real-world colonoscopies.

Testing Dataset:

Link: https://plus.figshare.com/articles/media/REAL-colon_dataset/22202866
Data Collection Method by dataset: Human
Labeling Method by dataset: Human
Properties: 12 recordings of real-world colonoscopies.

Evaluation Dataset:

Link: https://plus.figshare.com/articles/media/REAL-colon_dataset/22202866
Benchmark Score
mAP@0.5:0.95: 0.301
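
The notation `mAP@0.5:0.95` conventionally denotes average precision averaged over intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05, as in COCO-style evaluation. This card does not state which evaluation tool produced the score, so that reading is an assumption. The sketch below shows only the box IoU and the threshold list underlying such a metric, not a full mAP computation; the function names are hypothetical.

```python
# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [t / 100 for t in range(50, 100, 5)]


def box_iou(box_a, box_b):
    """IoU of two boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_satisfied(pred_box, gt_box):
    """IoU thresholds at which the prediction would count as a match."""
    iou = box_iou(pred_box, gt_box)
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

A prediction that overlaps its ground-truth box with an IoU of 0.8, for example, counts as a true positive at the seven thresholds from 0.50 to 0.80 and as a miss at the remaining three.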

Data Collection Method by dataset: Human
Labeling Method by dataset: Human
Properties: 8 recordings of real-world colonoscopies.
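
The per-split recording counts given in the dataset properties (40, 12, and 8 of 60) can be converted to shares of the whole with plain arithmetic. The sketch below does only that; the names are hypothetical, and the results are computed purely from the recording counts, so they are not guaranteed to coincide with the rounded partition percentages quoted earlier in this card.

```python
# Recording counts as stated in the Properties lines of this card.
RECORDINGS_PER_SPLIT = {"training": 40, "testing": 12, "evaluation": 8}


def split_percentages(counts):
    """Share of each split in percent, rounded to one decimal place."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


shares = split_percentages(RECORDINGS_PER_SPLIT)
# shares == {"training": 66.7, "testing": 20.0, "evaluation": 13.3}
```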

Inference:

Acceleration Engine: TensorRT
Test Hardware:

V100
A100
H100
RTX 6000 Ada

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Users are responsible for ensuring the model-generated detections are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please make sure you have proper rights and permissions for all input images, particularly for personal health information.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.

Please report security vulnerabilities or NVIDIA AI Concerns here.
