Polyp Detection RT DETR Model

Description
Polyp Detection RT-DETR is an RT-DETR model designed to detect polyps in colonoscopy images. This model is ready for commercial use.
Publisher
NVIDIA
Latest Version
20250304
Modified
June 12, 2025
Size
161.01 MB

License/Terms of Use:

GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License. Additional Information: Apache License Version 2.0. You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Reference(s):

The model is trained on the REAL-Colon dataset [1], utilizing the RT-DETR v2 architecture [2] and the ResNet50 backbone [3]. The backbone is pretrained on the NVImageNet dataset.

[1] Biffi, Carlo, et al. "REAL-Colon: A dataset for developing real-world AI applications in colonoscopy." Scientific Data 11.1 (2024): 539.

[2] Lv, Wenyu, et al. "RT-DETRv2: Improved baseline with bag-of-freebies for real-time detection transformer." arXiv preprint arXiv:2407.17140 (2024).

[3] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

Model Architecture:

Architecture Type: Convolutional Neural Network (CNN), Transformer
Network Architecture: RT-DETR v2 with ResNet50 backbone

Input:

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Required image size is 640 x 640 x 3; pre-processing is needed so that pixel values lie in the range [0, 255].
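
The input requirements above can be illustrated with a minimal pure-Python sketch that resizes an RGB image to 640 x 640 and checks the value range. This is an assumption-laden illustration, not the pipeline used by the publisher: the card does not say which resize method to use, so nearest-neighbor sampling is chosen here only for simplicity, and the function names resize_nearest_hwc and check_model_input are hypothetical. Images are represented as nested lists in height x width x channel order.

```python
def resize_nearest_hwc(image, out_height=640, out_width=640):
    """Nearest-neighbor resize of an image stored as rows of [R, G, B] pixels."""
    in_height = len(image)
    in_width = len(image[0])
    resized = []
    for y in range(out_height):
        # Map each output row/column back to the closest source row/column.
        src_row = image[min(in_height - 1, y * in_height // out_height)]
        resized.append([
            list(src_row[min(in_width - 1, x * in_width // out_width)])
            for x in range(out_width)
        ])
    return resized


def check_model_input(image, height=640, width=640):
    """Raise ValueError unless the image is height x width x 3 with values in [0, 255]."""
    if len(image) != height or any(len(row) != width for row in image):
        raise ValueError(f"expected a {height} x {width} image")
    for row in image:
        for pixel in row:
            if len(pixel) != 3:
                raise ValueError("expected 3 channels (RGB) per pixel")
            if any(value < 0 or value > 255 for value in pixel):
                raise ValueError("pixel values must lie in [0, 255]")
    return True


if __name__ == "__main__":
    # Tiny 2 x 2 example image, resized to the size stated in the card.
    tiny = [[[0, 0, 0], [255, 255, 255]],
            [[10, 20, 30], [40, 50, 60]]]
    model_input = resize_nearest_hwc(tiny)
    print(check_model_input(model_input))
```

In practice an image library would normally handle the resize; the point here is only the target shape and value range.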

Output:

Output Type(s): A dictionary with two keys: "pred_logits" and "pred_boxes"
Output Format: The value of the key "pred_logits" is a tensor with shape [300, 1]. The value of the key "pred_boxes" is a tensor with shape [300, 4].
Output Parameters: Two-Dimensional (2D)
Other Properties Related to Output: The values under the key "pred_boxes" lie in the range [0, 1] and represent normalized coordinates in the format [center_x, center_y, width, height], relative to the input image size.
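
The output layout described above can be turned into pixel-space boxes with a short post-processing step. The following is a minimal sketch under stated assumptions, not the publisher's post-processing: it assumes that a sigmoid converts each logit into a confidence score, that the image was resized directly to the model input without padding, and that 0.5 is a reasonable score threshold. The function name decode_detections and the threshold value are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def decode_detections(pred_logits, pred_boxes, image_width, image_height,
                      score_threshold=0.5):
    """Convert raw outputs into a list of scored pixel-space boxes.

    pred_logits: rows of [logit], one per query (shape [300, 1] in the card).
    pred_boxes:  rows of [center_x, center_y, width, height], normalized to [0, 1].
    """
    detections = []
    for (logit,), (cx, cy, w, h) in zip(pred_logits, pred_boxes):
        score = sigmoid(logit)  # assumption: scores come from a sigmoid
        if score < score_threshold:
            continue
        # Center/size format -> corner format, then scale to pixels.
        x_min = (cx - w / 2.0) * image_width
        y_min = (cy - h / 2.0) * image_height
        x_max = (cx + w / 2.0) * image_width
        y_max = (cy + h / 2.0) * image_height
        detections.append({"score": score,
                           "box_xyxy": (x_min, y_min, x_max, y_max)})
    # Highest-confidence detections first.
    detections.sort(key=lambda d: d["score"], reverse=True)
    return detections


if __name__ == "__main__":
    logits = [[2.0], [-3.0]]
    boxes = [[0.5, 0.5, 0.25, 0.5], [0.1, 0.1, 0.1, 0.1]]
    for det in decode_detections(logits, boxes, 640, 640):
        print(det)
```

Passing the original frame width and height instead of 640 maps the boxes back onto the source image, provided the aspect ratio was not preserved with padding.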

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times than CPU-only solutions.

Software Integration:

Runtime Engine(s):

  • Holoscan SDK 2.9.0

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Hopper
  • NVIDIA Lovelace
  • NVIDIA Volta

Supported Operating System(s):

  • Linux

Model Version(s):

rtdetrv2_timm_r50_nvimagenet_pretrained_neg_finetune_bhwc

Training, Testing, and Evaluation Datasets:

Total size: 60 recordings
Total number of datasets: 1
Dataset partition: Training 40 recordings (about 67%), testing 12 recordings (20%), evaluation 8 recordings (about 13%)

Training Dataset:

Link: https://plus.figshare.com/articles/media/REAL-colon_dataset/22202866
Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Human

Properties: 40 recordings of real-world colonoscopies.

Testing Dataset:

Link: https://plus.figshare.com/articles/media/REAL-colon_dataset/22202866
Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Human

Properties: 12 recordings of real-world colonoscopies.

Evaluation Dataset:

Link: https://plus.figshare.com/articles/media/REAL-colon_dataset/22202866
Benchmark Score: mAP@0.5:0.95 = 0.301
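
For readers unfamiliar with the metric, the sketch below illustrates the building blocks of mAP@0.5:0.95, assuming the card follows the common COCO-style convention in which average precision is averaged over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. It shows only the IoU computation and the threshold sweep, not a full mAP evaluation, and the function names are hypothetical.

```python
def iou_xyxy(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# Assumed COCO-style thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]


def match_per_threshold(pred_box, gt_box):
    """For each IoU threshold, report whether the prediction counts as a match."""
    iou = iou_xyxy(pred_box, gt_box)
    return {threshold: iou >= threshold for threshold in IOU_THRESHOLDS}


if __name__ == "__main__":
    predicted = (100.0, 100.0, 200.0, 200.0)
    annotated = (110.0, 110.0, 210.0, 210.0)
    print(round(iou_xyxy(predicted, annotated), 3))
    print(match_per_threshold(predicted, annotated))
```

A prediction that overlaps the annotation only loosely counts as correct at the low thresholds but not at the high ones, which is why this averaged score is typically lower than mAP at a single threshold of 0.5.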

Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Human

Properties: 8 recordings of real-world colonoscopies.
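
The recording counts listed for the three splits (40, 12, and 8) can be checked against the stated total of 60 recordings with a few lines of arithmetic. This is only a sanity-check sketch; the variable names are illustrative.

```python
# Recording counts as listed in the training, testing, and evaluation sections.
split_recordings = {"training": 40, "testing": 12, "evaluation": 8}

total = sum(split_recordings.values())
split_share = {name: 100.0 * count / total
               for name, count in split_recordings.items()}

print(f"total: {total} recordings")
for name, share in split_share.items():
    print(f"{name}: {split_recordings[name]} recordings ({share:.1f}%)")
```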

Inference:

Acceleration Engine: TensorRT
Test Hardware:

  • V100
  • A100
  • H100
  • RTX 6000 Ada

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When this model is downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure that it meets the requirements for the relevant industry and use case and addresses unforeseen product misuse. Users are responsible for ensuring that the model-generated detections are appropriately evaluated and comply with applicable safety regulations and ethical standards. Please make sure you have the proper rights and permissions for all input images, particularly those containing personal health information.

For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.

Please report security vulnerabilities or NVIDIA AI Concerns here.