TrafficCamNet Transformer Lite

Description
Four-class object detection model for traffic intersections.
Publisher
NVIDIA
Latest Version
trainable_v1.0
Modified
July 22, 2025
Size
488.28 MB

TrafficCamNet Transformer - Lite

Model Overview

Description

TrafficCamNet Transformer - Lite is an object detection model that detects one or more objects from four categories within an image and returns a bounding box and a category label for each object. The four categories are:

  • car
  • road sign
  • person
  • bicycle

This model is ready for commercial use.

License/Terms of Use:

Use of this model is governed by the NVIDIA Community Models License

Deployment Geography:

Global

Use Case

This model can be used in computer vision use cases to detect cars, road signs, persons or bicycles in a video.

Release Date:

NGC: 06/15/2025

Reference

  • Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, J. Chen: DETRs Beat YOLOs on Real-time Object Detection

Model Architecture

Architecture Type: Convolutional Neural Network + Transformer Encoder-Decoder

Network Architecture:

This model is based on the RT-DETR object detection model. The backbone is a ResNet50 CNN, and the detection head is a Transformer encoder-decoder.

Input

Input Type: Image
Input Formats: Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Minimum 32 x 32 resolution required; no alpha channel or bits

Note: All model variants were fine-tuned with 3x544x960 (CxHxW) image input.

Output

Output Type(s): Bounding boxes and Class labels
Output Parameters: One Dimensional (1D), Two Dimensional (2D) vectors
Other Properties Related to Output:

  • pred_logits: B x 300 (Batch Size x Number of Queries)
  • pred_boxes: B x 300 x 4 (Batch Size x Number of Queries x Coordinates in cxcywh format)
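
To make the pred_boxes layout concrete, the sketch below converts one box from center format (cx, cy, w, h) to corner format (x1, y1, x2, y2) in pixels. It assumes the coordinates are normalized to [0, 1] relative to the image size, as is common for DETR-style detectors; this card does not state that explicitly, and the function name `cxcywh_to_xyxy` is hypothetical.

```python
from typing import Sequence, Tuple


def cxcywh_to_xyxy(
    box: Sequence[float], img_w: int, img_h: int
) -> Tuple[float, float, float, float]:
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2).

    Assumption: all four values are fractions of the image width/height.
    """
    cx, cy, w, h = box
    x1 = (cx - w / 2.0) * img_w
    y1 = (cy - h / 2.0) * img_h
    x2 = (cx + w / 2.0) * img_w
    y2 = (cy + h / 2.0) * img_h
    return (x1, y1, x2, y2)


# A box centered in a 960x544 input, half as wide and half as tall as the image.
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), img_w=960, img_h=544))
# (240.0, 136.0, 720.0, 408.0)
```

To map boxes back to the original frame instead of the network input, pass the original width and height (for example 1920 and 1080) in place of 960 and 544.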

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

How to Use This Model

This model needs to be used with NVIDIA hardware and software. For hardware, the model can run on any NVIDIA GPU with sufficient memory (more than 12 GB). This model can only be used with TAO Toolkit.

The primary use case for these models is object detection.

The model is intended to be fine-tuned using Train Adapt Optimize (TAO) Toolkit or used directly as part of deployment SDKs such as DeepStream, Triton or TensorRT for object detection in traffic systems. High-fidelity models can be trained to detect classes that aren't originally included as part of the model. A Jupyter notebook is available as part of the TAO container and can be used to retrain the model.

Instructions to Use Pretrained Models with TAO

To use these models as pretrained weights for transfer learning, use the following snippet as a template for the model, dataset and train components of the experiment spec file to train an RT-DETR model. For more information on the experiment spec file, see the TAO Toolkit User Guide.

train:
  pretrained_model_path: /path/to/trafficcamnet/resnet50_its.pth
  precision: 'bf16'
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 100
model:
  backbone: resnet_50
  train_backbone: true
dataset:
  train_data_sources:
  - image_dir: /path/to/train/images
    json_file: /path/to/coco/format/train/annotations.json
  val_data_sources:
    image_dir: /path/to/validation/images
    json_file: /path/to/coco/format/val/annotations.json
  test_data_sources:
    image_dir: /path/to/test/images
    json_file: /path/to/coco/format/test/annotations.json
  infer_data_sources:
    image_dir:
    - /path/to/inference/images
    classmap: /path/to/labels.txt
  batch_size: 4
  workers: 8
  remap_mscoco_category: false
  pin_memory: true
  dataset_type: serialized
  num_classes: 5
  eval_class_ids: null
  augmentation:
    multi_scales:
    - - 480
      - 832
    - - 512
      - 896
    - - 544
      - 960
    - - 544
      - 960
    - - 544
      - 960
    - - 576
      - 992
    - - 608
      - 1056
    - - 672
      - 1184
    - - 704
      - 1216
    - - 736
      - 1280
    - - 768
      - 1344
    - - 800
      - 1408
    train_spatial_size:
    - 544
    - 960
    eval_spatial_size:
    - 544
    - 960
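
The json_file entries above point to COCO-format annotation files. As a rough sketch of what such a file contains, the snippet below builds a minimal annotation dictionary for one image using only the Python standard library. The file name, image ID, box values and the category IDs (1 to 4) are placeholders chosen for illustration; check the TAO Toolkit User Guide for the ID convention your spec expects.

```python
import json

# Class names from this card; the numeric IDs are an assumption for the example.
CATEGORIES = ["car", "road sign", "person", "bicycle"]


def build_coco_annotations() -> dict:
    """Return a minimal COCO-style detection annotation dictionary."""
    return {
        "images": [
            # Placeholder image entry; file_name is hypothetical.
            {"id": 1, "file_name": "frame_000001.jpg", "width": 1920, "height": 1080}
        ],
        "annotations": [
            {
                "id": 1,
                "image_id": 1,
                "category_id": 1,  # "car" in the mapping below
                # COCO boxes are [x, y, width, height] in pixels (top-left origin).
                "bbox": [100.0, 200.0, 300.0, 150.0],
                "area": 300.0 * 150.0,
                "iscrowd": 0,
            }
        ],
        "categories": [
            {"id": i, "name": name} for i, name in enumerate(CATEGORIES, start=1)
        ],
    }


coco = build_coco_annotations()
# json.dumps produces the text that would be written to annotations.json.
print(json.dumps(coco["categories"]))
```

Note that COCO annotation boxes use a top-left corner plus width and height, which differs from the cxcywh format of the model's pred_boxes output.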

Software Integration

Runtime Engine:

  • TAO 6.0.0

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell
  • NVIDIA Hopper
  • NVIDIA Lovelace
  • NVIDIA Pascal
  • NVIDIA Turing

Supported Operating System(s):

  • Linux

Model Versions

  • trainable_v1.0 - Pre-trained TrafficCamNet model to facilitate transfer learning via TAO Toolkit.
  • deployable_v1.0 - Pre-trained model that's optimized for deployment via TensorRT.

Training, Testing and Evaluation Datasets

Training Datasets

Data Collection Method by dataset:

  • Hybrid: Automated, Human (custom collected and curated)

Labeling Method by dataset:

  • Hybrid: Automated, Human (custom collected and curated)

Properties:

Dataset          | No. of Images
NV Internal Data | 240K

Testing Datasets

Data Collection Method by dataset:

  • Hybrid: Automated, Human (custom collected and curated)

Labeling Method by dataset:

  • Hybrid: Automated, Human

Properties:

Dataset          | No. of Images
NV Internal Data | 45K

Evaluation Datasets

Data Collection Method by dataset:

  • Hybrid: Automated, Human (custom collected and curated)

Labeling Method by dataset:

  • Hybrid: Automated, Human

Properties:

Dataset          | No. of Images
NV Internal Data | 25K

Performance

Evaluation Data

We test the TrafficCamNet Transformer - Lite models on a curated dataset of 19,000 proprietary images covering a variety of environments and traffic intersections, with corresponding bounding box annotations for cars. The frames are high-resolution 1920x1080 images, resized to 960x544 before being passed through the model.

Methodology and KPI

True positives, false positives and false negatives are determined using an intersection-over-union (IoU) criterion: a detection counts as a match when its IoU with a ground-truth box is greater than 0.5. The model is evaluated on precision, recall and accuracy, and the KPIs for the evaluation data are reported in the table below.
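
As a sketch of the IoU criterion described above, the function below computes intersection-over-union for two boxes in (x1, y1, x2, y2) pixel format and applies the 0.5 threshold. The box format and the helper names are assumptions for illustration; the exact matching procedure used in the evaluation is not described in this card.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth box when IoU exceeds the threshold."""
    return iou(pred, gt) > threshold


gt = (100.0, 100.0, 200.0, 200.0)
print(iou(gt, gt))                                   # 1.0
print(is_match((150.0, 100.0, 250.0, 200.0), gt))    # False: IoU is 1/3
```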

Model                            | Precision | Recall | Accuracy
TrafficCamNet Transformer - Lite | 92.65     | 89.95  | 83.9
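
The three KPIs can be related through the true positive (TP), false positive (FP) and false negative (FN) counts. The sketch below assumes the common detection definitions precision = TP / (TP + FP), recall = TP / (TP + FN) and accuracy = TP / (TP + FP + FN). The card does not define accuracy explicitly, but under this assumption the reported precision and recall imply an accuracy of about 84, close to the reported 83.9.

```python
from typing import Tuple


def detection_kpis(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, accuracy) under the assumed definitions."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = tp / (tp + fp + fn)  # assumed definition, not stated in the card
    return precision, recall, accuracy


def accuracy_from_pr(precision: float, recall: float) -> float:
    """Accuracy implied by precision and recall.

    Since 1/P = (TP + FP)/TP and 1/R = (TP + FN)/TP,
    it follows that 1/A = 1/P + 1/R - 1.
    """
    return 1.0 / (1.0 / precision + 1.0 / recall - 1.0)


# Values from the table above, expressed as fractions.
implied = accuracy_from_pr(0.9265, 0.8995)
print(f"{100 * implied:.2f}")  # roughly 83.96
```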

Inference

Acceleration Engine: TensorRT, DeepStream

The TrafficCamNet Transformer - Lite model inference can be run via DeepStream using the same tao_detection apps.

Test Hardware:

  • A2
  • A30
  • DGX H100
  • DGX A100
  • L4
  • L40
  • Orin
  • Orin Nano 8GB
  • Orin NX
  • Orin NX16GB

Inference is run on the provided unpruned model at FP16 precision. Inference performance is measured using trtexec on Jetson AGX Xavier, Xavier NX, Orin, Orin NX, NVIDIA T4 and Ampere GPUs. The Jetson devices run in the Max-N configuration for maximum GPU frequency. The figures shown here are inference-only performance; end-to-end performance with streaming video data might vary depending on other bottlenecks in the hardware and software.

Platform    | Batch Size (BS) | FPS
Jetson Orin | 8               | 86.107
RTX 4060Ti  | 8               | 278.151
RTX 4090    | 8               | 540.533
A100        | 32              | 808.631
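
For capacity planning, the throughput figures above can be turned into per-batch latency and a rough count of real-time streams. This is a back-of-the-envelope sketch: the 30 FPS per-stream rate and the helper names are assumptions, and as noted above, end-to-end pipelines may achieve less than the inference-only numbers.

```python
# (platform, batch size, FPS) rows copied from the table above.
RESULTS = [
    ("Jetson Orin", 8, 86.107),
    ("RTX 4060Ti", 8, 278.151),
    ("RTX 4090", 8, 540.533),
    ("A100", 32, 808.631),
]


def batch_latency_ms(batch_size: int, fps: float) -> float:
    """Time to process one batch, in milliseconds, at the given throughput."""
    return 1000.0 * batch_size / fps


def max_streams(fps: float, stream_fps: float = 30.0) -> int:
    """Whole number of streams that fit if each needs stream_fps frames per second."""
    return int(fps // stream_fps)


for platform, bs, fps in RESULTS:
    print(
        f"{platform}: {batch_latency_ms(bs, fps):.1f} ms per batch, "
        f"{max_streams(fps)} streams at 30 FPS"
    )
```

For example, the Jetson Orin row works out to roughly 93 ms per batch of 8 and two 30 FPS streams under these assumptions.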

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.