TrafficCamNet Transformer - Lite is an object detection model that detects one or more objects from four categories within an image and returns a bounding box and a category label for each object. The four categories are car, road sign, person, and two-wheeler.
This model is ready for commercial use.
Use of this model is governed by the NVIDIA Community Models License.
Global
This model can be used in computer vision use cases to detect cars, road signs, persons or two-wheelers in a video.
NGC [06/15/2025]
Architecture Type: Convolutional Neural Network + Transformer Encoder-Decoder
Network Architecture:
This model was developed based on the RT-DETR object detection model with a CNN backbone. The backbone is a ResNet50 CNN, and the detection head is a Transformer encoder-decoder.
Input Type: Image
Input Formats: Red, Green, Blue (RGB)
Input Parameters: Two-Dimensional (2D)
Other Properties Related to Input: Minimum 32 x 32 resolution required; no alpha channel or bits
Note: All model variants were fine-tuned with 3x544x960 (CxHxW) image input.
Output Type(s): Bounding boxes and Class labels
Output Parameters: One Dimensional (1D), Two Dimensional (2D) vectors
Other Properties Related to Output:
pred_logits: B x 300 (Batch Size x Number of Queries)
pred_boxes: B x 300 x 4 (Batch Size x Number of Queries x Coordinates in cxcywh format)

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
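As a minimal sketch of how the pred_boxes output described above might be consumed, the following converts one box from cxcywh format to corner (x1, y1, x2, y2) pixel coordinates. It assumes the coordinates are normalized to [0, 1] relative to the input image, which is common for DETR-style detectors but is not stated in this card. The function name cxcywh_to_xyxy is hypothetical.

```python
def cxcywh_to_xyxy(box, image_width, image_height):
    """Convert a (cx, cy, w, h) box to (x1, y1, x2, y2) in pixels.

    Assumes the input values are normalized to [0, 1]; this is an
    assumption, not something the model card confirms.
    """
    cx, cy, w, h = box
    x1 = (cx - w / 2.0) * image_width
    y1 = (cy - h / 2.0) * image_height
    x2 = (cx + w / 2.0) * image_width
    y2 = (cy + h / 2.0) * image_height
    return (x1, y1, x2, y2)


if __name__ == "__main__":
    # A centered box covering half of a 960x544 (W x H) input in each dimension.
    print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 960, 544))
```

If the deployed engine already emits pixel coordinates, the multiplication by width and height should be dropped.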
This model must be used with NVIDIA hardware and software. It can run on any NVIDIA GPU with sufficient memory (more than 12 GB). This model can only be used with TAO Toolkit.
The primary use case for this model is object detection.
It is intended to be fine-tuned using the Train Adapt Optimize (TAO) Toolkit or used directly as part of deployment SDKs such as DeepStream, Triton or TensorRT for object detection in traffic systems. High-fidelity models can be trained to detect classes that are not originally included in the model. A Jupyter notebook is available as part of the TAO container and can be used for re-training.
To use these models as pretrained weights for transfer learning, use the following snippet as a template for the model, dataset and train components of the experiment spec file to train an RT-DETR model.
For more information on the experiment spec file, see the TAO Toolkit User Guide.
    train:
      pretrained_model_path: /path/to/trafficcamnet/resnet50_its.pth
      precision: 'bf16'
      checkpoint_interval: 10
      validation_interval: 10
      num_epochs: 100
    model:
      backbone: resnet_50
      train_backbone: true
    dataset:
      train_data_sources:
        - image_dir: /path/to/train/images
          json_file: /path/to/coco/format/train/annotations.json
      val_data_sources:
        image_dir: /path/to/validation/images
        json_file: /path/to/coco/format/val/annotations.json
      test_data_sources:
        image_dir: /path/to/test/images
        json_file: /path/to/coco/format/test/annotations.json
      infer_data_sources:
        image_dir:
          - /media/scratch.metropolis3/vpraveen/datasets/its_datasets/legacy_datasets_val/images
        classmap: /path/to/labels.txt
      batch_size: 4
      workers: 8
      remap_mscoco_category: false
      pin_memory: true
      dataset_type: serialized
      num_classes: 5
      eval_class_ids: null
      augmentation:
        multi_scales:
          - [480, 832]
          - [512, 896]
          - [544, 960]
          - [544, 960]
          - [544, 960]
          - [576, 992]
          - [608, 1056]
          - [672, 1184]
          - [704, 1216]
          - [736, 1280]
          - [768, 1344]
          - [800, 1408]
        train_spatial_size: [544, 960]
        eval_spatial_size: [544, 960]
Runtime Engine:
Supported Hardware Microarchitecture Compatibility:
[Preferred/Supported] Operating System(s):
Data Collection Method by dataset:
Labeling Method by dataset:
Properties:
Dataset | No. of Images |
---|---|
NV Internal Data | 240K |
Data Collection Method by dataset:
Labeling Method by dataset:
Properties:
Dataset | No. of Images |
---|---|
NV Internal Data | 45K |
Data Collection Method by dataset:
Labeling Method by dataset:
Properties:
Dataset | No. of Images |
---|---|
NV Internal Data | 25K |
We test the TrafficCamNet Transformer - Lite model on a curated dataset of 19,000 proprietary images covering a variety of environments and traffic intersections, with corresponding bounding box annotations for cars. The frames are high-resolution 1920x1080 images, resized to 960x544 before inference.
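To report detections on the original frames, boxes predicted at 960x544 have to be mapped back to 1920x1080. The sketch below shows that arithmetic, assuming a plain resize with no padding or letterboxing (the card only says "resized"). The helper name rescale_box is hypothetical. Note that the horizontal factor is exactly 2.0 while the vertical factor is 1080/544, roughly 1.985, so the two axes scale slightly differently.

```python
def rescale_box(box, src_size=(960, 544), dst_size=(1920, 1080)):
    """Map an (x1, y1, x2, y2) pixel box from src_size to dst_size.

    Sizes are (width, height). Assumes a plain resize without padding.
    """
    sx = dst_size[0] / src_size[0]  # horizontal scale factor
    sy = dst_size[1] / src_size[1]  # vertical scale factor
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


if __name__ == "__main__":
    # A box at model-input resolution mapped back to the full-HD frame.
    print(rescale_box((100, 100, 200, 272)))
```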
True positives, false positives, and false negatives are determined using an intersection-over-union (IoU) criterion of greater than 0.5. The KPIs for the evaluation data are reported in the table below. The model is evaluated on precision, recall, and accuracy.
Model | Precision | Recall | Accuracy |
---|---|---|---|
TrafficCamNet Transformer - Lite | 92.65 | 89.95 | 83.9 |
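The evaluation procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code used for the reported numbers: the greedy one-to-one matching, the function names, and the definition of accuracy as TP / (TP + FP + FN) are all assumptions, since the card does not spell them out.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_matches(preds, gts, iou_threshold=0.5):
    """Return (tp, fp, fn) using greedy one-to-one matching.

    preds are assumed to be sorted by descending confidence.
    """
    unmatched = list(range(len(gts)))
    tp = fp = 0
    for p in preds:
        best_idx, best_iou = None, iou_threshold
        for idx in unmatched:
            score = iou(p, gts[idx])
            if score > best_iou:  # strictly greater than the threshold
                best_idx, best_iou = idx, score
        if best_idx is None:
            fp += 1
        else:
            tp += 1
            unmatched.remove(best_idx)
    return tp, fp, len(unmatched)


def kpis(tp, fp, fn):
    """Precision, recall, and accuracy from detection counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumed definition of accuracy: TP / (TP + FP + FN).
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    preds = [(0, 0, 10, 10), (20, 20, 30, 30)]
    gts = [(1, 1, 11, 11), (50, 50, 60, 60)]
    print(kpis(*count_matches(preds, gts)))
```

Under this assumed definition, accuracy can be written as 1 / (1/P + 1/R - 1). With P = 0.9265 and R = 0.8995 this gives roughly 0.84, which is close to the reported 83.9 and suggests, without confirming, that a similar definition was used.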
Acceleration Engine: TensorRT, DeepStream
The TrafficCamNet Transformer - Lite model inference can be run via DeepStream using the same tao_detection apps.
Test Hardware:
Inference is run on the provided unpruned model at FP16 precision. Inference performance is measured using trtexec on Jetson AGX Xavier, Xavier NX, Orin, Orin NX, NVIDIA T4, and Ampere GPUs.
The Jetson devices are running at Max-N configuration for maximum GPU frequency. The performance shown here is the inference only performance.
The end-to-end performance with streaming video data might vary depending on other bottlenecks in the hardware and software.
Platform | Batch Size | FPS |
---|---|---|
Jetson Orin | 8 | 86.107 |
RTX 4060Ti | 8 | 278.151 |
RTX 4090 | 8 | 540.533 |
A100 | 32 | 808.631 |
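For capacity planning, the batch size and FPS figures in the table above can be turned into a per-batch latency and a rough upper bound on the number of camera streams per device. The sketch below assumes FPS means images per second of inference only, and uses a hypothetical 30 FPS per stream. As noted above, end-to-end throughput may be lower once decoding and other pipeline stages are included.

```python
import math

# (batch size, FPS) values copied from the table above.
BENCHMARKS = {
    "Jetson Orin": (8, 86.107),
    "RTX 4060Ti": (8, 278.151),
    "RTX 4090": (8, 540.533),
    "A100": (32, 808.631),
}


def batch_latency_ms(batch_size, fps):
    """Time to process one batch, in milliseconds."""
    return 1000.0 * batch_size / fps


def max_streams(fps, stream_fps=30.0):
    """Upper bound on streams served at stream_fps (hypothetical rate)."""
    return math.floor(fps / stream_fps)


if __name__ == "__main__":
    for platform, (bs, fps) in BENCHMARKS.items():
        latency = batch_latency_ms(bs, fps)
        streams = max_streams(fps)
        print(f"{platform}: {latency:.1f} ms/batch, up to {streams} streams")
```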
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.