# TrafficCamNet Transformer - Lite

## Model Overview

### Description

TrafficCamNet Transformer - Lite is an object detection model that detects one or more objects from four categories within an image and returns a bounding box and a category label for each object. The four categories are:

* car
* road sign
* person
* bicycle

This model is ready for commercial use.

## License/Terms of Use:

Use of this model is governed by the NVIDIA Open Model License. By downloading the model, you accept the terms and conditions of this [license](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

## Deployment Geography:

Global

## Use Case

This model can be used in computer vision applications to detect cars, road signs, persons or two-wheelers in a video.

## Release Date:

NGC [06/15/2025]

## Reference

- Y. Zhao, W. Lv, S. Xu, J. Wei, G. Wang, Q. Dang, Y. Liu, J. Chen: [DETRs Beat YOLOs on Real-time Object Detection](https://arxiv.org/pdf/2304.08069)

### Model Architecture

**Architecture Type:** Convolutional Neural Network + Transformer Encoder Decoder.
**Network Architecture:** This model is based on the [RT-DETR](https://arxiv.org/pdf/2304.08069) object detection model with a CNN backbone. The backbone is a ResNet50 model, and the detection head is a Transformer encoder decoder.

### Input

**Input Type:** Image
**Input Formats:** Red, Green, Blue (RGB)
**Input Parameters:** Two-Dimensional (2D)
**Other Properties Related to Input:** Minimum 32 x 32 Resolution required; no alpha channel or bits

> Note: All model variants were fine-tuned with 3x544x960 (CxHxW) image input.

### Output

**Output Type(s):** Bounding boxes and Class labels
**Output Parameters:** One Dimensional (1D), Two Dimensional (2D) vectors
**Other Properties Related to Output:**
- `pred_logits`: B x 300 (Batch Size x Number of Queries)
- `pred_boxes`: B x 300 x 4 (Batch Size x Number of Queries x Coordinates in `cxcywh` format)

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## How to Use This Model

This model needs to be used with NVIDIA Hardware and Software. For Hardware, the model can run on any NVIDIA GPU with sufficient memory (more than 12 GB). This model can only be used with [TAO Toolkit](https://developer.nvidia.com/tao-toolkit).

The primary use case for this model is object detection. It is intended to be fine-tuned using Train Adapt Optimize (TAO) Toolkit or used directly as part of deployment SDKs such as DeepStream, Triton or TensorRT for object detection in traffic systems. High-fidelity models can be trained to detect classes that are not originally included in the model.

A Jupyter notebook is available as part of the [TAO container](https://ngc.nvidia.com/catalog/containers/nvidia:tao:tao-toolkit-pytorch) and can be used to re-train the model.

### Instructions to Use Pretrained Models with TAO

To use this model as pretrained weights for transfer learning, use the following snippet as a template for the `model`, `dataset` and `train` components of the experiment spec file to train an RT-DETR model. For more information on the experiment spec file, see the [TAO Toolkit User Guide](https://docs.nvidia.com/tao/tao-toolkit/index.html).
```yaml
train:
  # Pretrained TrafficCamNet weights used as the starting point
  pretrained_model_path: /path/to/trafficcamnet/resnet50_its.pth
  precision: 'bf16'
  checkpoint_interval: 10
  validation_interval: 10
  num_epochs: 100
model:
  backbone: resnet_50
  train_backbone: true
dataset:
  train_data_sources:
    - image_dir: /path/to/train/images
      json_file: /path/to/coco/format/train/annotations.json
  val_data_sources:
    image_dir: /path/to/validation/images
    json_file: /path/to/coco/format/val/annotations.json
  test_data_sources:
    image_dir: /path/to/test/images
    json_file: /path/to/coco/format/test/annotations.json
  infer_data_sources:
    image_dir:
      - /media/scratch.metropolis3/vpraveen/datasets/its_datasets/legacy_datasets_val/images
    classmap: /path/to/labels.txt
  batch_size: 4
  workers: 8
  remap_mscoco_category: false
  pin_memory: true
  dataset_type: serialized
  num_classes: 5
  eval_class_ids: null
  augmentation:
    # Each entry is a [height, width] pair
    multi_scales:
      - [480, 832]
      - [512, 896]
      - [544, 960]
      - [544, 960]
      - [544, 960]
      - [576, 992]
      - [608, 1056]
      - [672, 1184]
      - [704, 1216]
      - [736, 1280]
      - [768, 1344]
      - [800, 1408]
    train_spatial_size: [544, 960]
    eval_spatial_size: [544, 960]
```

### Software Integration

**Runtime Engine:**
* TAO 6.0.0

**Supported Hardware Microarchitecture Compatibility:**
* NVIDIA Ampere
* NVIDIA Blackwell
* NVIDIA Hopper
* NVIDIA Lovelace
* NVIDIA Pascal
* NVIDIA Turing

**Supported Operating System(s):**
* Linux

### Model Versions

- **trainable_v1.0** - Pre-trained TrafficCamNet model to facilitate transfer learning via TAO Toolkit.
- **deployable_v1.0** - Pre-trained model that's optimized for deployment via TensorRT.

## Training, Testing and Evaluation Datasets

### Training Datasets

**Data Collection Method by dataset:**
* Hybrid: Automated, Human (custom collected and curated)
**Labeling Method by dataset:**
* Hybrid: Automated, Human (custom collected and curated)
**Properties:**

| Dataset | No. of Images |
|--|--|
| NV Internal Data | 240K |

### Testing Datasets

**Data Collection Method by dataset:**
* Hybrid: Automated, Human (custom collected and curated)
**Labeling Method by dataset:**
* Hybrid: Automated, Human
**Properties:**

| Dataset | No. of Images |
|--|--|
| NV Internal Data | 45K |

### Evaluation Datasets

**Data Collection Method by dataset:**
* Hybrid: Automated, Human (custom collected and curated)
**Labeling Method by dataset:**
* Hybrid: Automated, Human
**Properties:**

| Dataset | No. of Images |
|--|--|
| NV Internal Data | 25K |

## Performance

### Evaluation Data

We test the TrafficCamNet Transformer - Lite model on a curated dataset of 19,000 proprietary images covering a variety of environments and traffic intersections, with corresponding bounding box annotations for cars. The frames are high-resolution images of 1920x1080 pixels, resized to 960x544 before being passed to the model.

### Methodology and KPI

True positives, false positives, and false negatives are determined using an intersection-over-union (IoU) criterion: a detection counts as a match when its IoU with a ground-truth box is greater than 0.5. The model is evaluated on precision, recall and accuracy. The KPIs for the evaluation data are reported in the table below.

| Model | Precision (%) | Recall (%) | Accuracy (%) |
|-----|----|----|----|
| TrafficCamNet Transformer - Lite | 92.65 | 89.95 | 83.9 |

## Inference

**Acceleration Engine:** TensorRT, DeepStream
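As an illustration of the IoU-based KPI described in the Methodology section, the sketch below converts `cxcywh` boxes to corner format, matches detections to ground truth at an IoU threshold of 0.5, and derives precision, recall and accuracy. It is not the evaluation code used for this model card. The function names are hypothetical, the greedy one-to-one matching is an assumption, boxes are assumed to already be in pixel coordinates, and accuracy is assumed to mean TP / (TP + FP + FN).

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def cxcywh_to_xyxy(box: Sequence[float]) -> Box:
    """Convert (center_x, center_y, width, height) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes in (x1, y1, x2, y2) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_counts(preds: List[Box], gts: List[Box], thr: float = 0.5) -> Tuple[int, int, int]:
    """Return (TP, FP, FN) using greedy one-to-one matching (an assumption).

    `preds` is assumed to be sorted by descending confidence. A prediction
    is a true positive when its best IoU with an unmatched ground-truth box
    is strictly greater than `thr`.
    """
    unmatched = list(gts)
    tp = 0
    for p in preds:
        best_idx, best_iou = -1, thr
        for i, g in enumerate(unmatched):
            value = iou(p, g)
            if value > best_iou:
                best_idx, best_iou = i, value
        if best_idx >= 0:
            unmatched.pop(best_idx)
            tp += 1
    return tp, len(preds) - tp, len(unmatched)


def kpis(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall, and accuracy, with accuracy assumed to be TP / (TP + FP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    # Two detections in cxcywh pixel coordinates and two ground-truth boxes.
    detections = [cxcywh_to_xyxy(b) for b in [(50, 50, 100, 100), (300, 300, 20, 20)]]
    ground_truth = [(0, 0, 100, 100), (500, 500, 600, 600)]
    print(kpis(*match_counts(detections, ground_truth)))
```

Under this assumed definition, accuracy can be written as 1 / (1/P + 1/R - 1). With P = 0.9265 and R = 0.8995 this gives roughly 0.84, which is in line with the accuracy reported in the table above.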

Inference with the TrafficCamNet Transformer - Lite model can be run in DeepStream using the [tao_detection](https://github.com/NVIDIA-AI-IOT/deepstream_tao_apps/tree/master/apps/tao_detection) sample app.

**Test Hardware:**
- A2
- A30
- DGX H100
- DGX A100
- L4
- L40
- Orin
- Orin Nano 8GB
- Orin NX
- Orin NX 16GB

Inference is run on the provided unpruned model at FP16 precision. Inference performance is measured using [`trtexec`](https://github.com/NVIDIA/TensorRT/tree/main/samples/trtexec) on Jetson AGX Xavier, Xavier NX, Orin, and Orin NX devices, and on NVIDIA T4 and Ampere GPUs. The Jetson devices run in the Max-N configuration for maximum GPU frequency. The numbers shown here are inference-only performance. End-to-end performance with streaming video data might vary depending on other bottlenecks in the hardware and software.

| Platform    | BS | FPS     |
|-------------|----|---------|
| Jetson Orin | 8  | 86.107  |
| RTX 4060Ti  | 8  | 278.151 |
| RTX 4090    | 8  | 540.533 |
| A100        | 32 | 808.631 |

## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).