
TAO Toolkit for Computer Vision

Docker container with workflows implemented in TensorFlow as part of the Train Adapt Optimize (TAO) Toolkit.
Latest Tag: v3.22.05-beta-api
Published: July 25, 2023
Compressed Size: 129.09 MB
Architecture: Linux / amd64
Features: Multinode Support, Multi-Arch Support

What is Train Adapt Optimize (TAO) Toolkit?

Train Adapt Optimize (TAO) Toolkit is a Python-based AI toolkit for taking purpose-built, pre-trained AI models and customizing them with your own data. TAO adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for edge deployment.

The pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more.

Build end-to-end services and solutions for transforming pixels and sensor data into actionable insights using TAO, the DeepStream SDK, and TensorRT. TAO can train models for common vision AI tasks such as object detection, classification, and instance segmentation, as well as more complex tasks such as pose estimation, facial landmark estimation, gaze estimation, and heart rate estimation.

Purpose-built Pre-Trained Models

Purpose-built pre-trained models offer highly accurate AI for a variety of vision AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data to train and fine-tune pre-trained models instead of going through the hassle of large-scale data collection and training from scratch.


Example tasks include 2D body pose estimation and facial landmark estimation.
The purpose-built models are available on NGC. Under each model card, there is a pruned version that can be deployed as is and an unpruned version that can be used with TAO to fine-tune with your own dataset.

| Model Name | Network Architecture | Number of Classes | Accuracy | Use Case |
|---|---|---|---|---|
| TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.5% mAP | Detect and track cars |
| PeopleNet | DetectNet_v2-ResNet18 | 3 | 80% mAP | People counting, heatmap generation, social distancing |
| PeopleNet | DetectNet_v2-ResNet34 | 3 | 84% mAP | People counting, heatmap generation, social distancing |
| DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% mAP | Identify objects from a moving object |
| FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% mAP | Detect faces in a dark environment with an IR camera |
| VehicleMakeNet | ResNet18 | 20 | 91% mAP | Classifying car models |
| VehicleTypeNet | ResNet18 | 6 | 96% mAP | Classifying types of cars as coupe, sedan, truck, etc. |
| PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% mAP | Creates segmentation masks around people, provides pixel |
| PeopleSemSegNet | UNET | 1 | 92% mIOU | Creates semantic segmentation masks around people; filters people from the background |
| License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% mAP | Detecting and localizing license plates on vehicles |
| License Plate Recognition | Tuned ResNet18 | 36 (US) / 68 (CH) | 97% (US) / 99% (CH) | Recognizing license plate numbers |
| Gaze Estimation | Four-branch AlexNet-based model | N/A | 6.5 RMSE | Detects a person's eye gaze |
| Facial Landmark | Recombinator networks | N/A | 6.1 pixel error | Estimates key points on a person's face |
| Heart Rate Estimation | Two-branch model with attention | N/A | 0.7 BPM | Estimates a person's heart rate from RGB video |
| Gesture Recognition | ResNet18 | 6 | 0.85 F1 score | Recognize hand gestures |
| Emotion Recognition | 5 fully connected layers | 6 | 0.91 F1 score | Recognize facial emotions |
| FaceDetect | DetectNet_v2-ResNet18 | 1 | 85.3% mAP | Detect faces from RGB or grayscale images |
| 2D Body Pose Estimation | Single-shot bottom-up | 18 | - | Estimates key joints on a person's body |
| ActionRecognitionNet (2D) | RGB-only ResNet18 | 5 | 82.88% | Recognizes a person's action from a sequence of images |
| ActionRecognitionNet (3D) | RGB-only ResNet18 | 5 | 85.59% | Recognizes a person's action from a sequence of images |
| PoseClassificationNet | ST-GCN | 6 | 89.53% | Recognizes a person's action from a sequence of skeletons |
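Models on NGC can be fetched with the NGC CLI. The sketch below is illustrative only: the model name and version strings (`nvidia/tao/peoplenet`, `pruned_v2.0`) are assumptions, so check the model card on NGC for the exact values.

```shell
# List the available versions of a purpose-built model (name is an example):
ngc registry model list "nvidia/tao/peoplenet:*"

# Download a specific version; pruned versions are deployable as is,
# unpruned versions are meant for fine-tuning with TAO.
# The version string here is a placeholder -- take it from the model card.
ngc registry model download-version "nvidia/tao/peoplenet:pruned_v2.0"
```

The NGC CLI requires a one-time `ngc config set` with your API key before registry commands will authenticate.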

Architecture specific pre-trained models

In addition to purpose-built models, TAO Toolkit supports the following detection architectures: DetectNet_v2, FasterRCNN, SSD, DSSD, YOLOv3, YOLOv4, YOLOv4-Tiny, RetinaNet, and EfficientDet.

These detection meta-architectures can be used with 13 backbones, or feature extractors, in TAO. For a complete list of all the permutations supported by TAO, please see the supported-model matrix in the TAO Toolkit documentation.

TAO Toolkit supports instance segmentation using the MaskRCNN architecture and semantic segmentation using the UNET architecture.


To get started, first choose the model architecture that you want to build, then select the appropriate model card on NGC, and finally choose one of the supported backbones.


Running TAO Toolkit

  1. Set up your Python environment using virtualenv and virtualenvwrapper.

  2. Install the TAO launcher. TAO Toolkit provides an abstraction above the container: you launch all your training jobs from the launcher, and tao-launcher pulls the appropriate container for you, so there is no need to pull it manually. You can install the launcher using pip with the following command:

     pip3 install nvidia-tao

  3. Download the Jupyter notebooks that you are interested in from NGC resources. After installing the prerequisites, all the training steps are run from inside the Jupyter notebook.
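The steps above can be sketched end to end as follows. The sample-resource name and version (`nvidia/tao/cv_samples:v1.4.1`) and the download directory name are assumptions; check NGC resources for the current ones.

```shell
# 1. Create an isolated Python environment (requires virtualenvwrapper).
mkvirtualenv tao

# 2. Install the TAO launcher; it pulls the right container per task.
pip3 install nvidia-tao
tao --help    # verify the launcher is installed and on PATH

# 3. Download the sample notebooks from NGC resources
#    (resource name and version are placeholders -- check NGC):
ngc registry resource download-version "nvidia/tao/cv_samples:v1.4.1"

# Start Jupyter in the downloaded samples directory (name may differ):
jupyter notebook --ip 0.0.0.0 --no-browser
```

From there, open the notebook listed below for the model you want to train; each notebook drives the corresponding `tao <task>` subcommands for you.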
| Purpose-built Model | Jupyter Notebook |
|---|---|
| PeopleNet | detectnet_v2/detectnet_v2.ipynb |
| TrafficCamNet | detectnet_v2/detectnet_v2.ipynb |
| DashCamNet | detectnet_v2/detectnet_v2.ipynb |
| FaceDetectIR | detectnet_v2/detectnet_v2.ipynb |
| VehicleMakeNet | classification/classification.ipynb |
| VehicleTypeNet | classification/classification.ipynb |
| PeopleSegNet | mask_rcnn/mask_rcnn.ipynb |
| License Plate Detection | detectnet_v2/detectnet_v2.ipynb |
| License Plate Recognition | lprnet/lprnet.ipynb |
| Gaze Estimation | gazenet/gazenet.ipynb |
| Facial Landmark | fpenet/fpenet.ipynb |
| Heart Rate Estimation | heartratenet/heartratenet.ipynb |
| Gesture Recognition | gesturenet/gesturenet.ipynb |
| Emotion Recognition | emotionnet/emotionnet.ipynb |
| FaceDetect | facenet/facenet.ipynb |
| 2D Body Pose Net | bpnet/bpnet.ipynb |
| PeopleSemSegNet | unet/unet_isbi.ipynb |
| ActionRecognitionNet | action_recognition_net/actionrecognitionnet.ipynb |
| PoseClassificationNet | pose_classification_net/poseclassificationnet.ipynb |
| PointPillars | pointpillars/ |

| Open Model Architecture | Jupyter Notebook |
|---|---|
| DetectNet_v2 | detectnet_v2/detectnet_v2.ipynb |
| EfficientDet | efficientdet/efficientdet.ipynb |
| FasterRCNN | faster_rcnn/faster_rcnn.ipynb |
| YOLOv3 | yolo_v3/yolo_v3.ipynb |
| YOLOv4 | yolo_v4/yolo_v4.ipynb |
| YOLOv4-Tiny | yolo_v4_tiny/yolo_v4_tiny.ipynb |
| SSD | ssd/ssd.ipynb |
| DSSD | dssd/dssd.ipynb |
| RetinaNet | retinanet/retinanet.ipynb |
| MaskRCNN | mask_rcnn/mask_rcnn.ipynb |
| UNET | unet/unet_isbi.ipynb |
| Classification | classification/classification.ipynb |

Using TAO Pre-trained Models


The license for the TAO containers is included within the container at workspace/EULA.pdf. Licenses for the pre-trained models are available with the model files. By pulling and using the Train Adapt Optimize (TAO) Toolkit container to download models, you accept the terms and conditions of these licenses.


Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.