
TAO Toolkit for Computer Vision

Description: Docker container with workflows implemented in TensorFlow as part of the Train Adapt Optimize (TAO) Toolkit.
Publisher: NVIDIA
Latest Tag: v3.22.05-beta-api
Modified: July 25, 2023
Compressed Size: 129.09 MB
Multinode Support: No
Multi-Arch Support: No

What is Train Adapt Optimize (TAO) Toolkit?

Train Adapt Optimize (TAO) Toolkit is a Python-based AI toolkit for taking purpose-built, pre-trained AI models and customizing them with your own data. TAO adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for edge deployment.

The pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more.

Build end-to-end services and solutions for transforming pixels and sensor data into actionable insights using TAO, the DeepStream SDK, and TensorRT. TAO can train models for common vision AI tasks such as object detection, classification, and instance segmentation, as well as more complex tasks such as pose estimation, facial landmark estimation, gaze estimation, and heart rate estimation.

Purpose-built Pre-Trained Models

Purpose-built pre-trained models offer highly accurate AI for a variety of vision AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data to train and fine-tune these pre-trained models instead of going through the effort of large-scale data collection and training from scratch.


Example purpose-built models: PeopleNet, 2D Body Pose Estimation, Facial Landmark Estimation.

The purpose-built models are available on NGC. Under each model card, there is a pruned version that can be deployed as-is and an unpruned version that can be fine-tuned on your own dataset with TAO. A download example follows the table below.

| Model Name | Network Architecture | Number of Classes | Accuracy | Use Case |
| --- | --- | --- | --- | --- |
| TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.5% mAP | Detect and track cars |
| PeopleNet | DetectNet_v2-ResNet18 | 3 | 80% mAP | People counting, heatmap generation, social distancing |
| PeopleNet | DetectNet_v2-ResNet34 | 3 | 84% mAP | People counting, heatmap generation, social distancing |
| DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% mAP | Identify objects from a moving vehicle |
| FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% mAP | Detect faces in a dark environment with an IR camera |
| VehicleMakeNet | ResNet18 | 20 | 91% mAP | Classifying car models |
| VehicleTypeNet | ResNet18 | 6 | 96% mAP | Classifying types of cars as coupe, sedan, truck, etc. |
| PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% mAP | Creates instance segmentation masks around people |
| PeopleSemSegNet | UNET | 1 | 92% mIOU | Creates semantic segmentation masks around people; filters persons from the background |
| License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% mAP | Detecting and localizing license plates on vehicles |
| License Plate Recognition | Tuned ResNet18 | 36 (US) / 68 (CH) | 97% (US) / 99% (CH) | Recognize license plate numbers |
| Gaze Estimation | Four-branch AlexNet-based model | N/A | 6.5 RMSE | Detects a person's eye gaze |
| Facial Landmark | Recombinator networks | N/A | 6.1 pixel error | Estimates key points on a person's face |
| Heart Rate Estimation | Two-branch model with attention | N/A | 0.7 BPM | Estimates a person's heart rate from RGB video |
| Gesture Recognition | ResNet18 | 6 | 0.85 F1 score | Recognize hand gestures |
| Emotion Recognition | 5 fully connected layers | 6 | 0.91 F1 score | Recognize facial emotion |
| FaceDetect | DetectNet_v2-ResNet18 | 1 | 85.3 mAP | Detect faces from RGB or grayscale images |
| 2D Body Pose Estimation | Single-shot bottom-up | 18 | - | Estimates key joints on a person's body |
| ActionRecognitionNet 2D | RGB-only ResNet18 | 5 | 82.88 | Recognizes a person's action from a sequence of images |
| ActionRecognitionNet 3D | RGB-only ResNet18 | 5 | 85.59 | Recognizes a person's action from a sequence of images |
| PoseClassificationNet | ST-GCN | 6 | 89.53 | Recognizes a person's action from a sequence of skeletons |
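
Any of these models can be pulled from the NGC registry with the NGC CLI. The command below is a minimal sketch, assuming the CLI has already been configured with "ngc config set"; the model path and version tag shown for PeopleNet are illustrative and should be taken from the corresponding model card.

# Download a purpose-built model from the NGC registry.
# The version tag below is an assumption -- check the PeopleNet model card on NGC
# for the actual pruned/unpruned version tags.
ngc registry model download-version "nvidia/tao/peoplenet:pruned_v2.0" --dest ./models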

Architecture specific pre-trained models

In addition to the purpose-built models, TAO Toolkit supports the following detection architectures: DetectNet_v2, FasterRCNN, SSD, DSSD, YOLOv3, YOLOv4, YOLOv4-Tiny, RetinaNet, and EfficientDet.

These detection meta-architectures can be combined with 13 backbones (feature extractors) in TAO. For the complete list of supported architecture and backbone permutations, refer to the TAO Toolkit documentation.

TAO Toolkit supports instance segmentation using the MaskRCNN architecture and semantic segmentation using the UNET architecture.

Training

To get started, first choose the model architecture that you want to build, then select the appropriate model card on NGC and choose one of the supported backbones.
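
As a starting point, the NGC CLI can be used to browse the available TAO model cards. This is a sketch, assuming the models are published under the nvidia/tao namespace on NGC; adjust the pattern if the registry layout differs.

# List TAO model cards available in the NGC registry
# (assumes the ngc CLI is installed and configured).
ngc registry model list "nvidia/tao/*"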


Running TAO Toolkit

  1. Set up your Python environment using python virtualenv and virtualenvwrapper.

  2. TAO Toolkit provides a launcher that abstracts away the underlying containers: all training jobs are launched through the tao-launcher, which pulls the appropriate container for you, so there is no need to pull containers manually. You can install the launcher with pip using the following command.

pip3 install nvidia-tao

  3. Download the Jupyter notebooks that you are interested in from NGC resources (a setup sketch follows the notebook tables below). After installing the prerequisites, all the training steps are run from inside the Jupyter notebook.
| Purpose-built Model | Jupyter notebook |
| --- | --- |
| PeopleNet | detectnet_v2/detectnet_v2.ipynb |
| TrafficCamNet | detectnet_v2/detectnet_v2.ipynb |
| DashCamNet | detectnet_v2/detectnet_v2.ipynb |
| FaceDetectIR | detectnet_v2/detectnet_v2.ipynb |
| VehicleMakeNet | classification/classification.ipynb |
| VehicleTypeNet | classification/classification.ipynb |
| PeopleSegNet | mask_rcnn/mask_rcnn.ipynb |
| License Plate Detection | detectnet_v2/detectnet_v2.ipynb |
| License Plate Recognition | lprnet/lprnet.ipynb |
| Gaze Estimation | gazenet/gazenet.ipynb |
| Facial Landmark | fpenet/fpenet.ipynb |
| Heart Rate Estimation | heartratenet/heartratenet.ipynb |
| Gesture Recognition | gesturenet/gesturenet.ipynb |
| Emotion Recognition | emotionnet/emotionnet.ipynb |
| FaceDetect | facenet/facenet.ipynb |
| 2D Body Pose Net | bpnet/bpnet.ipynb |
| PeopleSemSegNet | unet/unet_isbi.ipynb |
| ActionRecognitionNet | action_recognition_net/actionrecognitionnet.ipynb |
| PoseClassificationNet | pose_classification_net/poseclassificationnet.ipynb |
| PointPillars | pointpillars/pointpillars.md |

| Open model architecture | Jupyter notebook |
| --- | --- |
| DetectNet_v2 | detectnet_v2/detectnet_v2.ipynb |
| EfficientDet | efficientdet/efficientdet.ipynb |
| FasterRCNN | faster_rcnn/faster_rcnn.ipynb |
| YOLOV3 | yolo_v3/yolo_v3.ipynb |
| YOLOV4 | yolo_v4/yolo_v4.ipynb |
| YOLOV4-Tiny | yolo_v4_tiny/yolo_v4_tiny.ipynb |
| SSD | ssd/ssd.ipynb |
| DSSD | dssd/dssd.ipynb |
| RetinaNet | retinanet/retinanet.ipynb |
| MaskRCNN | mask_rcnn/mask_rcnn.ipynb |
| UNET | unet/unet_isbi.ipynb |
| Classification | classification/classification.ipynb |
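
The commands below sketch the setup flow referenced in the steps above: creating a virtual environment, installing and verifying the launcher, and pulling the TAO sample notebooks from NGC. The resource name and version tag for the sample notebooks are assumptions; confirm the exact values on the TAO getting-started resources page on NGC.

# 1. Create and activate a Python virtual environment (assumes virtualenvwrapper is installed).
mkvirtualenv tao -p python3

# 2. Install the TAO launcher, which pulls the right TAO container for each task.
pip3 install nvidia-tao

# Verify the launcher is installed and list the supported tasks.
tao --help

# 3. Download the Jupyter notebooks from NGC resources.
# The resource path and version below are illustrative -- confirm them on NGC.
ngc registry resource download-version "nvidia/tao/cv_samples:v1.4.0" --dest ./cv_samples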

Using TAO Pre-trained Models

License

The TAO Toolkit Getting Started License for TAO containers is included within the container at workspace/EULA.pdf. Licenses for the pre-trained models are available with the model files. By pulling and using the Train Adapt Optimize (TAO) Toolkit container to download models, you accept the terms and conditions of these licenses.

Technical blogs

Suggested reading

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.