Linux / amd64
Train Adapt Optimize (TAO) Toolkit is a Python-based AI toolkit for taking purpose-built pre-trained AI models and customizing them with your own data. TAO adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for edge deployment.
The pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more.
Build end-to-end services and solutions for transforming pixels and sensor data into actionable insights using TAO, DeepStream SDK, and TensorRT. TAO can train models for common vision AI tasks such as object detection, classification, and instance segmentation, as well as more complex tasks such as pose estimation, facial landmark estimation, gaze estimation, and heart rate estimation.
Purpose-built pre-trained models offer highly accurate AI for a variety of vision AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data and fine-tune pre-trained models instead of collecting large datasets and training from scratch.
PeopleNet
2D Body Pose Estimation
Facial Landmark Estimation
The purpose-built models are available on NGC. Each model card offers a pruned version that can be deployed as is and an unpruned version that can be fine-tuned with TAO on your own dataset.
Model Name | Network Architecture | Number of classes | Accuracy | Use Case |
---|---|---|---|---|
TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.5% mAP | Detect and track cars |
PeopleNet | DetectNet_v2-ResNet18 | 3 | 80% mAP | People counting, heatmap generation, social distancing |
PeopleNet | DetectNet_v2-ResNet34 | 3 | 84% mAP | People counting, heatmap generation, social distancing |
DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% mAP | Identify objects from a moving object |
FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% mAP | Detect face in a dark environment with IR camera |
VehicleMakeNet | ResNet18 | 20 | 91% mAP | Classifying car models |
VehicleTypeNet | ResNet18 | 6 | 96% mAP | Classifying type of cars as coupe, sedan, truck, etc |
PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% mAP | Creates segmentation masks around people, provides pixel |
PeopleSemSegNet | UNET | 1 | 92% MIOU | Creates semantic segmentation masks around people. Filters person from the background |
License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% mAP | Detecting and localizing License plates on vehicles |
License Plate Recognition | Tuned ResNet18 | 36(US) / 68(CH) | 97%(US)/99%(CH) | Recognize license plate numbers |
Gaze Estimation | Four branch AlexNet based model | N/A | 6.5 RMSE | Detects person's eye gaze |
Facial Landmark | Recombinator networks | N/A | 6.1 pixel error | Estimates key points on person's face |
Heart Rate Estimation | Two branch model with attention | N/A | 0.7 BPM | Estimates person's heartrate from RGB video |
Gesture Recognition | ResNet18 | 6 | 0.85 F1 score | Recognize hand gestures |
Emotion Recognition | 5 Fully Connected Layers | 6 | 0.91 F1 score | Recognize facial Emotion |
FaceDetect | DetectNet_v2-ResNet18 | 1 | 85.3 mAP | Detect faces from RGB or grayscale image |
2D Body Pose Estimation | Single shot bottom-up | 18 | - | Estimates key joints on person's body |
ActionRecognitionNet | 2D RGB-only Resnet18 | 5 | 82.88 | Recognizes action of a person from a sequence of images |
ActionRecognitionNet | 3D RGB-only Resnet18 | 5 | 85.59 | Recognizes action of a person from a sequence of images |
PoseClassificationNet | ST-GCN | 6 | 89.53 | Recognizes action of a person from a sequence of skeletons |
In addition to purpose-built models, TAO Toolkit supports the following detection architectures:
These detection meta-architectures can be used with 13 backbones or feature extractors with TAO. For a complete list of all the permutations that are supported by TAO, please see the matrix below:
TAO Toolkit supports instance segmentation using MaskRCNN architecture.
TAO Toolkit supports semantic segmentation using UNET architecture.
To get started, first choose the model architecture that you want to build, then select the appropriate model card on NGC and then choose one of the supported backbones.
Set up your Python environment using Python virtualenv and virtualenvwrapper.
TAO Toolkit provides an abstraction above the container: you launch all your training jobs from the launcher. There is no need to manually pull the appropriate container; tao-launcher handles that. You can install the launcher using pip with the following command.
pip3 install nvidia-tao
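After installing, it can be handy to confirm that the launcher package is visible to the active Python environment. The following is a minimal sketch that assumes only the package name `nvidia-tao` from the pip command above; the helper name `installed_launcher_version` is hypothetical and not part of TAO.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_launcher_version(dist_name: str = "nvidia-tao") -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is absent.

    The default name comes from the pip command above; this helper itself is
    a hypothetical illustration, not a TAO API.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_launcher_version()
    if found is None:
        print("nvidia-tao is not installed here; run: pip3 install nvidia-tao")
    else:
        print(f"nvidia-tao {found} is installed")
```

Running this inside the virtualenv you created shows whether the launcher was installed into that environment rather than into the system Python.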
Purpose-built Model | Jupyter notebook |
---|---|
PeopleNet | detectnet_v2/detectnet_v2.ipynb |
TrafficCamNet | detectnet_v2/detectnet_v2.ipynb |
DashCamNet | detectnet_v2/detectnet_v2.ipynb |
FaceDetectIR | detectnet_v2/detectnet_v2.ipynb |
VehicleMakeNet | classification/classification.ipynb |
VehicleTypeNet | classification/classification.ipynb |
PeopleSegNet | mask_rcnn/mask_rcnn.ipynb |
License Plate Detection | detectnet_v2/detectnet_v2.ipynb |
License Plate Recognition | lprnet/lprnet.ipynb |
Gaze Estimation | gazenet/gazenet.ipynb |
Facial Landmark | fpenet/fpenet.ipynb |
Heart Rate Estimation | heartratenet/heartratenet.ipynb |
Gesture Recognition | gesturenet/gesturenet.ipynb |
Emotion Recognition | emotionnet/emotionnet.ipynb |
FaceDetect | facenet/facenet.ipynb |
2D Body Pose Net | bpnet/bpnet.ipynb |
PeopleSemSegNet | unet/unet_isbi.ipynb |
ActionRecognitionNet | action_recognition_net/actionrecognitionnet.ipynb |
PoseClassificationNet | pose_classification_net/poseclassificationnet.ipynb |
PointPillars | pointpillars/pointpillars.md |
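Several purpose-built models share one sample notebook, so it may help to view the table grouped by notebook. The sketch below copies only a few rows from the table above for illustration; the names `PURPOSE_BUILT_NOTEBOOKS` and `models_by_notebook` are hypothetical and not part of TAO.

```python
from collections import defaultdict

# A few rows copied from the table above: model name -> sample notebook path.
PURPOSE_BUILT_NOTEBOOKS = {
    "PeopleNet": "detectnet_v2/detectnet_v2.ipynb",
    "TrafficCamNet": "detectnet_v2/detectnet_v2.ipynb",
    "VehicleMakeNet": "classification/classification.ipynb",
    "VehicleTypeNet": "classification/classification.ipynb",
    "PeopleSegNet": "mask_rcnn/mask_rcnn.ipynb",
    "License Plate Recognition": "lprnet/lprnet.ipynb",
}


def models_by_notebook(mapping):
    """Invert a model -> notebook mapping into notebook -> sorted model names."""
    grouped = defaultdict(list)
    for model, notebook in mapping.items():
        grouped[notebook].append(model)
    return {notebook: sorted(models) for notebook, models in grouped.items()}


if __name__ == "__main__":
    grouped = models_by_notebook(PURPOSE_BUILT_NOTEBOOKS)
    for notebook in sorted(grouped):
        print(f"{notebook}: {', '.join(grouped[notebook])}")
```

Extending the dictionary with the remaining rows gives the full grouping; the function does not depend on which rows are present.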
Open model architecture | Jupyter notebook |
---|---|
DetectNet_v2 | detectnet_v2/detectnet_v2.ipynb |
EfficientDet | efficientdet/efficientdet.ipynb |
FasterRCNN | faster_rcnn/faster_rcnn.ipynb |
YOLOV3 | yolo_v3/yolo_v3.ipynb |
YOLOV4 | yolo_v4/yolo_v4.ipynb |
YOLOV4-Tiny | yolo_v4_tiny/yolo_v4_tiny.ipynb |
SSD | ssd/ssd.ipynb |
DSSD | dssd/dssd.ipynb |
RetinaNet | retinanet/retinanet.ipynb |
MaskRCNN | mask_rcnn/mask_rcnn.ipynb |
UNET | unet/unet_isbi.ipynb |
classification | classification/classification.ipynb |
Get TAO Object Detection pre-trained models for YOLOV4, YOLOV3, FasterRCNN, SSD, DSSD, and RetinaNet architectures from NGC model registry
Get TAO DetectNet_v2 Object Detection pre-trained models for DetectNet_v2 architecture from NGC model registry
Get TAO EfficientDet Object Detection pre-trained models for EfficientDet architecture from NGC model registry
Get TAO classification pre-trained models from NGC model registry
Get TAO Instance segmentation pre-trained models for MaskRCNN architecture from NGC
Get Purpose-built models from NGC model registry:
TAO Toolkit Getting Started
The license for TAO containers is included within the container at workspace/EULA.pdf. Licenses for the pre-trained models are available with the model files. By pulling and using the Train Adapt Optimize (TAO) Toolkit container to download models, you accept the terms and conditions of these licenses.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.