TAO (Train Adapt Optimize) Toolkit is a Python-based AI toolkit built on TensorFlow and PyTorch. It provides transfer learning capability to adapt popular neural network architectures and backbones to your data, allowing you to train, fine-tune, prune, quantize, and export highly optimized and accurate AI models for edge deployment.
The purpose-built pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more. TAO supports training for CV, 3D point cloud, ASR, NLP, and TTS modalities.
TAO Toolkit packages a collection of containers, Python wheels, models, and a Helm chart. AI training tasks run on either TensorFlow or PyTorch, depending on the entrypoint for the model.
For deployment, TAO models can be deployed to DeepStream for video analytics applications, Riva for Conversational AI applications or Triton for inference serving use cases.
All containers needed to run TAO can be pulled from this registry. The table below lists the containers available in it.
|TAO Container Type||container_name:tag||What's it used for?|
|TAO TensorFlow v1 container||nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5||Older CV networks like YOLOs, FasterRCNN, DetectNet_v2, MaskRCNN, UNET and more|
|TAO TensorFlow v2 container||nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf2.9.1||CV networks like EfficientDet, EfficientNet and more|
|TAO PyTorch container||nvcr.io/nvidia/tao/tao-toolkit:4.0.0-pyt||Newer CV networks like Deformable-DETR, SegFormer and more as well as all ConvAI networks|
|TAO Deploy container||nvcr.io/nvidia/tao/tao-toolkit:4.0.0-deploy||Container used to generate a TensorRT engine and INT8 calibration from a trained TAO model, and to evaluate that TensorRT engine|
|TAO API container||nvcr.io/nvidia/tao/tao-toolkit:4.0.0-api||Front-end services container that can be used to host a TAO REST API server for remote execution of model training tasks. Useful for building higher level services|
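As a quick reference, the table above can be captured in a small shell helper. This is only a sketch: the function name `tao_image`, the variable `TAO_IMAGE_REPO`, and the short keys (`tf1`, `tf2`, `pyt`, `deploy`, `api`) are hypothetical names chosen for illustration. The image name and tags are copied from the table.

```shell
# Hypothetical helper: map a short container-type key to the image listed in
# the table above. The keys (tf1, tf2, pyt, deploy, api) are illustrative only.
TAO_IMAGE_REPO="nvcr.io/nvidia/tao/tao-toolkit"

tao_image() {
    case "$1" in
        tf1)    echo "${TAO_IMAGE_REPO}:4.0.0-tf1.15.5" ;;  # older CV networks
        tf2)    echo "${TAO_IMAGE_REPO}:4.0.0-tf2.9.1" ;;   # EfficientDet, EfficientNet
        pyt)    echo "${TAO_IMAGE_REPO}:4.0.0-pyt" ;;       # newer CV and ConvAI networks
        deploy) echo "${TAO_IMAGE_REPO}:4.0.0-deploy" ;;    # TensorRT engine generation
        api)    echo "${TAO_IMAGE_REPO}:4.0.0-api" ;;       # REST API server
        *)      echo "unknown TAO container type: $1" >&2; return 1 ;;
    esac
}
```

With the helper defined, an image could be pulled with `docker pull "$(tao_image pyt)"`, assuming Docker is installed and you are logged in to the registry.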
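There are four ways to run TAO, depending on your preference and setup: (1) through the TAO Launcher CLI, (2) directly from a container, (3) through the TAO Toolkit API, or (4) on bare metal, for example in Google Colab.
More information about each option, along with the Jupyter notebooks, can be found in the TAO Getting Started Resources.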
The TAO Launcher is a lightweight Python-based CLI for running TAO. The launcher acts as a front end for the multiple TAO Toolkit containers built on PyTorch and TensorFlow. The appropriate container is launched automatically based on the type of model you plan to use for your computer vision or conversational AI use case.
To get started, use the startup script and instructions from TAO Getting Started Resources.
You can also run TAO directly from a Docker container. To do so, you need to know which container to pull: TAO ships multiple containers, and the appropriate one depends on the model you want to train. This step is not required when using the Launcher CLI.
# Compose the full image name from registry, repository, and tag.
export DOCKER_REGISTRY="nvcr.io"
export DOCKER_NAME="nvidia/tao/tao-toolkit"
export DOCKER_TAG="***" ## for TensorFlow/PyTorch or Deploy container
export DOCKER_CONTAINER=$DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG

# Mount a host directory into the container and run a DetectNet_v2 training job.
docker run -it --rm --gpus all -v /path/in/host:/path/in/docker $DOCKER_CONTAINER \
    detectnet_v2 train -e /path/to/experiment/spec.txt -r /path/to/results/dir -k $KEY --gpus 4
More information about running directly from Docker is provided in TAO documentation - Container.
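Before launching a long training job, it can help to print the assembled command and check the volume mount and task arguments. The following is a minimal dry-run sketch, not part of TAO: the function name `tao_docker_cmd` and its argument order are hypothetical. The `docker run` flags are the ones used in the command above.

```shell
# Hypothetical dry-run helper: print the docker command instead of running it.
# Arguments: IMAGE HOST_DIR CONTAINER_DIR TASK [TASK_ARGS...]
tao_docker_cmd() {
    if [ "$#" -lt 4 ]; then
        echo "usage: tao_docker_cmd IMAGE HOST_DIR CONTAINER_DIR TASK [ARGS...]" >&2
        return 1
    fi
    # Fixed part: interactive run, auto-remove, all GPUs, one volume mount.
    printf 'docker run -it --rm --gpus all -v %s:%s %s' "$2" "$3" "$1"
    shift 3
    # Remaining arguments are the TAO task and its options, passed through as-is.
    printf ' %s' "$@"
    printf '\n'
}
```

For example, `tao_docker_cmd nvcr.io/nvidia/tao/tao-toolkit:4.0.0-tf1.15.5 /path/in/host /path/in/docker detectnet_v2 train -e /path/to/experiment/spec.txt` prints the corresponding `docker run` line without starting a container.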
TAO Toolkit API is a Kubernetes service that enables building end-to-end AI models using REST APIs. The API service can be installed on a Kubernetes cluster (local or AWS EKS) using a Helm chart along with minimal dependencies. TAO Toolkit jobs can run on the GPUs available in the cluster and can scale to a multi-node setting. You can use the TAO client CLI to interact with TAO services remotely, or integrate TAO into your own apps and services directly using the REST APIs.
To get started, use the one-click deploy script and instructions from TAO Getting Started Resources.
You can also run TAO directly on bare metal without Docker or Kubernetes. TAO notebooks can be deployed directly on Google Colab without having to configure infrastructure. The full instructions are provided in the Colab notebooks below.
|CV Task||Model Arch||One-click Deploy|
|Classification||ResNet18||Train on Colab|
|Multi-task Classification||ResNet18||Train on Colab|
|Object Detection||Deformable-DETR||Train on Colab|
|Object Detection||DSSD||Train on Colab|
|Object Detection||EfficientDet||Train on Colab|
|Object Detection||RetinaNet||Train on Colab|
|Object Detection||SSD||Train on Colab|
|Object Detection||YOLOv3||Train on Colab|
|Object Detection||YOLOv4||Train on Colab|
|Object Detection||YOLOv4 Tiny||Train on Colab|
|Action Recognition||ActionRecognition||Train on Colab|
To run the TAO API as a Kubernetes service, we have packaged Helm charts that help manage and orchestrate the various TAO services. The Helm chart can be pulled from TAO API Helm on NGC.
If you use the one-click deploy script for the TAO Toolkit API (option 3 above), the script pulls the Helm charts automatically.
TAO offers several highly accurate purpose-built pre-trained models for a variety of vision AI and conversational AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data and fine-tune pre-trained models instead of going through the hassle of large-scale data collection and training from scratch.
The purpose-built models are available on NGC. Each model card offers a pruned/deployable version that can be deployed as is, and an unpruned/trainable version that can be fine-tuned with TAO on your own dataset.
|Model Name||Network Architecture||Number of classes||Accuracy||Use Case|
|TrafficCamNet||DetectNet_v2-ResNet18||4||83.5% mAP||Detect and track cars|
|PeopleNet||DetectNet_v2-ResNet18||3||80% mAP||People counting, heatmap generation, social distancing|
|PeopleNet||DetectNet_v2-ResNet34||3||84% mAP||People counting, heatmap generation, social distancing|
|PeopleNet-Transformer||Deformable-DETR||3||84% mAP||Vision Transformer model for People detection for counting, heatmap generation, social distancing|
|DashCamNet||DetectNet_v2-ResNet18||4||80% mAP||Identify objects from a moving object|
|FaceDetectIR||DetectNet_v2-ResNet18||1||96% mAP||Detect face in a dark environment with IR camera|
|VehicleMakeNet||ResNet18||20||91% mAP||Classifying car models|
|VehicleTypeNet||ResNet18||6||96% mAP||Classifying type of cars as coupe, sedan, truck, etc|
|PeopleSegNet||MaskRCNN-ResNet50||1||85% mAP||Creates segmentation masks around people, provides pixel|
|PeopleSemSegNet||UNET||1||92% mIoU||Creates semantic segmentation masks around people. Filters person from the background|
|License Plate Detection||DetectNet_v2-ResNet18||1||98% mAP||Detecting and localizing license plates on vehicles|
|License Plate Recognition||Tuned ResNet18||36(US) / 68(CH)||97%(US)/99%(CH)||Recognize license plate numbers|
|Gaze Estimation||Four branch AlexNet based model||N/A||6.5 RMSE||Detects person's eye gaze|
|Facial Landmark||Recombinator networks||N/A||6.1 pixel error||Estimates key points on person's face|
|Heart Rate Estimation||Two branch model with attention||N/A||0.7 BPM||Estimates person's heartrate from RGB video|
|Gesture Recognition||ResNet18||6||0.85 F1 score||Recognize hand gestures|
|Emotion Recognition||5 Fully Connected Layers||6||0.91 F1 score||Recognize facial emotion|
|FaceDetect||DetectNet_v2-ResNet18||1||85.3 mAP||Detect faces from RGB or grayscale image|
|2D Body Pose Estimation||Single shot bottom-up||18||-||Estimates key joints on person's body|
|ActionRecognitionNet||2D RGB-only Resnet18||5||82.88||Recognizes action of a person from a sequence of images|
|ActionRecognitionNet||3D RGB-only Resnet18||5||85.59||Recognizes action of a person from a sequence of images|
|PoseClassificationNet||ST-GCN||6||89.53||Recognizes action of a person from a sequence of skeletons|
|People ReIdentification||ResNet50||N/A||93.0% mAP / rank-1 accuracy: 94.7%||Produces embeddings for objects and produces sampled matches|
|PointPillarNet||PointPillar||3||67% mAP||3D point cloud model for Lidar sensor|
|CitySemSegFormer||SegFormer||20||85% mIoU Top-5||Transformer based semantic segmentation model for Smart city use cases|
|Retail Object Detection||EfficientDet||100||80%||Automated self checkout and inventory management in retail|
|Retail Object Recognition||ResNet101||N/A||84%||Automated self checkout and inventory management in retail|
In addition to purpose-built models, TAO Toolkit supports the following detection architectures:
These detection meta-architectures can be used with 15+ backbones or feature extractors with TAO. For a complete list of all the permutations that are supported by TAO, please see the matrix below:
TAO Toolkit supports instance segmentation using the MaskRCNN architecture.
TAO Toolkit supports semantic segmentation using the UNET and SegFormer architectures.
This table shows all the permutations and combinations of model architectures and backbones that are supported in TAO.
|Model||Accuracy Metric(s)||Use Case|
|ASR: Jasper (English)||3.74%/10.21% WER (LibriSpeech dev-clean/dev-other)||English speech recognition|
|ASR: QuartzNet (English)||4.38%/11.30% WER (LibriSpeech dev-clean/dev-other)||English speech recognition (smaller model)|
|ASR: Citrinet (English)||-||English speech recognition|
|ASR: Conformer (English)||-||English speech recognition|
|QA: Bert Base (SQuAD2.0)||73.35% EM score, 76.44% F1 score||Question answering|
|QA: Bert Large (SQuAD2.0)||77.16% EM score, 80.22% F1 score||Question answering|
|QA: Bert Megatron (SQuAD2.0)||78.0% EM score, 81.35% F1 score||Question answering|
|Domain Classification: BERT||90% accuracy for 4 domains of the weather chatbot||Text classification problems (e.g. sentiment analysis, domain detection)|
|Punctuation and Capitalization: BERT||77% F1 score||Punctuation and capitalization of ASR output|
|NER: BERT||74.21% F1 score||Named entity recognition and other token-level classification tasks|
|Joint Intent and Slot Classification: BERT||95% intent accuracy, 93% slot accuracy||Classifying intent and detecting relevant slots in a query|
TAO Toolkit Getting Started
The license for TAO containers is included within the container at workspace/EULA.pdf. Licenses for the pre-trained models are available with the model files. By pulling and using the Train Adapt Optimize (TAO) Toolkit container to download models, you accept the terms and conditions of these licenses.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.