TAO Toolkit - Computer Vision

TAO Toolkit is a Python-based AI toolkit for taking purpose-built pre-trained AI models and customizing them with your own data.
March 16, 2024

What is TAO Toolkit?

TAO (Train, Adapt, Optimize) Toolkit is a Python-based AI toolkit built on TensorFlow and PyTorch. It provides transfer learning capability to adapt popular neural network architectures and backbones to your data, allowing you to train, fine-tune, prune, quantize, and export highly optimized and accurate AI models for edge deployment.

The purpose-built pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more. TAO supports training for CV, 3D point cloud, ASR, NLP, and TTS modalities.

TAO Toolkit packages a collection of containers, Python wheels, models, and a Helm chart. AI training tasks run on either TensorFlow or PyTorch, depending on the entrypoint for the model.

For deployment, TAO models can be deployed to DeepStream for video analytics applications, Riva for Conversational AI applications or Triton for inference serving use cases.

TAO Containers

All containers needed to run TAO can be pulled from this location. See the list below for all available containers in this registry.

TAO Container Type | container_name:tag | What's it used for?
TAO TensorFlow v1 container | nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5 | Older CV networks like YOLOs, FasterRCNN, DetectNet_v2, MaskRCNN, UNET and more
TAO TensorFlow v2 container | nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf2.11.0 | CV networks like EfficientDet, EfficientNet and more
TAO PyTorch container | nvcr.io/nvidia/tao/tao-toolkit:5.3.0-pyt | Newer CV networks like Deformable-DETR, SegFormer and more, as well as all ConvAI networks
TAO Deploy container | nvcr.io/nvidia/tao/tao-toolkit:5.3.0-deploy | Generates a TensorRT engine from a trained TAO model, runs INT8 calibration, and evaluates the resulting TensorRT engine
TAO Data Services container | nvcr.io/nvidia/tao/tao-toolkit:5.3.0-data-services | AI-assisted annotation and a few other data services
TAO API container | nvcr.io/nvidia/tao/tao-toolkit:5.3.0-api | Front-end services container that can host a TAO REST API server for remote execution of model training tasks. Useful for building higher-level services
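
As a minimal sketch of how the container list above might be used from a shell, the helper below maps a short container-type key to the image reference given in the list. The function name `tao_image` and the short keys (`tf1`, `tf2`, `pyt`, `deploy`, `data-services`, `api`) are hypothetical conveniences and not part of TAO; the image names and tags are copied from the list and may differ in other releases.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the image reference for a TAO container type.
# Tags are copied from the container list above; the short keys are assumptions.
tao_image() {
  local repo="nvcr.io/nvidia/tao/tao-toolkit"
  case "$1" in
    tf1)           echo "${repo}:5.0.0-tf1.15.5" ;;
    tf2)           echo "${repo}:5.0.0-tf2.11.0" ;;
    pyt)           echo "${repo}:5.3.0-pyt" ;;
    deploy)        echo "${repo}:5.3.0-deploy" ;;
    data-services) echo "${repo}:5.3.0-data-services" ;;
    api)           echo "${repo}:5.3.0-api" ;;
    *)             echo "unknown TAO container type: $1" >&2; return 1 ;;
  esac
}

# Print the pull command for the PyTorch container (remove 'echo' to run it).
echo docker pull "$(tao_image pyt)"
```

Printing the command first makes it easy to confirm the tag before pulling a multi-gigabyte image.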

How to run TAO?

There are four ways to run TAO, depending on your preference and setup. See the full list below.
More information about each option and the Jupyter notebooks can be found in the TAO Getting Started Resources.

1. Launcher CLI

The TAO Launcher is a lightweight Python-based CLI for running TAO. The launcher acts as a front end for the multiple TAO Toolkit containers built on PyTorch and TensorFlow. The appropriate container is launched automatically based on the type of model you plan to use for your computer vision or conversational AI use case.

To get started, use the startup script and instructions from the TAO Getting Started Resources.

2. Directly from Container

You can also run TAO directly from a Docker container. To do so, you need to know which container to pull: there are multiple containers under TAO, and the appropriate one depends on the model you want to train. This step is not required when using the Launcher CLI.

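export DOCKER_REGISTRY="nvcr.io"
export DOCKER_NAME="nvidia/tao/tao-toolkit"
export DOCKER_TAG="***" ## for TensorFlow/PyTorch or Deploy container

# Full image reference used by the docker run command below
export DOCKER_CONTAINER="$DOCKER_REGISTRY/$DOCKER_NAME:$DOCKER_TAG"

# $KEY must be set beforehand; it is passed to the training task via -k
docker run -it --rm --gpus all -v /path/in/host:/path/in/docker "$DOCKER_CONTAINER" \
    detectnet_v2 train -e /path/to/experiment/spec.txt -r /path/to/results/dir -k "$KEY" --gpus 4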

More information about running directly from Docker is provided in the TAO documentation - Container.
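
Before launching a long training job, it can help to print the exact command line first. The following is a minimal sketch of such a dry run: the function name `tao_run_cmd` and its argument order are hypothetical, and it only echoes a `docker run` command with the same flags as the example above rather than executing it. The image shown is the TensorFlow v1 container from the container list, which that list associates with DetectNet_v2.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run helper: compose the 'docker run' command line for a TAO
# task without executing it, so the image and mount can be checked first.
# Usage: tao_run_cmd <image> <host_dir> <container_dir> <task args...>
tao_run_cmd() {
  if [ "$#" -lt 4 ]; then
    echo "usage: tao_run_cmd <image> <host_dir> <container_dir> <task args...>" >&2
    return 1
  fi
  local image="$1" host_dir="$2" container_dir="$3"
  shift 3
  # Same flags as the example above: interactive, auto-remove, all GPUs, one mount.
  echo docker run -it --rm --gpus all -v "${host_dir}:${container_dir}" "${image}" "$@"
}

# Placeholder paths, as in the example above.
tao_run_cmd "nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5" /path/in/host /path/in/docker \
  detectnet_v2 train -e /path/to/experiment/spec.txt -r /path/to/results/dir
```

Once the printed command looks right, it can be copied into the terminal or the `echo` inside the function can be removed.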


3. TAO API

TAO Toolkit API is a Kubernetes service that enables building end-to-end AI models using REST APIs. The API service can be installed on a Kubernetes cluster (local or AWS EKS) using a Helm chart along with minimal dependencies. TAO Toolkit jobs run on the GPUs available in the cluster and can scale to a multi-node setting. You can use the TAO client CLI to interact with TAO services remotely, or integrate the service into your own apps and services directly through the REST APIs.

To get started, use the one-click deploy script and instructions from the TAO Getting Started Resources.

4. Python Wheel

You can also run TAO directly on bare metal, without Docker or Kubernetes. TAO notebooks can be deployed directly on Google Colab without having to configure infrastructure. The full instructions are provided in the Colab notebooks below.

CV Task | Model Arch | One-click Deploy
Classification | ResNet18 | Train on Colab
Multi-task Classification | ResNet18 | Train on Colab
Object Detection | Deformable-DETR | Train on Colab
Object Detection | DSSD | Train on Colab
Object Detection | EfficientDet | Train on Colab
Object Detection | RetinaNet | Train on Colab
Object Detection | SSD | Train on Colab
Object Detection | YOLOv3 | Train on Colab
Object Detection | YOLOv4 | Train on Colab
Object Detection | YoloV4 Tiny | Train on Colab
Action Recognition | ActionRecognition | Train on Colab

TAO Helm Chart

To run the TAO API as a Kubernetes service, Helm charts are provided to help manage and orchestrate the various TAO services. The Helm chart can be pulled from TAO API Helm on NGC.

If you use the one-click deploy script from #3 above, the script pulls the Helm charts automatically.

Pre-Trained Models

TAO offers several highly accurate, purpose-built pre-trained models for a variety of vision AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data and fine-tune the pre-trained models instead of going through the hassle of large-scale data collection and training from scratch.

CV Pre-trained Models

The purpose-built models are available on NGC. Under each model card, there is a pruned/deployable version that can be deployed as is and an unpruned/trainable version that can be used with TAO to fine-tune with your own dataset.

Model Name | Network Architecture | Number of classes | Accuracy | Use Case
TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.5% mAP | Detect and track cars
PeopleNet | DetectNet_v2-ResNet18 | 3 | 80% mAP | People counting, heatmap generation, social distancing
PeopleNet | DetectNet_v2-ResNet34 | 3 | 84% mAP | People counting, heatmap generation, social distancing
PeopleNet-Transformer | Deformable-DETR | 3 | 84% mAP | Vision Transformer model for People detection for counting, heatmap generation, social distancing
DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% mAP | Identify objects from a moving object
FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% mAP | Detect face in a dark environment with IR camera
VehicleMakeNet | ResNet18 | 20 | 91% mAP | Classifying car models
VehicleTypeNet | ResNet18 | 6 | 96% mAP | Classifying type of cars as coupe, sedan, truck, etc
PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% mAP | Creates segmentation masks around people, provides pixel
PeopleSemSegNet | UNET | 1 | 92% mIoU | Creates semantic segmentation masks around people. Filters person from the background
License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% mAP | Detecting and localizing license plates on vehicles
License Plate Recognition | Tuned ResNet18 | 36 (US) / 68 (CH) | 97% (US) / 99% (CH) | Recognize license plate numbers
Gaze Estimation | Four branch AlexNet based model | N/A | 6.5 RMSE | Detects person's eye gaze
Facial Landmark | Recombinator networks | N/A | 6.1 pixel error | Estimates key points on person's face
Heart Rate Estimation | Two branch model with attention | N/A | 0.7 BPM | Estimates person's heart rate from RGB video
Gesture Recognition | ResNet18 | 6 | 0.85 F1 score | Recognize hand gestures
Emotion Recognition | 5 Fully Connected Layers | 6 | 0.91 F1 score | Recognize facial emotion
FaceDetect | DetectNet_v2-ResNet18 | 1 | 85.3 mAP | Detect faces from RGB or grayscale image
2D Body Pose Estimation | Single shot bottom-up | 18 | - | Estimates key joints on person's body
ActionRecognitionNet | 2D RGB-only Resnet18 | 5 | 82.88 | Recognizes action of a person from a sequence of images
ActionRecognitionNet | 3D RGB-only Resnet18 | 5 | 85.59 | Recognizes action of a person from a sequence of images
PoseClassificationNet | ST-GCN | 6 | 89.53 | Recognizes action of a person from a sequence of skeletons
People ReIdentification | ResNet50 | N/A | 93.0% mAP / rank-1 accuracy: 94.7% | Produces embeddings for objects and produces sampled matches
PointPillarNet | PointPillar | 3 | 67% mAP | 3D point cloud model for Lidar sensor
CitySegFormer | SegFormer | 20 | 85% mIoU | Top-5 Transformer based semantic segmentation model for Smart city use cases
Retail Object Detection | EfficientDet | 100 | 80% | Automated self checkout and inventory management in retail
Retail Object Embedding | ResNet101 | N/A | 84% | Automated self checkout and inventory management in retail
Optical Inspection | Custom Siamese | | 97% | Inspect PCB component images to detect defects in assembly
Optical Character Detection | DBNet | | 82.2% | Detect characters in an image of a document
Optical Character Recognition | ResNet50 + TPS module | | 92.6% | Recognize characters from grayscale images
PCB Classification | GCViT-xx-Tiny | | 92.6% | Model to classify defects in soldered components on a Printed Circuit Board
PeopleSemSegFormer | FAN Base Hybrid | | 92.6% | Model to segment persons in an image

Architecture-specific CV pre-trained models

In addition to purpose-built models, TAO Toolkit supports the following detection architectures:

These detection meta-architectures can be used with 15+ backbones or feature extractors with TAO. For a complete list of all the permutations that are supported by TAO, please see the matrix below:

TAO Toolkit supports instance segmentation using the MaskRCNN architecture.
TAO Toolkit supports semantic segmentation using the UNET and SegFormer architectures.

This table shows all the permutations and combinations of model architectures and backbones that are supported in TAO.



The license for TAO containers is included in the banner of the container. Licenses for the pre-trained models are available with the model cards on NGC. By pulling and using the Train Adapt Optimize (TAO) Toolkit container to download models, you accept the terms and conditions of these licenses.

Technical blogs

Suggested reading

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.