NVIDIA
TAO Toolkit
Container

Docker containers distributed as part of the TAO Toolkit package

TAO

TAO (Train Adapt Optimize) Toolkit is a Python-based AI toolkit built on TensorFlow 2.0 and PyTorch. It provides transfer learning capability to adapt popular neural network architectures and backbones to your data, allowing you to train, fine-tune, prune, quantize, and export highly optimized and accurate AI models for edge deployment. You can also use TAO Toolkit to distill knowledge from large foundation models such as C-RADIOv2 and NVDINOv2 into smaller, compute-friendly models for the edge.

The purpose-built pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more. TAO supports training for CV and 3D point cloud modalities.

TAO packages a collection of containers, Python wheels, models, and a Helm chart. AI training tasks run on either TensorFlow or PyTorch, depending on the entrypoint for the model.

For deployment, TAO models can be deployed to DeepStream for video analytics applications, or Triton for inference serving use cases.

License

Use of this software is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Your use of models with this software is subject to the terms accompanying the models.

TAO 7.0.1

TAO 7.0.1 includes the following assets:

  1. 6 containers

  2. A getting started resource that contains tutorial notebooks

  3. NVIDIA TAO launcher wheel (hosted on PyPI)

  4. TAO Helm Chart

  5. C-RADIOv3 - B/L/H/G available on Hugging Face for generating rich visual embeddings

  6. ConvNext series of Foundation models

  7. Purpose-built model - TrafficCamNet Transformer Lite for object detection in traffic scenes.

  8. State-of-the-art commercial models for depth estimation from monocular images and stereo image pairs:

    1. Monocular depth estimation via NvDepthAnythingv2
    2. Stereo depth estimation via C-FoundationStereo
  9. Purpose-built models:

    1. Sparse4D for multi-camera 3D object detection and tracking.

    2. RT-DETR Warehouse 2D for 2D object detection in warehouses.

    3. Multimodal embedding models for Metropolis Blueprints

      1. RADIO-CLIP
      2. SigLIPv2

TAO Containers

All containers needed to run TAO can be pulled from the nvcr.io/nvidia/tao/tao-toolkit registry path. The table below lists every container available there.

| TAO Container Type | container_name:tag | What's it used for? |
|---|---|---|
| TAO PyTorch container | nvcr.io/nvidia/tao/tao-toolkit:7.0.1-pyt | Finetuning workflows in PyTorch |
| TAO Deploy container | nvcr.io/nvidia/tao/tao-toolkit:7.0.1-deploy | TensorRT inference workflows in Deploy |
| TAO Data Services container | nvcr.io/nvidia/tao/tao-toolkit:7.0.1-data-services | Dataset augmentation, autolabelling and analysis workflows |
| TAO Cosmos-RL container | nvcr.io/nvidia/tao/tao-toolkit:7.0.1-cosmos-rl | Finetuning workflows for the Cosmos-Reason VLMs |
| TAO Cosmos-Embed container | nvcr.io/nvidia/tao/tao-toolkit:7.0.1-cosmos-embed | Finetuning workflows for the Cosmos-Embed model |
| TAO Cosmos-Predict container | nvcr.io/nvidia/tao/tao-toolkit:7.0.1-cosmos-predict | Finetuning workflows for the Cosmos-Predict model |
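
The image references above all follow one pattern: the registry path, then the release number and a per-container suffix as the tag. As a minimal sketch (not an official TAO utility), the Python snippet below rebuilds those references and the matching `docker pull` command lines. The short keys in `SUFFIXES` and the function names are hypothetical labels chosen for this example; only the registry path, release number, and tag suffixes come from the table. Pulling from the registry may also require authenticating with it first.

```python
import shlex

# Registry path and release number as listed in the container table.
REGISTRY = "nvcr.io/nvidia/tao/tao-toolkit"
RELEASE = "7.0.1"

# Tag suffixes from the table. The dictionary keys are hypothetical
# short labels used only by this sketch.
SUFFIXES = {
    "pytorch": "pyt",
    "deploy": "deploy",
    "data-services": "data-services",
    "cosmos-rl": "cosmos-rl",
    "cosmos-embed": "cosmos-embed",
    "cosmos-predict": "cosmos-predict",
}


def image_ref(kind: str) -> str:
    """Return the full image reference for one container type."""
    if kind not in SUFFIXES:
        raise ValueError(f"unknown container type: {kind!r}")
    return f"{REGISTRY}:{RELEASE}-{SUFFIXES[kind]}"


def pull_command(kind: str) -> str:
    """Build the `docker pull` command line as a shell-safe string."""
    return shlex.join(["docker", "pull", image_ref(kind)])


if __name__ == "__main__":
    # Print one pull command per container in the table.
    for kind in SUFFIXES:
        print(pull_command(kind))
```

Keeping the release number in a single constant means a later release only needs one edit, assuming the tag scheme stays the same.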

How to run TAO 7.0.1

To get started with TAO, use skills from NVIDIA-TAO/tao-skills-bank.

Pre-trained Models

The TAO 7.0.1 package refers to several pre-trained models released on NGC.

| Model Name | Pulled | Use Case |
|---|---|---|
| C-RADIOv2 | Manual | Multi-teacher distilled foundation model generating rich visual embeddings |
| C-RADIOv3 - B/L/H/G | Manual | Enhanced multi-teacher distilled foundation model for improved visual embeddings (available on Hugging Face) |
| ConvNextv2 | Manual | FC-MAE trained foundation model generating rich visual embeddings for CNNs |
| TrafficCamNet Transformer Lite | Manual | Object detection network for detecting 4 class objects in traffic scenes |
| NvDepthAnythingv2 | Manual | Depth estimation model to generate relative depth maps from monocular images |
| C-FoundationStereo | Manual | Depth estimation model to generate relative disparity maps from stereo image pairs |
| Sparse4D | Manual | Multi-camera 3D object detection and tracking |
| RT-DETR Warehouse 2D | Manual | 2D object detection model for warehouse environments |
| RADIO-CLIP | Manual | Multimodal embedding model for Metropolis Blueprints |
| SigLIPv2 | Manual | Multimodal embedding model for Metropolis Blueprints |

DeiT-base, DeiT-small, and EfficientNet are pulled automatically through the timm API in the training code. C-RADIOv2, C-RADIOv3, InceptionNet, and EfficientNet-embeddings must be pulled manually. C-RADIOv3 is available on Hugging Face. The tutorial notebooks give the instructions and steps for pulling these models.
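
To make the automatic versus manual split concrete, here is a small Python sketch. The two sets are transcribed from the paragraph above; the helper names `pull_mode` and `load_timm_backbone` are hypothetical and are not part of TAO. The loader calls `timm.create_model(..., pretrained=True)`, which is the usual way timm downloads pretrained weights, but whether the TAO training code uses this exact call or these exact timm model identifiers is an assumption.

```python
# Pull mode per backbone, transcribed from the paragraph above.
AUTOMATIC = {"DeiT-base", "DeiT-small", "EfficientNet"}
MANUAL = {"C-RADIOv2", "C-RADIOv3", "InceptionNet", "EfficientNet-embeddings"}


def pull_mode(model: str) -> str:
    """Return 'automatic' or 'manual' for a backbone named in the text."""
    if model in AUTOMATIC:
        return "automatic"
    if model in MANUAL:
        return "manual"
    raise KeyError(f"backbone not covered by this sketch: {model!r}")


def load_timm_backbone(timm_name: str):
    """Download and build a pretrained backbone through timm.

    Hypothetical helper: timm is imported lazily so the lookup above
    works without timm installed. The caller supplies the timm model
    identifier; this sketch does not know which identifiers TAO uses.
    """
    import timm  # requires the timm package and network access

    return timm.create_model(timm_name, pretrained=True)


if __name__ == "__main__":
    for name in sorted(AUTOMATIC | MANUAL):
        print(f"{name}: {pull_mode(name)}")
```

For example, `load_timm_backbone("deit_small_patch16_224")` would fetch a DeiT-small checkpoint from timm's model hub; that identifier is a timm name assumed for illustration, not one confirmed by the TAO documentation.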

Ethical Considerations

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.

Security Vulnerabilities in Open Source Packages

Please review the Security Scanning tab to view the latest security scan results.
For certain open-source vulnerabilities listed in the scan results, NVIDIA provides a response in the form of a Vulnerability Exploitability eXchange (VEX) document. The VEX information can be reviewed and downloaded from the Security Scanning tab.

Get Help

Get access to knowledge base articles and support cases or submit a ticket.