# TAO Pretrained Commercial Backbone for DINO

## What is Train Adapt Optimize (TAO) Toolkit?

[Train Adapt Optimize (TAO) Toolkit](https://developer.nvidia.com/tao-toolkit) is a Python-based AI toolkit for taking purpose-built pre-trained AI models and customizing them with your own data. TAO adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for edge deployment.

Pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more.

Build end-to-end services and solutions for transforming pixels and sensor data into actionable insights using TAO, [DeepStream SDK](https://developer.nvidia.com/deepstream-sdk), and [TensorRT](https://developer.nvidia.com/tensorrt). These models are suitable for object detection, classification, and segmentation.

## DINO Based Object Detection

Object detection is a popular computer vision technique that can detect one or multiple objects in a frame. It recognizes the individual objects in an image and places a bounding box around each one.

This model card contains pretrained weights that may be used as a starting point with the **DINO** object detection networks in Train Adapt Optimize (TAO) Toolkit to facilitate transfer learning.

The weights were trained on the NVImageNet dataset, which is permitted for commercial use. The following backbones are supported with DINO networks.

Supported backbones:

- resnet_50
- gc_vit_xxtiny / gc_vit_xtiny / gc_vit_tiny / gc_vit_small / gc_vit_base / gc_vit_large / gc_vit_large_384
- fan_tiny / fan_small / fan_base

## Model Versions

- **resnet50** - NVImageNet pre-trained ResNet-50 model for fine-tuning.
- **gcvit_xxtiny_nvimagenet** - NVImageNet pre-trained GCViT-xxTiny model for fine-tuning.
- **gcvit_xtiny_nvimagenet** - NVImageNet pre-trained GCViT-xTiny model for fine-tuning.
- **gcvit_tiny_nvimagenet** - NVImageNet pre-trained GCViT-Tiny model for fine-tuning.
- **gcvit_small_nvimagenet** - NVImageNet pre-trained GCViT-Small model for fine-tuning.
- **gcvit_base_nvimagenet** - NVImageNet pre-trained GCViT-Base model for fine-tuning.
- **fan_hybrid_tiny_nvimagenet** - ImageNet22k pre-trained FAN-Hybrid-Tiny model for fine-tuning. (224 resolution)
- **fan_small_hybrid_nvimagenet** - ImageNet22k pre-trained FAN-Hybrid-Small model for fine-tuning. (224 resolution)
- **fan_base_hybrid_nvimagenet** - ImageNet22k pre-trained FAN-Hybrid-Base model fine-tuned on ImageNet-1k.
- **fan_large_hybrid_nvimagenet** - ImageNet22k pre-trained FAN-Hybrid-Large model for fine-tuning. (224 resolution)

### Instructions to Use Pretrained Backbone Models with TAO

To use these models as pretrained backbone weights for transfer learning, use the snippet below as a template for the `model` component of the experiment spec file used to train a DINO model. For more information on the experiment spec file, refer to the [TAO Toolkit User Guide](https://docs.nvidia.com/tao/tao-toolkit/index.html).

```yaml
model:
  pretrained_backbone_path: /path/to/the/resnet50.pth
  backbone: resnet_50
  train_backbone: True
  num_feature_levels: 4
  dec_layers: 6
  enc_layers: 6
  num_queries: 900
  dropout_ratio: 0.0
  dim_feedforward: 2048
```

## Other TAO Pre-trained Models

- Get [TAO Object Detection](https://ngc.nvidia.com/catalog/models/nvidia:tao:pretrained_object_detection) pre-trained models for **YOLOV4, YOLOV3, FasterRCNN, SSD, DSSD, and RetinaNet** architectures from the NGC model registry
- Get [TAO DetectNet_v2 Object Detection](https://ngc.nvidia.com/catalog/models/nvidia:tao:pretrained_detectnet_v2) pre-trained models for the **DetectNet_v2** architecture from the NGC model registry
- Get [TAO EfficientDet Object Detection](https://ngc.nvidia.com/catalog/models/nvidia:tao:pretrained_efficientdet) pre-trained models for the **EfficientDet** architecture from the NGC model registry
- Get [TAO Instance segmentation](https://ngc.nvidia.com/catalog/models/nvidia:tao:pretrained_instance_segmentation) pre-trained models for the **MaskRCNN** architecture from NGC
- Get [TAO Semantic segmentation](https://ngc.nvidia.com/catalog/models/nvidia:tao:pretrained_semantic_segmentation) pre-trained models for the **UNet** architecture from NGC
- Get purpose-built models from the NGC model registry:
  - [PeopleNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:peoplenet)
  - [TrafficCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:trafficcamnet)
  - [DashCamNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:dashcamnet)
  - [FaceDetectIR](https://ngc.nvidia.com/catalog/models/nvidia:tao:facedetectir)
  - [VehicleMakeNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:vehiclemakenet)
  - [VehicleTypeNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:vehicletypenet)
  - [PeopleSegNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:peoplesegnet)
  - [PeopleSemSegNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:peoplesemsegnet)
  - [License Plate Detection](https://ngc.nvidia.com/catalog/models/nvidia:tao:lpdnet)
  - [License Plate Recognition](https://ngc.nvidia.com/catalog/models/nvidia:tao:lprnet)
  - [Gaze Estimation](https://ngc.nvidia.com/catalog/models/nvidia:tao:gazenet)
  - [Facial Landmark](https://ngc.nvidia.com/catalog/models/nvidia:tao:fpenet)
  - [Heart Rate Estimation](https://ngc.nvidia.com/catalog/models/nvidia:tao:heartratenet)
  - [Gesture Recognition](https://ngc.nvidia.com/catalog/models/nvidia:tao:gesturenet)
  - [Emotion Recognition](https://ngc.nvidia.com/catalog/models/nvidia:tao:emotionnet)
  - [FaceDetect](https://ngc.nvidia.com/catalog/models/nvidia:facenet)
  - [2D Body Pose Net](https://ngc.nvidia.com/catalog/models/nvidia:tao:bodyposenet)
  - [ActionRecognitionNet](https://ngc.nvidia.com/catalog/models/nvidia:tao:actionrecognitionnet)

## License

The license to use these models is covered by the Model EULA. By downloading the unpruned or pruned version of the model, you accept the terms and conditions of these [licenses](https://www.nvidia.com/en-us/data-center/products/nvidia-ai-enterprise/eula/).

## Technical blogs

- [Access the latest in Vision AI model development workflows with NVIDIA TAO Toolkit 5.0](https://developer.nvidia.com/blog/access-the-latest-in-vision-ai-model-development-workflows-with-nvidia-tao-toolkit-5-0/)
- [Improve accuracy and robustness of vision AI apps with Vision Transformers and NVIDIA TAO](https://developer.nvidia.com/blog/improve-accuracy-and-robustness-of-vision-ai-apps-with-vision-transformers-and-nvidia-tao/)
- [Train like a ‘pro’ without being an AI expert using TAO AutoML](https://developer.nvidia.com/blog/training-like-an-ai-pro-using-tao-automl/)
- [Create Custom AI models using NVIDIA TAO Toolkit with Azure Machine Learning](https://developer.nvidia.com/blog/creating-custom-ai-models-using-nvidia-tao-toolkit-with-azure-machine-learning/)
- [Developing and Deploying AI-powered Robots with NVIDIA Isaac Sim and NVIDIA TAO](https://developer.nvidia.com/blog/developing-and-deploying-ai-powered-robots-with-nvidia-isaac-sim-and-nvidia-tao/)
- Learn endless ways to adapt and supercharge your AI workflows with TAO: [Whitepaper](https://developer.nvidia.com/tao-toolkit-usecases-whitepaper/1-introduction)
- [Customize Action Recognition with TAO and deploy with DeepStream](https://developer.nvidia.com/blog/developing-and-deploying-your-custom-action-recognition-application-without-any-ai-expertise-using-tao-and-deepstream/)
- Read the two-part blog on training and optimizing a 2D body pose estimation model with TAO: [Part 1](https://developer.nvidia.com/blog/training-optimizing-2d-pose-estimation-model-with-tao-toolkit-part-1) | [Part 2](https://developer.nvidia.com/blog/training-optimizing-2d-pose-estimation-model-with-tao-toolkit-part-2)
- Learn how to train a [real-time license plate detection and recognition app](https://developer.nvidia.com/blog/creating-a-real-time-license-plate-detection-and-recognition-app) with TAO and DeepStream.
- Model accuracy is extremely important. Learn how you can achieve [state of the art accuracy for classification and object detection models](https://developer.nvidia.com/blog/preparing-state-of-the-art-models-for-classification-and-object-detection-with-tao-toolkit/) using TAO.

## Suggested reading

- More information about TAO Toolkit and pre-trained models can be found at the [NVIDIA Developer Zone](https://developer.nvidia.com/tao-toolkit).
- Read the [TAO Getting Started](https://docs.nvidia.com/tao/tao-toolkit/index.html) guide and [release notes](https://docs.nvidia.com/tao/tao-toolkit/text/release_notes.html).
- If you have any questions or feedback, refer to the discussions on the [TAO Toolkit Developer Forums](https://forums.developer.nvidia.com/c/accelerated-computing/intelligent-video-analytics/tao-toolkit/17).
- Deploy your model on the edge using DeepStream. Learn more about [DeepStream SDK](https://developer.nvidia.com/deepstream-sdk).

## Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.
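
Returning to the experiment spec template in "Instructions to Use Pretrained Backbone Models with TAO": switching to a different backbone means changing `backbone` and `pretrained_backbone_path` together. The Python sketch below is one possible way to render the `model` block for a chosen backbone while rejecting names that are not in this card's supported list. The names `render_model_section`, `SUPPORTED_BACKBONES`, and `TEMPLATE_DEFAULTS` are hypothetical and are not part of TAO; the key names and values are simply copied from the template in this card, and whether the same values suit every backbone is an assumption to check against the TAO documentation.

```python
# Backbone identifiers copied from the "Supported backbones" list in this card.
SUPPORTED_BACKBONES = (
    "resnet_50",
    "gc_vit_xxtiny", "gc_vit_xtiny", "gc_vit_tiny", "gc_vit_small",
    "gc_vit_base", "gc_vit_large", "gc_vit_large_384",
    "fan_tiny", "fan_small", "fan_base",
)

# Remaining `model` keys and values copied from the YAML template in this card.
# Assumption: these values are reused unchanged for every backbone.
TEMPLATE_DEFAULTS = {
    "train_backbone": True,
    "num_feature_levels": 4,
    "dec_layers": 6,
    "enc_layers": 6,
    "num_queries": 900,
    "dropout_ratio": 0.0,
    "dim_feedforward": 2048,
}


def render_model_section(backbone: str, weights_path: str) -> str:
    """Return the `model` block of a spec file as YAML text.

    Hypothetical helper for illustration; it only formats text and does
    not call any TAO API.
    """
    if backbone not in SUPPORTED_BACKBONES:
        raise ValueError(f"unsupported backbone: {backbone!r}")
    lines = [
        "model:",
        f"  pretrained_backbone_path: {weights_path}",
        f"  backbone: {backbone}",
    ]
    # Append the template keys in the order they appear in the card.
    lines += [f"  {key}: {value}" for key, value in TEMPLATE_DEFAULTS.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Reproduces the template from this card for the ResNet-50 weights.
    print(render_model_section("resnet_50", "/path/to/the/resnet50.pth"))
```

Passing a model version name such as `resnet50` instead of the backbone key `resnet_50` raises `ValueError`, which catches one easy mix-up between the two naming schemes used in this card.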