
TLT - CV Training

Description
NVIDIA’s Transfer Learning Toolkit is a Python-based AI training toolkit that allows developers to create accurate and efficient AI models for Intelligent Video Analytics and Computer Vision without expertise in AI frameworks.
Curator
NVIDIA
Modified
April 4, 2023
Containers
Helm Charts
Models
Resources

What is Transfer Learning Toolkit?

Transfer Learning Toolkit (TLT) is a Python-based AI toolkit for taking purpose-built, pre-trained AI models and customizing them with your own data. TLT adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for edge deployment.
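
As a rough sketch of that flow with the TLT 3.0 launcher (described under "Running Transfer Learning Toolkit" below), each stage is a sub-command. The spec files, directories, and $KEY model key below are placeholders, and paths are resolved inside the launcher's container:

# Illustrative train -> prune -> retrain -> export flow with the TLT launcher;
# spec files, directories, and the $KEY model key are placeholders.
tlt detectnet_v2 train  -e specs/train.txt   -r results/train   -k $KEY
tlt detectnet_v2 prune  -m results/train/weights/model.tlt -o results/model_pruned.tlt -k $KEY
tlt detectnet_v2 train  -e specs/retrain.txt -r results/retrain -k $KEY
tlt detectnet_v2 export -m results/retrain/weights/model.tlt -o results/export/model.etlt -k $KEY

The exported .etlt file is the artifact that DeepStream and TensorRT consume for deployment.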

The pre-trained models accelerate the AI training process and reduce costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more.

Build end-to-end services and solutions for transforming pixels and sensor data into actionable insights using TLT, DeepStream SDK, and TensorRT. TLT can train models for common vision AI tasks such as object detection, classification, and instance segmentation, as well as more complex tasks such as pose estimation, facial landmark estimation, gaze estimation, and heart rate estimation.

Purpose-built Pre-Trained Models

Purpose-built pre-trained models offer highly accurate AI for a variety of vision AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data to train and fine-tune pre-trained models instead of going through the effort of large-scale data collection and training from scratch.


Examples: PeopleNet, 2D Body Pose Estimation, Facial Landmark Estimation

The purpose-built models are available on NGC. Under each model card, there is a pruned version that can be deployed as is and an unpruned version that can be used with TLT to fine-tune on your own dataset.
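
For example, assuming the NGC CLI is installed, a model's versions can be listed and pulled as below; the version tag shown is illustrative and should be taken from the model card:

# List PeopleNet versions and download one with the NGC CLI.
# The "pruned_v2.0" tag is illustrative; use the tag shown on the model card.
ngc registry model list nvidia/tlt_peoplenet:*
ngc registry model download-version nvidia/tlt_peoplenet:pruned_v2.0 --dest ./models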

Model Name | Network Architecture | Number of classes | Accuracy | Use Case
TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.5% mAP | Detect and track cars
PeopleNet | DetectNet_v2-ResNet18 | 3 | 80% mAP | People counting, heatmap generation, social distancing
PeopleNet | DetectNet_v2-ResNet34 | 3 | 84% mAP | People counting, heatmap generation, social distancing
DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% mAP | Identify objects from a moving vehicle
FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% mAP | Detect faces in a dark environment with an IR camera
VehicleMakeNet | ResNet18 | 20 | 91% mAP | Classifying car models
VehicleTypeNet | ResNet18 | 6 | 96% mAP | Classifying car types such as coupe, sedan, truck, etc.
PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% mAP | Creates segmentation masks around people, providing a pixel-level mask for each person
PeopleSemSegNet | UNET | 1 | 92% mIoU | Creates semantic segmentation masks around people; separates people from the background
License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% mAP | Detecting and localizing license plates on vehicles
License Plate Recognition | Tuned ResNet18 | 36 (US) / 68 (CH) | 97% (US) / 99% (CH) | Recognize license plate characters
Gaze Estimation | Four-branch AlexNet-based model | N/A | 6.5 RMSE | Estimates a person's eye gaze
Facial Landmark | Recombinator networks | N/A | 6.1 pixel error | Estimates key points on a person's face
Heart Rate Estimation | Two-branch model with attention | N/A | 0.7 BPM | Estimates a person's heart rate from RGB video
Gesture Recognition | ResNet18 | 6 | 0.85 F1 score | Recognize hand gestures
Emotion Recognition | 5 fully connected layers | 6 | 0.91 F1 score | Recognize facial emotions
FaceDetect | DetectNet_v2-ResNet18 | 1 | 85.3% mAP | Detect faces from RGB or grayscale images
2D Body Pose Estimation | Single-shot bottom-up | 18 | - | Estimates key joints on a person's body

Architecture-specific pre-trained models

In addition to purpose-built models, Transfer Learning Toolkit supports the following detection architectures: YOLOV3, YOLOV4, FasterRCNN, SSD, DSSD, RetinaNet, and DetectNet_v2.

These detection meta-architectures can be used with 13 backbones (feature extractors) in TLT. For a complete list of the supported permutations, please see the support matrix in the TLT documentation.

TLT 3.0 supports instance segmentation using the MaskRCNN architecture.

TLT 3.0 supports semantic segmentation using the UNET architecture.

Training

To get started, first choose the model architecture that you want to build, then select the appropriate model card on NGC, and finally choose one of the supported backbones.
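
For the open architectures, each backbone's pre-trained weights are published as versions of that architecture's NGC model. A listing sketch with the NGC CLI is shown below; the model path and version name are assumptions based on the DetectNet_v2 card and may differ for other architectures:

# List the backbone variants published for the DetectNet_v2 pre-trained model,
# then download one (model path and version are assumptions; verify on NGC).
ngc registry model list nvidia/tlt_pretrained_detectnet_v2:*
ngc registry model download-version nvidia/tlt_pretrained_detectnet_v2:resnet18 --dest ./backbones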


Running Transfer Learning Toolkit

  1. Set up your Python environment using python virtualenv and virtualenvwrapper.

  2. In TLT 3.0, we have created an abstraction above the container: you launch all your training jobs from the launcher, so there is no need to manually pull the appropriate container; tlt-launcher handles that. You may install the launcher using pip with the following commands.

pip3 install nvidia-pyindex
pip3 install nvidia-tlt
  3. Download the Jupyter notebooks that you are interested in from NGC resources. After installing the prerequisites, all the training steps are run from inside the Jupyter notebook; the tables below map each model and open architecture to its notebook, and a short launch example follows them.
Purpose-built Model | Jupyter notebook
PeopleNet | detectnet_v2/detectnet_v2.ipynb
TrafficCamNet | detectnet_v2/detectnet_v2.ipynb
DashCamNet | detectnet_v2/detectnet_v2.ipynb
FaceDetectIR | detectnet_v2/detectnet_v2.ipynb
VehicleMakeNet | classification/classification.ipynb
VehicleTypeNet | classification/classification.ipynb
PeopleSegNet | mask_rcnn/mask_rcnn.ipynb
License Plate Detection | detectnet_v2/detectnet_v2.ipynb
License Plate Recognition | lprnet/lprnet.ipynb
Gaze Estimation | gazenet/gazenet.ipynb
Facial Landmark | fpenet/fpenet.ipynb
Heart Rate Estimation | heartratenet/heartratenet.ipynb
Gesture Recognition | gesturenet/gesturenet.ipynb
Emotion Recognition | emotionnet/emotionnet.ipynb
FaceDetect | facenet/facenet.ipynb
2D Body Pose Net | bpnet/bpnet.ipynb
PeopleSemSegNet | unet/unet_isbi.ipynb

Open model architecture | Jupyter notebook
DetectNet_v2 | detectnet_v2/detectnet_v2.ipynb
FasterRCNN | faster_rcnn/faster_rcnn.ipynb
YOLOV3 | yolo_v3/yolo_v3.ipynb
YOLOV4 | yolo_v4/yolo_v4.ipynb
SSD | ssd/ssd.ipynb
DSSD | dssd/dssd.ipynb
RetinaNet | retinanet/retinanet.ipynb
MaskRCNN | mask_rcnn/mask_rcnn.ipynb
UNET | unet/unet_isbi.ipynb
Classification | classification/classification.ipynb
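
Once the launcher is installed, a minimal sketch for fetching the sample notebooks and starting Jupyter looks like the following; the resource name and version tag are assumptions and should be checked on the NGC resources page:

# Verify the launcher, download the TLT CV sample notebooks, and start Jupyter.
# The resource name and version tag are assumptions; check the NGC resources page.
tlt --help
ngc registry resource download-version nvidia/tlt_cv_samples:v1.0.2 --dest ./notebooks
cd ./notebooks            # then change into the downloaded sample directory
jupyter notebook --ip 0.0.0.0 --allow-root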

Using TLT Pre-trained Models

License

The license for the TLT containers is included within the container at workspace/EULA.pdf. Licenses for the pre-trained models are available with the model files. By pulling and using the Transfer Learning Toolkit (TLT) container to download models, you accept the terms and conditions of these licenses.

Technical blogs

Suggested reading

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.