Transfer Learning Toolkit (TLT) is a Python-based AI toolkit for taking purpose-built, pre-trained AI models and customizing them with your own data. TLT adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for edge deployment.
The pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more.
Build end-to-end services and solutions that transform pixels and sensor data into actionable insights using TLT, the DeepStream SDK, and TensorRT. TLT can train models for common vision AI tasks such as object detection, classification, and instance segmentation, as well as more complex tasks such as pose estimation, facial landmark estimation, gaze estimation, and heart rate estimation.
Purpose-built pre-trained models offer highly accurate AI for a variety of vision AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data and fine-tune pre-trained models instead of going through the hassle of large-scale data collection and training from scratch.
- PeopleNet
- 2D Body Pose Estimation
- Facial Landmark Estimation
The purpose-built models are available on NGC. Under each model card, there is a pruned version that can be deployed as-is and an unpruned version that can be used with TLT to fine-tune with your own dataset.
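As an illustrative sketch, the models can be pulled with the NGC CLI. The model name and version tag below are placeholders, not guaranteed values; check the specific model card on NGC for the exact model path and available versions.

```shell
# List TLT models available in the NGC registry (requires the ngc CLI
# to be installed and configured with an NGC API key)
ngc registry model list "nvidia/tlt_*"

# Download a specific version of a model; "nvidia/tlt_peoplenet" and
# "pruned_v1.0" are placeholder values taken from a typical model card
ngc registry model download-version "nvidia/tlt_peoplenet:pruned_v1.0"
```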
Model Name | Network Architecture | Number of classes | Accuracy | Use Case |
---|---|---|---|---|
TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.5% mAP | Detect and track cars |
PeopleNet | DetectNet_v2-ResNet18 | 3 | 80% mAP | People counting, heatmap generation, social distancing |
PeopleNet | DetectNet_v2-ResNet34 | 3 | 84% mAP | People counting, heatmap generation, social distancing |
DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% mAP | Identify objects from a moving camera |
FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% mAP | Detect face in a dark environment with IR camera |
VehicleMakeNet | ResNet18 | 20 | 91% mAP | Classifying car models |
VehicleTypeNet | ResNet18 | 6 | 96% mAP | Classifying car types such as coupe, sedan, and truck |
PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% mAP | Creates instance segmentation masks around people |
PeopleSemSegNet | UNET | 1 | 92% MIOU | Creates semantic segmentation masks around people. Filters person from the background |
License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% mAP | Detecting and localizing License plates on vehicles |
License Plate Recognition | Tuned ResNet18 | 36 (US) / 68 (CH) | 97% (US) / 99% (CH) | Recognize license plate numbers |
Gaze Estimation | Four branch AlexNet based model | N/A | 6.5 RMSE | Detects person's eye gaze |
Facial Landmark | Recombinator networks | N/A | 6.1 pixel error | Estimates key points on person's face |
Heart Rate Estimation | Two branch model with attention | N/A | 0.7 BPM | Estimates person's heartrate from RGB video |
Gesture Recognition | ResNet18 | 6 | 0.85 F1 score | Recognize hand gestures |
Emotion Recognition | 5 Fully Connected Layers | 6 | 0.91 F1 score | Recognize facial Emotion |
FaceDetect | DetectNet_v2-ResNet18 | 1 | 85.3% mAP | Detect faces in RGB or grayscale images |
2D Body Pose Estimation | Single shot bottom-up | 18 | - | Estimates key joints on person's body |
In addition to purpose-built models, Transfer Learning Toolkit supports the following detection architectures:
These detection meta-architectures can be used with 13 backbones or feature extractors with TLT. For a complete list of all the permutations that are supported by TLT, please see the matrix below:
TLT 3.0 supports instance segmentation using the MaskRCNN architecture.
TLT 3.0 supports semantic segmentation using the UNET architecture.
To get started, first choose the model architecture that you want to build, then select the appropriate model card on NGC, and finally choose one of the supported backbones.
Set up your Python environment using python virtualenv and virtualenvwrapper.
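A minimal setup sketch, assuming pip3 is available; the environment name `tlt` below is arbitrary:

```shell
# Install virtualenv and virtualenvwrapper
pip3 install virtualenv virtualenvwrapper

# Configure virtualenvwrapper for the current shell; add these lines
# to ~/.bashrc to make mkvirtualenv/workon available in every session
export WORKON_HOME=~/.virtualenvs
source "$(which virtualenvwrapper.sh)"

# Create and activate a Python 3 environment for TLT
# ("tlt" is just a placeholder name)
mkvirtualenv tlt -p "$(which python3)"
```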
In TLT 3.0, we have created an abstraction above the container: you launch all your training jobs from the launcher. There is no need to manually pull the appropriate container; tlt-launcher handles that. You can install the launcher using pip with the following commands.
```shell
pip3 install nvidia-pyindex
pip3 install nvidia-tlt
```
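After installation, a quick sanity check (assuming the install above succeeded and the `tlt` entry point is on your PATH):

```shell
# List the launcher's available commands and options
tlt --help

# Print the launcher configuration, including the docker image it will
# pull; if your launcher version lacks this subcommand, rely on --help
tlt info
```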
Purpose-built Model | Jupyter notebook |
---|---|
PeopleNet | detectnet_v2/detectnet_v2.ipynb |
TrafficCamNet | detectnet_v2/detectnet_v2.ipynb |
DashCamNet | detectnet_v2/detectnet_v2.ipynb |
FaceDetectIR | detectnet_v2/detectnet_v2.ipynb |
VehicleMakeNet | classification/classification.ipynb |
VehicleTypeNet | classification/classification.ipynb |
PeopleSegNet | mask_rcnn/mask_rcnn.ipynb |
License Plate Detection | detectnet_v2/detectnet_v2.ipynb |
License Plate Recognition | lprnet/lprnet.ipynb |
Gaze Estimation | gazenet/gazenet.ipynb |
Facial Landmark | fpenet/fpenet.ipynb |
Heart Rate Estimation | heartratenet/heartratenet.ipynb |
Gesture Recognition | gesturenet/gesturenet.ipynb |
Emotion Recognition | emotionnet/emotionnet.ipynb |
FaceDetect | facenet/facenet.ipynb |
2D Body Pose Net | bpnet/bpnet.ipynb |
PeopleSemSegNet | unet/unet_isbi.ipynb |
Open model architecture | Jupyter notebook |
---|---|
DetectNet_v2 | detectnet_v2/detectnet_v2.ipynb |
FasterRCNN | faster_rcnn/faster_rcnn.ipynb |
YOLOV3 | yolo_v3/yolo_v3.ipynb |
YOLOV4 | yolo_v4/yolo_v4.ipynb |
SSD | ssd/ssd.ipynb |
DSSD | dssd/dssd.ipynb |
RetinaNet | retinanet/retinanet.ipynb |
MaskRCNN | mask_rcnn/mask_rcnn.ipynb |
UNET | unet/unet_isbi.ipynb |
classification | classification/classification.ipynb |
- Get TLT Object Detection pre-trained models for the YOLOV4, YOLOV3, FasterRCNN, SSD, DSSD, and RetinaNet architectures from the NGC model registry
- Get TLT Object Detection pre-trained models for the DetectNet_v2 architecture from the NGC model registry
- Get TLT classification pre-trained models from the NGC model registry
- Get TLT instance segmentation pre-trained models for the MaskRCNN architecture from NGC
- Get purpose-built models from the NGC model registry:
TLT Getting Started
The license for TLT containers is included within the container at workspace/EULA.pdf. Licenses for the pre-trained models are available with the model files. By pulling and using the Transfer Learning Toolkit (TLT) container to download models, you accept the terms and conditions of these licenses.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.