Transfer Learning Toolkit (TLT) is a Python-based AI toolkit for taking purpose-built, pre-trained AI models and customizing them with your own data. TLT adapts popular network architectures and backbones to your data, allowing you to train, fine-tune, prune, and export highly optimized and accurate AI models for edge deployment.
The pre-trained models accelerate the AI training process and reduce the costs associated with large-scale data collection, labeling, and training models from scratch. Transfer learning with pre-trained models can be used for AI applications in smart cities, retail, healthcare, industrial inspection, and more.
Build end-to-end services and solutions that transform pixels and sensor data into actionable insights using TLT, the DeepStream SDK, and TensorRT. TLT can train models for common vision AI tasks such as object detection, classification, and instance segmentation, as well as more complex tasks such as pose estimation, facial landmark estimation, gaze estimation, and heart rate estimation.
Purpose-built pre-trained models offer highly accurate AI for a variety of vision AI tasks. Developers, system builders, and software partners building intelligent vision AI apps and services can bring their own custom data and fine-tune pre-trained models instead of going through the hassle of large-scale data collection and training from scratch.
- PeopleNet
- 2D Body Pose Estimation
- Facial Landmark Estimation
The purpose-built models are available on NGC. Under each model card, there is a pruned version that can be deployed as-is and an unpruned version that can be used with TLT to fine-tune with your own dataset.
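As an illustrative sketch, the models can be pulled with the NGC CLI. The model name and version tag below are placeholders, not guaranteed values; check the specific model card on NGC for the exact model path and available versions.

```shell
# List TLT models available in the NGC registry (requires the ngc CLI
# to be installed and configured with an NGC API key)
ngc registry model list "nvidia/tlt_*"

# Download a specific version of a model; "nvidia/tlt_peoplenet" and
# "pruned_v1.0" are placeholder values taken from a typical model card
ngc registry model download-version "nvidia/tlt_peoplenet:pruned_v1.0"
```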
Model Name | Network Architecture | Number of classes | Accuracy | Use Case |
---|---|---|---|---|
TrafficCamNet | DetectNet_v2-ResNet18 | 4 | 83.5% mAP | Detect and track cars |
PeopleNet | DetectNet_v2-ResNet18 | 3 | 80% mAP | People counting, heatmap generation, social distancing |
PeopleNet | DetectNet_v2-ResNet34 | 3 | 84% mAP | People counting, heatmap generation, social distancing |
DashCamNet | DetectNet_v2-ResNet18 | 4 | 80% mAP | Identify objects from a moving camera |
FaceDetectIR | DetectNet_v2-ResNet18 | 1 | 96% mAP | Detect face in a dark environment with IR camera |
VehicleMakeNet | ResNet18 | 20 | 91% mAP | Classifying car models |
VehicleTypeNet | ResNet18 | 6 | 96% mAP | Classifying car types such as coupe, sedan, and truck |
PeopleSegNet | MaskRCNN-ResNet50 | 1 | 85% mAP | Creates instance segmentation masks around people |
PeopleSemSegNet | UNET | 1 | 92% MIOU | Creates semantic segmentation masks around people. Filters person from the background |
License Plate Detection | DetectNet_v2-ResNet18 | 1 | 98% mAP | Detecting and localizing License plates on vehicles |
License Plate Recognition | Tuned ResNet18 | 36 (US) / 68 (CH) | 97% (US) / 99% (CH) | Recognize license plate numbers |
Gaze Estimation | Four branch AlexNet based model | N/A | 6.5 RMSE | Detects person's eye gaze |
Facial Landmark | Recombinator networks | N/A | 6.1 pixel error | Estimates key points on person's face |
Heart Rate Estimation | Two branch model with attention | N/A | 0.7 BPM | Estimates person's heartrate from RGB video |
Gesture Recognition | ResNet18 | 6 | 0.85 F1 score | Recognize hand gestures |
Emotion Recognition | 5 Fully Connected Layers | 6 | 0.91 F1 score | Recognize facial Emotion |
FaceDetect | DetectNet_v2-ResNet18 | 1 | 85.3% mAP | Detect faces in RGB or grayscale images |
2D Body Pose Estimation | Single shot bottom-up | 18 | - | Estimates key joints on person's body |
In addition to purpose-built models, Transfer Learning Toolkit supports the following detection architectures:
These detection meta-architectures can be used with 13 backbones or feature extractors with TLT. For a complete list of all the permutations that are supported by TLT, please see the matrix below:
TLT 3.0 supports instance segmentation using the MaskRCNN architecture.
TLT 3.0 supports semantic segmentation using the UNET architecture.
To get started, first choose the model architecture that you want to build, then select the appropriate model card on NGC, and finally choose one of the supported backbones.
Set up your Python environment using python virtualenv and virtualenvwrapper.
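A minimal setup sketch, assuming pip3 is available; the environment name `tlt` below is arbitrary:

```shell
# Install virtualenv and virtualenvwrapper
pip3 install virtualenv virtualenvwrapper

# Configure virtualenvwrapper for the current shell; add these lines
# to ~/.bashrc to make mkvirtualenv/workon available in every session
export WORKON_HOME=~/.virtualenvs
source "$(which virtualenvwrapper.sh)"

# Create and activate a Python 3 environment for TLT
# ("tlt" is just a placeholder name)
mkvirtualenv tlt -p "$(which python3)"
```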
In TLT 3.0, we have created an abstraction above the container: you launch all your training jobs from the launcher. There is no need to manually pull the appropriate container; tlt-launcher handles that. You can install the launcher using pip with the following commands.
```shell
pip3 install nvidia-pyindex
pip3 install nvidia-tlt
```
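After installation, a quick sanity check (assuming the install above succeeded and the `tlt` entry point is on your PATH):

```shell
# List the launcher's available commands and options
tlt --help

# Print the launcher configuration, including the docker image it will
# pull; if your launcher version lacks this subcommand, rely on --help
tlt info
```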
Purpose-built Model | Jupyter notebook |
---|---|
PeopleNet | detectnet_v2/detectnet_v2.ipynb |
TrafficCamNet | detectnet_v2/detectnet_v2.ipynb |
DashCamNet | detectnet_v2/detectnet_v2.ipynb |
FaceDetectIR | detectnet_v2/detectnet_v2.ipynb |
VehicleMakeNet | classification/classification.ipynb |
VehicleTypeNet | classification/classification.ipynb |
PeopleSegNet | mask_rcnn/mask_rcnn.ipynb |
License Plate Detection | detectnet_v2/detectnet_v2.ipynb |
License Plate Recognition | lprnet/lprnet.ipynb |
Gaze Estimation | gazenet/gazenet.ipynb |
Facial Landmark | fpenet/fpenet.ipynb |
Heart Rate Estimation | heartratenet/heartratenet.ipynb |
Gesture Recognition | gesturenet/gesturenet.ipynb |
Emotion Recognition | emotionnet/emotionnet.ipynb |
FaceDetect | facenet/facenet.ipynb |
2D Body Pose Net | bpnet/bpnet.ipynb |
PeopleSemSegNet | unet/unet_isbi.ipynb |
Open model architecture | Jupyter notebook |
---|---|
DetectNet_v2 | detectnet_v2/detectnet_v2.ipynb |
FasterRCNN | faster_rcnn/faster_rcnn.ipynb |
YOLOV3 | yolo_v3/yolo_v3.ipynb |
YOLOV4 | yolo_v4/yolo_v4.ipynb |
SSD | ssd/ssd.ipynb |
DSSD | dssd/dssd.ipynb |
RetinaNet | retinanet/retinanet.ipynb |
MaskRCNN | mask_rcnn/mask_rcnn.ipynb |
UNET | unet/unet_isbi.ipynb |
classification | classification/classification.ipynb |
- Get TLT Object Detection pre-trained models for the YOLOV4, YOLOV3, FasterRCNN, SSD, DSSD, and RetinaNet architectures from the NGC model registry
- Get TLT Object Detection pre-trained models for the DetectNet_v2 architecture from the NGC model registry
- Get TLT classification pre-trained models from the NGC model registry
- Get TLT instance segmentation pre-trained models for the MaskRCNN architecture from NGC
- Get purpose-built models from the NGC model registry:
TLT Getting Started
The license for TLT containers is included within the container at workspace/EULA.pdf. Licenses for the pre-trained models are available with the model files. By pulling and using the Transfer Learning Toolkit (TLT) container to download models, you accept the terms and conditions of these licenses.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.