The models described in this card detect one or more license plates in a car image and return a bounding box around each plate, along with an lpd label for each one. Two kinds of pretrained LPD models are provided: one trained on an NVIDIA-owned US license plate dataset and one trained on the public Chinese City Parking Dataset (CCPD).
These models are based on the NVIDIA DetectNet_v2 detector with ResNet18 as the feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid over the input image. The GridBox system divides an input image into a grid, and each grid cell predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class.
The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm such as DBSCAN or NMS to produce the final bounding-box coordinates and category labels.
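The clustering step can be illustrated with a minimal, pure-Python sketch of greedy non-maximum suppression (NMS). This is not the DeepStream or TLT implementation; the corner-format boxes, the function names `iou` and `nms`, and the IoU threshold of 0.5 are assumptions made for illustration. The score threshold of 0.3 mirrors the pre-cluster-threshold value used in the DeepStream config later in this card.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels (assumed format)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two corner-format boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        score_threshold: float = 0.3, iou_threshold: float = 0.5) -> List[int]:
    """Return indices of the boxes kept by greedy NMS, highest score first."""
    # Drop low-confidence candidates, then visit the rest in descending score order.
    order = sorted(
        (i for i, s in enumerate(scores) if s >= score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical raw detections: two overlapping candidates for one plate,
# one separate plate, and one low-confidence candidate.
boxes = [(100, 200, 220, 260), (104, 203, 224, 262), (400, 210, 500, 255), (10, 10, 50, 30)]
scores = [0.92, 0.85, 0.60, 0.10]
kept = nms(boxes, scores)
print(kept)
```

In this example the second box is suppressed because it overlaps heavily with the higher-scoring first box, and the fourth is dropped by the score threshold.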
The training algorithm optimizes the network to minimize the localization and confidence loss for the objects. Training is carried out in two phases. In the first phase, the network is trained with regularization to facilitate pruning. After the first phase, we prune the network by removing channels whose kernel norms are below the pruning threshold. In the second phase, the pruned network is retrained without regularization.
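The channel-selection rule in the pruning step can be sketched as follows. This is only an illustration of "remove channels whose kernel norm is below a threshold"; the use of an L2 norm, the flattened-kernel representation, and the names `kernel_norms` and `channels_to_keep` are assumptions, and the actual TLT pruning tool may normalize or group channels differently.

```python
import math
from typing import List, Sequence


def kernel_norms(kernels: Sequence[Sequence[float]]) -> List[float]:
    """L2 norm of each output channel's flattened kernel (norm type is an assumption)."""
    return [math.sqrt(sum(w * w for w in kernel)) for kernel in kernels]


def channels_to_keep(kernels: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices of output channels whose kernel norm is at or above the threshold."""
    return [i for i, norm in enumerate(kernel_norms(kernels)) if norm >= threshold]


# Hypothetical layer with three output channels, each kernel flattened to four weights.
layer = [
    [0.5, -0.5, 0.5, 0.5],     # norm 1.0
    [0.01, 0.0, -0.02, 0.01],  # norm about 0.024, pruned at threshold 0.1
    [0.3, 0.4, 0.0, 0.0],      # norm 0.5
]
keep = channels_to_keep(layer, threshold=0.1)
print(keep)
```

Regularization in the first phase pushes unimportant kernels toward small norms, which is what makes a simple threshold like this effective at separating them.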
The primary use case for these models is detecting license plates in a color (RGB) image. The models can be used to detect license plates in photos and videos when combined with appropriate video or image decoding and pre-processing.
For US license plate
For Chinese license plate: Color Images of resolution 720 X 1168 X 3
Category labels (lpd) and bounding-box coordinates for each detected license plate in the input image.
These models need to be used with NVIDIA Hardware and Software. For Hardware, the models can run on any NVIDIA GPU including NVIDIA Jetson devices. These models can only be used with Transfer Learning Toolkit (TLT), DeepStream SDK or TensorRT.
Four models are provided in total:
The unpruned models are intended for training and fine-tuning with Transfer Learning Toolkit on the user's own dataset of license plates from the United States of America or China. High-fidelity models can be trained and adapted to the use case. The Jupyter notebook available as part of the TLT container can be used for re-training.
The usa pruned models are intended for easy deployment to the edge using DeepStream SDK or TensorRT. They accept 640x480x3 dimension input tensors and output a 40x30x12 bbox coordinate tensor and a 40x30x3 class confidence tensor.
The ccpd pruned models are intended for easy deployment to the edge using DeepStream SDK or TensorRT. These models accept 720x1168x3 dimension input tensors and output a 45x73x12 bbox coordinate tensor and a 45x73x3 class confidence tensor.
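The tensor shapes above are consistent with a uniform grid whose cells cover 16x16 input pixels, since 640/40 = 480/30 = 16 and 720/45 = 1168/73 = 16. The sketch below reproduces the listed shapes under that inferred assumption; the stride of 16, the function name `output_shapes`, and the reading of the last dimension as 3 confidence channels with 4 bbox values each are not stated in this card.

```python
from typing import Dict, Tuple

STRIDE = 16  # assumption: inferred from 640/40 and 480/30, not stated in the card


def output_shapes(width: int, height: int, channels: int = 3,
                  stride: int = STRIDE) -> Dict[str, Tuple[int, int, int]]:
    """Grid-shaped output tensors for a given input size, in width x height x depth order."""
    if width % stride or height % stride:
        raise ValueError("input size must be a multiple of the grid stride")
    grid_w, grid_h = width // stride, height // stride
    return {
        "bbox": (grid_w, grid_h, 4 * channels),  # four box parameters per channel
        "cov": (grid_w, grid_h, channels),       # one confidence value per channel
    }


usa = output_shapes(640, 480)
ccpd = output_shapes(720, 1168)
print(usa)
print(ccpd)
```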
DeepStream provides a toolkit to create efficient video analytics pipelines that capture, decode, and pre-process the data before running inference. DeepStream then post-processes the output bbox coordinate tensor and class confidence tensor with the NMS or DBSCAN clustering algorithm to create appropriate bounding boxes. The sample application and config file to run these models are provided in DeepStream SDK.
The unpruned and pruned models are encrypted and can be decrypted with the following key:
nvidia_tlt
Please make sure to use this as the key for all TLT commands that require a model load key.
To use these models as pretrained weights for transfer learning, use the snippet below as a template for the model_config component of the experiment spec file when training a DetectNet_v2 model. For more information on the experiment spec file, refer to the Transfer Learning Toolkit User Guide.
model_config {
  num_layers: 18
  pretrained_model_file: "/path/to/the/model.tlt"
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}
To create a complete end-to-end video analytics application, deploy these models with DeepStream SDK. DeepStream SDK is a streaming analytics toolkit that accelerates building AI-based video analytics applications. DeepStream supports direct integration of these models into the DeepStream sample app.
To deploy these models with DeepStream 5.1, follow the instructions below:
Download and install DeepStream SDK. The installation instructions are provided in the DeepStream development guide. The config files for the purpose-built models are located in:
/opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models
/opt/nvidia/deepstream is the default DeepStream installation directory. This path will be different if you installed DeepStream in a different directory.
You need to create 1 label file and 2 config files.
labels_lpdnet.txt - Label file with 1 class
deepstream_app_source1_trafficcamnet_lpdnet.txt - Main config file for DeepStream app
config_infer_secondary_lpdnet.txt - File to configure inference settings
Create label file labels_lpdnet.txt
echo lpd > labels_lpdnet.txt
Create config file deepstream_app_source1_trafficcamnet_lpdnet.txt
cp deepstream_app_source1_trafficcamnet.txt deepstream_app_source1_trafficcamnet_lpdnet.txt
Modify config file deepstream_app_source1_trafficcamnet_lpdnet.txt by adding the lines below to it.
[secondary-gie0]
enable=1
model-engine-file=usa_pruned.etlt_b4_gpu0_int8.engine
gpu-id=0
batch-size=4
gie-unique-id=4
operate-on-gie-id=1
operate-on-class-ids=0;
config-file=config_infer_secondary_lpdnet.txt
Create config file config_infer_secondary_lpdnet.txt
[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=1
labelfile-path=<path to labels_lpdnet.txt>
tlt-encoded-model=<path to etlt_model>
tlt-model-key=nvidia_tlt
int8-calib-file=<path to calibration cache>
# For the US model, set to 3;480;640;0. For the CCPD model, set to 3;1168;720;0.
uff-input-dims=3;480;640;0
uff-input-blob-name=input_1
batch-size=16
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=1
##1 Primary 2 Secondary
process-mode=2
interval=0
gie-unique-id=2
#0 detector 1 classifier 2 segmentation 3 instance segmentation
network-type=0
operate-on-gie-id=1
operate-on-class-ids=0
cluster-mode=3
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
input-object-min-height=30
input-object-min-width=40
#enable-dla=1
[class-attrs-all]
pre-cluster-threshold=0.3
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
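Two values in the config above can be derived rather than memorized. The sketch below builds the uff-input-dims string from the model's input width and height, and checks that net-scale-factor is approximately 1/255, which maps 8-bit pixel values into roughly the range 0 to 1. The helper names `uff_input_dims` and `scale_pixel` are hypothetical, and the scaling formula (scale factor times pixel value minus a per-channel offset, with offsets taken as zero because none are set in this config) is stated here as an assumption about how the inference plugin applies the value.

```python
NET_SCALE_FACTOR = 0.0039215697906911373  # value copied from the config above


def uff_input_dims(width: int, height: int, channels: int = 3) -> str:
    """Format input dimensions as channels;height;width;0, matching the config lines above."""
    return f"{channels};{height};{width};0"


def scale_pixel(value: float, offset: float = 0.0) -> float:
    """Assumed pre-processing: scale factor times (pixel value minus offset)."""
    return NET_SCALE_FACTOR * (value - offset)


usa_dims = uff_input_dims(640, 480)     # US model: 640x480x3 input
ccpd_dims = uff_input_dims(720, 1168)   # CCPD model: 720x1168x3 input
print(usa_dims)
print(ccpd_dims)
print(scale_pixel(255))
```

Note that the card lists input sizes as width x height x channels, while uff-input-dims puts channels first and height before width, so the two numbers swap places.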
Run deepstream-app:
deepstream-app -c deepstream_app_source1_trafficcamnet_lpdnet.txt
Documentation on deploying with DeepStream is provided in the "Deploying to DeepStream" chapter of the TLT User Guide.
Run the following command to run inference on an image or a folder of images. This command uses a spec file called inference_spec.txt. For reference, see the following files inside the TLT container.
/workspace/examples/detectnet_v2/specs/detectnet_v2_inference_kitti_etlt.txt
or
/workspace/examples/detectnet_v2/specs/detectnet_v2_inference_kitti_tlt.txt
tlt detectnet_v2 infer -e inference_spec.txt -o output_folder -i <image or image folder> -k nvidia_tlt
The LPDNet model for US license plates was trained on a proprietary dataset with over 45,000 US car images.
The LPDNet model for Chinese license plates was trained on the public CCPD (Chinese City Parking Dataset) dataset, which contains about 172,000 images. All images were taken manually by workers of a roadside parking management company in the streets of a provincial capital of China. The details of this dataset can be found in "Towards end-to-end license plate detection and recognition: A large dataset and baseline" (ECCV 2018).
The evaluation dataset for US LPDNet was obtained in the same way as the training dataset. The images were picked manually from the raw images to be diverse in angle, illumination and sharpness. The evaluation dataset for Chinese LPDNet includes 14% of the images in CCPD-Base (the base sub-dataset in CCPD).
The key performance indicator is the accuracy of license plate detection. The KPIs for the evaluation data are reported in the table below.
Model | Dataset | Accuracy |
---|---|---|
usa_unpruned_model | NVIDIA 3k LPD eval dataset | 98.58% |
usa_pruned_model | NVIDIA 3k LPD eval dataset | 98.46% |
ccpd_unpruned_model | 14% of CCPD-Base dataset | 99.24% |
ccpd_pruned_model | 14% of CCPD-Base dataset | 99.22% |
Inference is run on the provided pruned models at INT8 precision, except on the Jetson Nano, where FP16 precision is used. Inference performance is measured with trtexec on Jetson Nano, Jetson TX2, AGX Xavier, Xavier NX and NVIDIA T4 GPU. The Jetson devices run in the Max-N configuration for maximum system performance. The performance shown below is for inference of the usa pruned model only. End-to-end performance with streaming video data may vary slightly depending on the application.
Device | Precision | Batch_size | FPS |
---|---|---|---|
Nano | FP16 | 1 | 66 |
TX2 | INT8 | 1 | 187 |
NX | INT8 | 1 | 461 |
Xavier | INT8 | 1 | 913 |
T4 | INT8 | 1 | 2748 |
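As a rough way to read the table, throughput at batch size 1 can be converted into a per-image inference time, and into the number of images that fit in one frame interval of a video stream. The sketch below does this arithmetic; the 30 fps stream rate and the helper names are assumptions, and the result covers inference only, so it is an upper bound that ignores decoding, the upstream car detector, and post-processing.

```python
from typing import Dict

# FPS values copied from the table above (usa pruned model, batch size 1).
FPS: Dict[str, int] = {"Nano": 66, "TX2": 187, "NX": 461, "Xavier": 913, "T4": 2748}


def latency_ms(fps: float) -> float:
    """Average inference time per image in milliseconds."""
    return 1000.0 / fps


def images_per_frame_interval(fps: float, stream_fps: int = 30) -> int:
    """Whole number of images that can be inferred within one frame interval (assumed 30 fps)."""
    return int(fps // stream_fps)


for device, fps in FPS.items():
    print(f"{device}: {latency_ms(fps):.2f} ms per image, "
          f"{images_per_frame_interval(fps)} images per frame interval")
```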
The LPD network for US license plates was trained on cropped car images with 640x480 resolution. Therefore, cropped car images with an aspect ratio of 4:3 may provide the expected detection results.
When cars are occluded or truncated too much, the license plate may not be detected by the LPDNet model.
The LPDNet models were trained on RGB images in good lighting conditions. Therefore, images captured in dark lighting conditions, monochrome images, or IR camera images may not provide good detection results.
Assume the camera sensor is at the center of the camera coordinate system. The X-axis is horizontal and points to the right, the Y-axis is vertical and points up, and the Z-axis points outward. In this coordinate system, the LPD network may provide the expected detection results under the following conditions:
The NVIDIA LPDNet model for the US is trained on license plates collected in California, so it is not expected to reach the same level of accuracy on license plates from other states. The NVIDIA LPDNet model for China is trained on license plates collected in Anhui province.
In general, to get better accuracy in a region other than those covered by the pretraining datasets (US-California and China-Anhui), more data from that region is needed to fine-tune the pretrained model with Transfer Learning Toolkit.
License to use these models is covered by the Model EULA. By downloading the unpruned or pruned version of the model, you accept the terms and conditions of these licenses.
NVIDIA LPDNet model detects license plates.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.