The model described in this card is a license plate recognition network (LPRNet) that recognizes the characters in cropped RGB license plate images. Two pretrained LPRNet models are delivered: one trained on an NVIDIA-owned US license plate dataset and one trained on a Chinese license plate dataset.
LPRNet is a sequence classification model with a ResNet backbone. It takes an image as input and produces a sequence as output.
Training minimizes the connectionist temporal classification (CTC) loss between the ground-truth character sequence of a license plate and the predicted character sequence. At inference time, the license plate is decoded from the model's sequence output with best-path (greedy) decoding.
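Best-path decoding can be illustrated with a minimal sketch. It assumes the per-timestep class IDs (the argmax over classes at each step) are already available, that the character set is the 35 characters listed in the US statistics table, and that the CTC blank is the last class index. The blank position and the function name are assumptions for illustration, not details confirmed by this card.

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_ids: Sequence[int],
                      characters: Sequence[str],
                      blank_id: int) -> str:
    """Collapse a per-timestep ID sequence into a plate string.

    Standard CTC best-path rule: merge consecutive repeats,
    then drop blanks.
    """
    decoded: List[str] = []
    previous = blank_id
    for idx in frame_ids:
        # Emit a character only when it is not blank and differs
        # from the ID seen at the previous timestep.
        if idx != blank_id and idx != previous:
            decoded.append(characters[idx])
        previous = idx
    return "".join(decoded)


# 35 characters from the US dataset table (digits plus letters, no "O").
characters = list("0123456789ABCDEFGHIJKLMNPQRSTUVWXYZ")
blank_id = len(characters)  # assumption: blank is the last class

frames = [3, 3, blank_id, 10, 10, blank_id, blank_id, 11, 12, 12]
print(ctc_greedy_decode(frames, characters, blank_id))  # 3ABC
```

A blank between two identical IDs keeps both characters, so `[1, blank_id, 1]` decodes to `11` rather than `1`.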
The LPRNet model for US license plates was trained on a proprietary dataset of more than 310,000 US license plate images, taken at various angles and under various illumination conditions. The images were collected from the dash camera and side camera of a vehicle, and the license plates in them were then labeled and cropped. The US dataset statistics:
Character distribution:
character | count |
---|---|
0 | 100688 |
1 | 117499 |
2 | 98599 |
3 | 111220 |
4 | 127387 |
5 | 148325 |
6 | 175541 |
7 | 231298 |
8 | 105170 |
9 | 111234 |
A | 36350 |
B | 33677 |
C | 40292 |
D | 39447 |
E | 36787 |
F | 34734 |
G | 40474 |
H | 38751 |
I | 12645 |
J | 34155 |
K | 37397 |
L | 40900 |
M | 36544 |
N | 40431 |
P | 38198 |
Q | 11086 |
R | 40899 |
S | 38820 |
T | 41155 |
U | 45471 |
V | 35998 |
W | 38096 |
X | 37468 |
Y | 34454 |
Z | 31963 |
Illumination: sunny, cloudy, rainy, bright, dim.
Locations of dataset collection: US roads and parking lots, mainly in California.
Camera mounting location: mainly the dash camera and the side camera in cars.
Camera angles: Assume the camera sensor is at the origin of the camera coordinate system. The X-axis is horizontal and points to the right, the Y-axis is vertical and points up, and the Z-axis points outward from the camera. In this coordinate system, license plates in the following positions are chosen:
License plate image shapes:
dimension | min | max | avg |
---|---|---|---|
height (pixels) | 17 | 1924 | 54 |
width (pixels) | 35 | 3896 | 109 |
aspect ratio (width/height) | 0.9 | 3.6 | 2.0 |
Some sample images (before cropping the license plates) can be found in the output annotated images section of the LPD model card.
The LPRNet model for Chinese license plates was trained on the public CCPD (Chinese City Parking Dataset) dataset with 100,000 images. Details of this dataset can be found in "Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline" (ECCV 2018).
The training data must be organized in the following format.
/Dataset_01
    /images
        0000.jpg
        0001.jpg
        0002.jpg
        ...
        N.jpg
    /labels
        0000.txt
        0001.txt
        0002.txt
        ...
        N.txt
/characters_list.txt
Each cropped license plate image has a corresponding label text file containing a single line with the characters of that license plate. A characters_list.txt file lists all the characters found in the license plate dataset, one character per line.
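The layout described above can be sanity-checked before training. The sketch below is a hypothetical helper, not part of any NVIDIA tool: it assumes `.jpg` images, UTF-8 text files, and takes the path to characters_list.txt as a separate argument because its exact location relative to the dataset directory is not pinned down here.

```python
from pathlib import Path
from typing import List, Set


def load_characters(characters_file: Path) -> Set[str]:
    """Read characters_list.txt, which holds one character per line."""
    text = characters_file.read_text(encoding="utf-8")
    return {line.strip() for line in text.splitlines() if line.strip()}


def check_dataset(dataset_dir: Path, characters_file: Path) -> List[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    problems: List[str] = []
    allowed = load_characters(characters_file)
    images_dir = dataset_dir / "images"
    labels_dir = dataset_dir / "labels"

    for image_path in sorted(images_dir.glob("*.jpg")):
        # Each image NNNN.jpg must have a label file NNNN.txt.
        label_path = labels_dir / (image_path.stem + ".txt")
        if not label_path.is_file():
            problems.append(f"{image_path.name}: missing label file")
            continue

        lines = label_path.read_text(encoding="utf-8").splitlines()
        plate = lines[0].strip() if lines else ""
        if not plate:
            problems.append(f"{image_path.name}: empty label")
            continue

        # Every label character must appear in characters_list.txt.
        unknown = sorted(set(plate) - allowed)
        if unknown:
            problems.append(
                f"{image_path.name}: characters not in list: {unknown}"
            )
    return problems
```

Calling `check_dataset(Path("Dataset_01"), Path("characters_list.txt"))` returns an empty list when every image has a non-empty label made only of listed characters.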
The evaluation dataset for the US LPRNet contains 1,951 images obtained in the same way as the training dataset. The images were manually picked from the raw images to cover diverse angles, illumination and sharpness.
The evaluation dataset for the Chinese LPRNet is the validation split of the Chinese City Parking Dataset (CCPD), which contains 99,996 images.
The key performance indicator (KPI) is license plate recognition accuracy. A recognition counts as accurate only if all the characters in the license plate are recognized correctly. The KPIs for the evaluation data are reported below.
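Because a plate only counts as correct when every character matches, the metric is an exact-match rate over whole strings. A minimal sketch, with made-up plate strings used purely for illustration (`plate_accuracy` is a hypothetical name):

```python
from typing import Sequence


def plate_accuracy(predictions: Sequence[str],
                   ground_truths: Sequence[str]) -> float:
    """Fraction of plates whose predicted string matches exactly."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths differ in length")
    if not ground_truths:
        raise ValueError("no samples to evaluate")
    # A single wrong, missing or extra character makes the plate incorrect.
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)


# Illustrative strings only; not taken from any evaluation set.
predictions = ["7ABC123", "4XYZ987", "1AAA111", "6JKL456"]
ground_truths = ["7ABC123", "4XYZ987", "1AAA111", "6JKL458"]
print(f"{plate_accuracy(predictions, ground_truths):.2%}")  # 75.00%
```

The last prediction differs in one character, so it counts as a miss even though six of its seven characters are right.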
model | dataset | accuracy |
---|---|---|
us_lprnet_baseline18_unpruned | NVIDIA LPR eval dataset | 97.49% |
ch_lprnet_baseline18_unpruned | CCPD_base_val | 99.67% |
Inference uses FP16 precision. Inference performance was measured with trtexec on Jetson Nano, Xavier NX, AGX Xavier and NVIDIA T4 GPU. The Jetson devices ran in the Max-N configuration for maximum system performance. The figures below are inference-only performance; end-to-end performance with streaming video data might vary slightly depending on the application's use case.
Device | precision | batch_size | FPS |
---|---|---|---|
Jetson Nano | FP16 | 32 | 16 |
Jetson Xavier NX | FP16 | 32 | 600 |
Jetson AGX Xavier | FP16 | 64 | 1021 |
T4 | FP16 | 128 | 3821 |
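The throughput figures can be converted into an approximate time per batch. This is a rough sketch that assumes FPS means images processed per second with batches run back to back; the card does not state how trtexec was configured, so treat the results as estimates.

```python
from typing import Dict, Tuple

# (batch_size, FPS) pairs copied from the table above.
benchmarks: Dict[str, Tuple[int, int]] = {
    "Jetson Nano": (32, 16),
    "Xavier NX": (32, 600),
    "AGX Xavier": (64, 1021),
    "T4": (128, 3821),
}


def seconds_per_batch(batch_size: int, fps: float) -> float:
    """Estimated wall time for one batch: images in the batch / images per second."""
    return batch_size / fps


for device, (batch_size, fps) in benchmarks.items():
    latency_ms = 1000.0 * seconds_per_batch(batch_size, fps)
    print(f"{device}: about {latency_ms:.1f} ms per batch of {batch_size}")
```

Under this assumption a batch of 32 on Jetson Nano takes about 2 seconds, which matters for applications that need a result soon after a plate is detected.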
This model must be used with NVIDIA hardware and software. On the hardware side, the model can run on any NVIDIA GPU, including NVIDIA Jetson devices. On the software side, it can only be used with the Transfer Learning Toolkit (TLT), the DeepStream SDK or TensorRT.
The primary intended use case for this model is recognizing the license plate from a cropped RGB license plate image.
There are two models provided:
They are intended for training and fine-tuning with the Transfer Learning Toolkit and the user's own dataset of license plates from the United States of America or China. High-fidelity models can be trained for new use cases. The Jupyter notebook available as part of the TLT container can be used for re-training.
These models are also intended for easy deployment to the edge using the DeepStream SDK or TensorRT. They accept input tensors of dimension 3x48x96 and output the predicted sequence of character IDs. DeepStream provides facilities to create efficient video analytics pipelines that capture, decode and pre-process the data before running inference.
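When the model is run outside DeepStream, the cropped plate has to be brought to the 3x48x96 layout by hand. The sketch below uses NumPy only, with nearest-neighbour resizing to stay dependency-free. The scaling of pixel values to [0, 1] is an assumption, not something this card specifies, so check the preprocessing expected by the deployed model before relying on it.

```python
import numpy as np


def preprocess_plate(image_hwc: np.ndarray,
                     height: int = 48,
                     width: int = 96) -> np.ndarray:
    """Turn an H x W x 3 uint8 RGB crop into a 3 x 48 x 96 float32 tensor."""
    if image_hwc.ndim != 3 or image_hwc.shape[2] != 3:
        raise ValueError("expected an H x W x 3 RGB image")

    src_h, src_w = image_hwc.shape[:2]
    # Nearest-neighbour resize: pick one source row/column per target pixel.
    rows = np.arange(height) * src_h // height
    cols = np.arange(width) * src_w // width
    resized = image_hwc[rows[:, None], cols[None, :]]  # (48, 96, 3)

    # Reorder channels-last (H, W, C) to channels-first (C, H, W).
    chw = resized.transpose(2, 0, 1).astype(np.float32)

    # Assumption: scale 0..255 to 0..1; the real pipeline may differ.
    return chw / 255.0


# A blank crop with the average plate size from the dataset statistics.
crop = np.zeros((54, 109, 3), dtype=np.uint8)
tensor = preprocess_plate(crop)
print(tensor.shape, tensor.dtype)  # (3, 48, 96) float32
```

A production pipeline would normally use an interpolating resize from an image library; nearest-neighbour is used here only to keep the example self-contained.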
The models are encrypted and can be decrypted with the following key:
nvidia_tlt
Please make sure to use this as the key for all TLT commands that require a model load key.
Input: RGB images of 3 x 48 x 96 (C x H x W).
Output: character ID sequence. (A DeepStream post-processing plugin is needed to get the final license plate.)
To use these models as pretrained weights for transfer learning, use the snippet below as a template for the model configuration (lpr_config) component of the experiment spec file when training an LPRNet model. For more information on the experiment spec file, refer to the Transfer Learning Toolkit User Guide.
lpr_config {
  hidden_units: 512
  max_label_length: 8
  arch: "baseline"
  nlayers: 18
}
To create an end-to-end video analytics application, deploy this model with the DeepStream SDK, a streaming analytics toolkit that accelerates building AI-based video analytics applications. DeepStream supports direct integration of this model into the DeepStream sample app.
To deploy this model with DeepStream 5.1, please follow the instructions in this repository.
The NVIDIA LPRNet model for the US is trained on license plates collected in California, so it is not expected to reach the same level of accuracy on license plates from other states.
The NVIDIA LPRNet model for China is trained on license plates collected in Anhui province.
In general, to get better accuracy in a region other than those covered by the pretraining datasets (California in the US, Anhui in China), more data from that region is needed to fine-tune the pretrained model with TAO Toolkit (the current name of the Transfer Learning Toolkit).
NVIDIA LPRNet models may not work well on truncated license plates in which the character shapes are incomplete. The accuracy of LPRNet models depends on the quality of the license plate detection.
The NVIDIA LPRNet model for the US is trained on nearly horizontal license plates. If the angle between the license plate and the horizontal is larger than 30 degrees, its characters may not be recognized.
License to use these models is covered by the Model EULA. By downloading the unpruned or pruned version of the model, you accept the terms and conditions of these licenses.
NVIDIA LPRNet model recognizes license plates.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.