LPDNet

Description

Object Detection network to detect license plates in an image of a car.

Publisher

NVIDIA

Use Case

Object Detection

Framework

Transfer Learning Toolkit

Latest Version

pruned_v1.0

Modified

August 24, 2021

Size

1.43 MB

License Plate Detection (LPDNet) Model Card

Model Overview

The models described in this card detect one or more license plate objects in a car image and return a box around each object, as well as an lpd label for each object. Two kinds of pretrained LPD models are delivered: one is trained on an NVIDIA-owned US license plate dataset and the other is trained on the public Chinese City Parking dataset (CCPD).

Model Architecture

These models are based on the NVIDIA DetectNet_v2 detector with ResNet18 as the feature extractor. This architecture, also known as GridBox object detection, uses bounding-box regression on a uniform grid over the input image. The GridBox system divides an input image into a grid; each grid cell predicts four normalized bounding-box parameters (xc, yc, w, h) and a confidence value per output class.

The raw normalized bounding-box and confidence detections need to be post-processed by a clustering algorithm such as DBSCAN or NMS to produce the final bounding-box coordinates and category labels.
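To make the post-processing step concrete, here is a minimal sketch of greedy non-maximum suppression (NMS) in plain Python. It is illustrative only and is not the DeepStream or TLT implementation. The box format (x1, y1, x2, y2), the function names, and the IoU threshold of 0.5 are assumptions made for this example.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over a list of (box, confidence) pairs.

    Detections are visited from highest to lowest confidence; a detection is
    kept only if it does not overlap an already kept box by more than
    iou_threshold (an assumed value, tune it for your data).
    """
    kept = []
    for box, conf in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, conf))
    return kept
```

Given two heavily overlapping candidate boxes for the same plate and one box elsewhere in the image, this keeps the higher-confidence box of the overlapping pair plus the separate box.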

Training Algorithm

The training algorithm optimizes the network to minimize the localization and confidence loss for the objects. Training is carried out in two phases. In the first phase, the network is trained with regularization to facilitate pruning. After the first phase, the network is pruned by removing channels whose kernel norms are below the pruning threshold. In the second phase, the pruned network is retrained without regularization.
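The channel-selection idea behind pruning can be sketched as follows. This is a simplified illustration, not the TLT pruning code: it assumes each output channel's kernel is available as a flat list of weights, uses the plain L2 norm, and the function names and the threshold value are hypothetical.

```python
import math


def kernel_norm(kernel):
    """L2 norm of one output channel's kernel, given as a flat list of weights."""
    return math.sqrt(sum(w * w for w in kernel))


def channels_to_keep(kernels, threshold):
    """Return indices of channels whose kernel norm is not below the threshold."""
    return [i for i, kernel in enumerate(kernels) if kernel_norm(kernel) >= threshold]


# Three toy channels: the second has a very small norm and is pruned.
toy_kernels = [
    [0.5, -0.5, 0.5, 0.5],
    [0.01, 0.0, -0.02, 0.01],
    [0.3, 0.4, 0.0, 0.0],
]
kept_channels = channels_to_keep(toy_kernels, threshold=0.1)
```

Regularization in the first phase pushes unimportant kernels toward small norms, which is why a simple threshold like this can separate them from the rest.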

Citations

  • Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: CVPR (2016)
  • Erhan, D., Szegedy, C., Toshev, A., Anguelov, D.: Scalable object detection using deep neural networks. In: CVPR (2014)
  • He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)

Intended Use

The primary use case for these models is detecting license plates in a color (RGB) image. The models can be used to detect license plates in photos and videos, given appropriate video or image decoding and pre-processing.

Input

  • For US license plate

    • Color Images of resolution 640 X 480 X 3 (W x H x C)
    • Channel Ordering of the Input: NCHW, where N = Batch Size, C = number of channels (3), H = Height of images (480), W = Width of the images (640)
    • Input scale: 1/255.0
    • Mean subtraction: None
  • For Chinese license plate

    • Color Images of resolution 720 X 1168 X 3 (W x H x C)
    • Channel Ordering of the Input: NCHW, where N = Batch Size, C = number of channels (3), H = Height of images (1168), W = Width of the images (720)
    • Input scale: 1/255.0
    • Mean subtraction: None
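The input requirements above (scale by 1/255.0, no mean subtraction, NCHW ordering) can be sketched in plain Python as shown below. This is a minimal illustration on nested lists; a real pipeline would normally use an array library or let DeepStream handle pre-processing. The function name and the nested-list image representation are assumptions for this example, and resizing to the model resolution is not shown.

```python
def preprocess(image_hwc, scale=1.0 / 255.0):
    """Convert one H x W x C image (nested lists of 0-255 values) to N x C x H x W.

    Applies the input scale only; no mean is subtracted.
    """
    height = len(image_hwc)
    width = len(image_hwc[0])
    channels = len(image_hwc[0][0])
    chw = [
        [[image_hwc[y][x][c] * scale for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]
    return [chw]  # leading batch dimension, N = 1


# A tiny 2 x 3 RGB image (H = 2, W = 3) stands in for a 480 x 640 frame.
tiny = [
    [[255, 0, 0], [0, 255, 0], [0, 0, 255]],
    [[10, 20, 30], [40, 50, 60], [70, 80, 90]],
]
batch = preprocess(tiny)
```

For the US model the resulting batch would have shape 1 x 3 x 480 x 640, and for the Chinese model 1 x 3 x 1168 x 720.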

Output

Category labels (lpd) and bounding-box coordinates for each detected license plate in the input image.

How to use this model

These models need to be used with NVIDIA Hardware and Software. For Hardware, the models can run on any NVIDIA GPU including NVIDIA Jetson devices. These models can only be used with Transfer Learning Toolkit (TLT), DeepStream SDK or TensorRT.

Four models are provided:

  • usa_unpruned.tlt
  • usa_pruned.etlt
  • ccpd_unpruned.tlt
  • ccpd_pruned.etlt

The unpruned models are intended for training and fine-tuning with Transfer Learning Toolkit on the user's own dataset of license plates from the United States of America or China. High-fidelity models can be trained and adapted to the use case. The Jupyter notebook available as part of the TLT container can be used to retrain the models.

The usa pruned models are intended for easy deployment to the edge using DeepStream SDK or TensorRT. They accept 640x480x3 input tensors and output a 40x30x12 bbox coordinate tensor and a 40x30x3 class confidence tensor.

The ccpd pruned models are intended for easy deployment to the edge using DeepStream SDK or TensorRT. They accept 720x1168x3 input tensors and output a 45x73x12 bbox coordinate tensor and a 45x73x3 class confidence tensor.
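The spatial size of the output tensors follows from the input size. The short sketch below reproduces the grid dimensions quoted above, assuming a fixed stride of 16 pixels per grid cell. The stride is inferred from the stated shapes (640/40 = 16, 480/30 = 16) rather than stated in this card, and the function name is hypothetical.

```python
def output_grid(width, height, stride=16):
    """Return (grid_width, grid_height) for an input size, assuming a fixed stride."""
    if width % stride or height % stride:
        raise ValueError("input size must be a multiple of the stride")
    return width // stride, height // stride


usa_grid = output_grid(640, 480)
ccpd_grid = output_grid(720, 1168)
```

Here `usa_grid` corresponds to the 40x30 grid of the US model and `ccpd_grid` to the 45x73 grid of the CCPD model.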

DeepStream provides a toolkit to create efficient video analytics pipelines that capture, decode, and pre-process the data before running inference. DeepStream then post-processes the output bbox coordinate tensor and class confidence tensor with the NMS or DBSCAN clustering algorithm to create the final bounding boxes. The sample application and config file to run these models are provided in DeepStream SDK.

The unpruned and pruned models are encrypted and can be decrypted with the following key:

  • Model load key: nvidia_tlt

Please make sure to use this as the key for all TLT commands that require a model load key.

Model versions:

  • unpruned - ResNet18 based pre-trained model. Intended for training.
  • pruned - ResNet18 deployment models. Contain calibration caches for GPU and DLA. The DLA calibration cache is required when running inference on the DLA of Jetson AGX Xavier or Xavier NX.

Instructions to use unpruned model with TLT

To use these models as pretrained weights for transfer learning, use the snippet below as a template for the model_config component of the experiment spec file to train a DetectNet_v2 model. For more information on the experiment spec file, refer to the Transfer Learning Toolkit User Guide.

  1. For ResNet18
model_config {
  num_layers: 18
  pretrained_model_file: "/path/to/the/model.tlt"
  use_batch_norm: true
  objective_set {
    bbox {
      scale: 35.0
      offset: 0.5
    }
    cov {
    }
  }
  training_precision {
    backend_floatx: FLOAT32
  }
  arch: "resnet"
}

Instructions to deploy these models with DeepStream

To create the entire end-to-end video analytics application, deploy these models with DeepStream SDK. DeepStream SDK is a streaming analytics toolkit to accelerate building AI-based video analytics applications. DeepStream supports direct integration of these models into the deepstream sample app.

To deploy these models with DeepStream 5.1, please follow the instructions below:

Download and install DeepStream SDK. The installation instructions are provided in the DeepStream development guide. The config files for the purpose-built models are located in:

/opt/nvidia/deepstream/deepstream-5.1/samples/configs/tlt_pretrained_models

/opt/nvidia/deepstream is the default DeepStream installation directory. This path will be different if you are installing in a different directory.

You need to create 1 label file and 2 config files.

labels_lpdnet.txt - Label file with 1 class
deepstream_app_source1_trafficcamnet_lpdnet.txt - Main config file for DeepStream app
config_infer_secondary_lpdnet.txt - File to configure inference settings 

Create label file labels_lpdnet.txt

echo lpd > labels_lpdnet.txt

Create config file deepstream_app_source1_trafficcamnet_lpdnet.txt

cp deepstream_app_source1_trafficcamnet.txt deepstream_app_source1_trafficcamnet_lpdnet.txt

Modify the config file deepstream_app_source1_trafficcamnet_lpdnet.txt by adding the following lines to it:

[secondary-gie0]
enable=1
model-engine-file=usa_pruned.etlt_b4_gpu0_int8.engine
gpu-id=0
batch-size=4
gie-unique-id=4
operate-on-gie-id=1
operate-on-class-ids=0;
config-file=config_infer_secondary_lpdnet.txt

Create config file config_infer_secondary_lpdnet.txt

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=1
labelfile-path=<path to labels_lpdnet.txt>
tlt-encoded-model=<path to etlt_model>
tlt-model-key=nvidia_tlt
int8-calib-file=<path to calibration cache>
# For the US model, set to 3;480;640;0. For the CCPD model, set to 3;1168;720;0.
uff-input-dims=3;480;640;0
uff-input-blob-name=input_1
batch-size=16
## 0=FP32, 1=INT8, 2=FP16 mode
network-mode=1
num-detected-classes=1
##1 Primary 2 Secondary
process-mode=2
interval=0
gie-unique-id=2
#0 detector 1 classifier 2 segmentation 3 instance segmentation
network-type=0
operate-on-gie-id=1
operate-on-class-ids=0
cluster-mode=3
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
input-object-min-height=30
input-object-min-width=40
#enable-dla=1

[class-attrs-all]
pre-cluster-threshold=0.3
roi-top-offset=0
roi-bottom-offset=0
detected-min-w=0
detected-min-h=0
detected-max-w=0
detected-max-h=0
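Before launching the app, it can help to check that the inference config agrees with the input specification of the chosen model. The sketch below is a hypothetical helper, not part of DeepStream: it parses a config string with Python's standard `configparser` and checks that net-scale-factor is approximately 1/255 and that uff-input-dims matches the expected channels, height, and width. The function name, the trimmed sample config, and the tolerance are assumptions for illustration.

```python
import configparser

# Trimmed copy of the [property] keys used in this sketch.
SAMPLE_CONFIG = """
[property]
net-scale-factor=0.0039215697906911373
uff-input-dims=3;480;640;0
num-detected-classes=1
"""


def check_infer_config(text, width, height, channels=3):
    """Return a list of human-readable problems found in the config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prop = parser["property"]
    problems = []

    # The model input specification calls for an input scale of 1/255.0.
    scale = float(prop["net-scale-factor"])
    expected = 1.0 / 255.0
    if abs(scale - expected) / expected > 1e-6:
        problems.append("net-scale-factor is not approximately 1/255")

    # uff-input-dims lists channels;height;width followed by an input-order flag.
    dims = [int(value) for value in prop["uff-input-dims"].split(";")]
    if dims[:3] != [channels, height, width]:
        problems.append("uff-input-dims does not match the expected C;H;W")

    return problems
```

Calling `check_infer_config(SAMPLE_CONFIG, 640, 480)` checks the sample against the US model, while passing 720 and 1168 checks it against the CCPD model and reports the dimension mismatch.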

Run deepstream-app:

deepstream-app -c deepstream_app_source1_trafficcamnet_lpdnet.txt

Documentation on deploying with DeepStream is provided in the "Deploying to DeepStream" chapter of the TLT User Guide.

Example

Run the following command to run inference on an image or a folder of images. The command uses a spec file called inference_spec.txt. For reference, see the following files inside the TLT container:
/workspace/examples/detectnet_v2/specs/detectnet_v2_inference_kitti_etlt.txt or
/workspace/examples/detectnet_v2/specs/detectnet_v2_inference_kitti_tlt.txt

tlt detectnet_v2 infer -e inference_spec.txt -o output_folder  -i <image or image folder> -k nvidia_tlt


Training Data

LPDNet model for US license plates was trained on a proprietary dataset with over 45000 US car images.

The LPDNet model for Chinese license plates was trained on the public CCPD (Chinese City Parking dataset) dataset, which contains about 172000 images. All images were taken manually by workers of a roadside parking management company in the streets of a provincial capital of China. The details of this dataset can be found in "Towards end-to-end license plate detection and recognition: A large dataset and baseline" (ECCV 2018).

Evaluation Data

Dataset

The evaluation dataset for US LPDNet was obtained in the same way as the training dataset. The images were picked manually from the raw images to cover a diverse range of angles, illumination, and sharpness. The evaluation dataset for Chinese LPDNet includes 14% of the images in CCPD-Base (the base sub-dataset in CCPD).

Methodology and KPI

The key performance indicator (KPI) is the accuracy of license plate detection. The KPIs for the evaluation data are reported in the table below.

Model                Dataset                     Accuracy
usa_unpruned_model   NVIDIA 3k LPD eval dataset  98.58%
usa_pruned_model     NVIDIA 3k LPD eval dataset  98.46%
ccpd_unpruned_model  14% of CCPD-Base dataset    99.24%
ccpd_pruned_model    14% of CCPD-Base dataset    99.22%

Real-time Inference Performance

Inference is run on the provided pruned models at INT8 precision, except on the Jetson Nano, where FP16 precision is used. Inference performance is measured with trtexec on Jetson Nano, Jetson TX2, AGX Xavier, Xavier NX, and an NVIDIA T4 GPU. The Jetson devices run in the Max-N configuration for maximum system performance. The performance shown below is for inference of the usa pruned model only. End-to-end performance with streaming video data may vary slightly depending on the application.

Device  Precision  Batch size  FPS
Nano    FP16       1           66
TX2     INT8       1           187
NX      INT8       1           461
Xavier  INT8       1           913
T4      INT8       1           2748
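As a rough aid for reading the table, throughput in FPS at batch size 1 can be converted to an average time per image. The sketch below does only that arithmetic. It assumes the average time per image is simply the reciprocal of throughput, which ignores any overlap between consecutive inferences, and the dictionary and function names are illustrative.

```python
# FPS values copied from the table above (usa pruned model, batch size 1).
FPS_BY_DEVICE = {"Nano": 66, "TX2": 187, "NX": 461, "Xavier": 913, "T4": 2748}


def avg_ms_per_image(fps):
    """Average inference time per image in milliseconds implied by a throughput."""
    return 1000.0 / fps


ms_by_device = {name: avg_ms_per_image(fps) for name, fps in FPS_BY_DEVICE.items()}
```

These figures cover inference only; decoding, pre-processing, and clustering add to the end-to-end time.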

Limitations

Aspect ratio of cropped car image

The LPD network for US license plates was trained on cropped car images with 640x480 resolution. Therefore, the expected detection results are most likely for cropped car images with an aspect ratio of 4:3.
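One possible way to handle crops that are not 4:3 is to pad them to that aspect ratio before resizing, rather than stretching them. The sketch below only computes the padded canvas size. Padding is an assumption of this example and is not something this card prescribes, and the function name is hypothetical.

```python
def pad_to_aspect(width, height, target_w=4, target_h=3):
    """Return (new_width, new_height) of the smallest canvas with the target
    aspect ratio that fully contains a width x height crop."""
    # Compare width/height against target_w/target_h without floating point.
    if width * target_h >= height * target_w:
        # Crop is at least as wide as the target ratio: grow the height.
        new_w = width
        new_h = -(-width * target_h // target_w)  # ceiling division
    else:
        # Crop is narrower than the target ratio: grow the width.
        new_h = height
        new_w = -(-height * target_w // target_h)  # ceiling division
    return new_w, new_h
```

A 640x480 crop is returned unchanged, while a square 300x300 crop would be placed on a 400x300 canvas before being resized to the model input resolution.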

Occluded Cars

When cars are occluded or truncated too much, the license plate may not be detected by the LPDNet model.

Dark-lighting, Monochrome or Infrared Camera Images

The LPDNet models were trained on RGB images in good lighting conditions. Therefore, images captured in dark lighting conditions, monochrome images, and IR camera images may not provide good detection results.

Camera Positions

Assume the camera sensor is at the origin of the camera coordinate system. The X-axis is horizontal and points to the right, the Y-axis is vertical and points up, and the Z-axis points towards the outside. In this coordinate system, the LPD network may provide the expected detection results under the following conditions:

  • Roll: within -30 degree to +30 degree
  • Pitch: within -30 degree to +30 degree
  • Yaw: within -15 degree to +15 degree
  • Distance to license plate: not too far away, so that the license plate in the image is larger than 16x16 pixels
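The conditions listed above can be expressed as a simple predicate, for example to filter a test set before evaluation. This is a minimal sketch: the function name and argument names are hypothetical, angles are in degrees, and plate size is in pixels.

```python
def within_camera_limits(roll, pitch, yaw, plate_w, plate_h):
    """Return True if the pose and plate size fall inside the listed limits."""
    return (
        abs(roll) <= 30
        and abs(pitch) <= 30
        and abs(yaw) <= 15
        and plate_w > 16
        and plate_h > 16
    )
```

For instance, a plate seen at a yaw of 20 degrees, or one that is only 12 pixels high, falls outside the range in which the expected detection results are likely.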

Restricted usage in different regions

The NVIDIA LPDNet model for US license plates was trained on license plates collected in California, so it is not expected to reach the same level of accuracy on license plates from other states. The NVIDIA LPDNet model for Chinese license plates was trained on license plates collected in Anhui province.

In general, to get better accuracy in a region other than those covered by the pretraining datasets (California in the US, Anhui in China), more data from that region is needed to fine-tune the pretrained model with Transfer Learning Toolkit.

License

License to use these models is covered by the Model EULA. By downloading the unpruned or pruned version of the model, you accept the terms and conditions of these licenses.

Ethical AI

NVIDIA LPDNet model detects license plates.

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.