Re-Identification

Description

Re-Identification network to generate embeddings for identifying persons in different scenes.

Publisher

-

Use Case

Re Identification

Framework

Transfer Learning Toolkit

Latest Version

trainable_v1.0

Modified

January 4, 2023

Size

324.07 MB

ReIdentificationNet Model Card

Model Overview

The model described in this card is a re-identification network that generates embeddings for identifying objects captured in different scenes. The current backbone of the network is ResNet50. A pre-trained ReIdentificationNet model is provided; it is trained on the Market-1501 dataset, which covers 751 unique IDs.

Model Architecture

The model uses the ResNet50 architecture. It takes cropped images of objects as input and produces embeddings as output.

Training

The training algorithm optimizes the network to minimize the triplet, center and cross-entropy losses.

Training Data

The model is trained on the Market-1501 dataset with 751 annotated people. The dataset statistics are as follows:

  • Class distribution:

    Subset   No. identities   No. images   No. cameras
    Train    751              12936        6
    Test     751              15913        6
    Query    750              3368         6

Data Format

The data must be organized in the following directory layout.

/data
    /market1501
        /bounding_box_train
            0001_c1s1_01_00.jpg
            0001_c1s1_02_00.jpg
            0002_c1s1_03_00.jpg
            0002_c1s1_04_00.jpg
            0003_c1s1_05_00.jpg
            0003_c1s1_06_00.jpg
            ...
            ...
            ...
            N.jpg
        /bounding_box_test
            0001_c1s1_01_00.jpg
            0001_c1s1_02_00.jpg
            0002_c1s1_03_00.jpg
            0002_c1s1_04_00.jpg
            0003_c1s1_05_00.jpg
            0003_c1s1_06_00.jpg
            ...
            ...
            ...
            N.jpg
        /query
            0001_c1s1_01_00.jpg
            0001_c1s1_02_00.jpg
            0002_c1s1_03_00.jpg
            0002_c1s1_04_00.jpg
            0003_c1s1_05_00.jpg
            0003_c1s1_06_00.jpg
            ...
            ...
            ...
            N.jpg

The dataset should be divided into different directories by train, test and query folders. Each of these folders will contain image crops with the above naming scheme.

For example, the image 0001_c1s1_01_00.jpg comes from the first sequence s1 of camera c1. 01 is the first frame in the sequence c1s1. 0001 is the unique ID assigned to the object. Anything after the third _ is ignored.
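
The naming scheme above can be decoded programmatically. The following is a minimal sketch, not part of the TAO Toolkit: the names CropInfo and parse_crop_name are hypothetical, and the regular expression assumes that every crop follows the <id>_c<camera>s<sequence>_<frame> pattern described here.

```python
import re
from typing import NamedTuple


class CropInfo(NamedTuple):
    """Fields encoded in a crop file name (hypothetical helper type)."""
    object_id: int  # unique ID assigned to the object, e.g. 0001 -> 1
    camera: int     # camera index, e.g. c1 -> 1
    sequence: int   # sequence index, e.g. s1 -> 1
    frame: int      # frame index within the sequence, e.g. 01 -> 1


# <id>_c<camera>s<sequence>_<frame>, then an optional "_<ignored>" part
# and a file extension. Anything after the third underscore is discarded.
_CROP_NAME = re.compile(r"^(\d+)_c(\d+)s(\d+)_(\d+)(?:_.*)?\.\w+$")


def parse_crop_name(filename: str) -> CropInfo:
    """Split a crop file name into object ID, camera, sequence and frame."""
    match = _CROP_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected crop file name: {filename!r}")
    object_id, camera, sequence, frame = (int(group) for group in match.groups())
    return CropInfo(object_id, camera, sequence, frame)


print(parse_crop_name("0001_c1s1_01_00.jpg"))
```

File names that do not follow the pattern raise a ValueError, which makes misnamed crops easy to spot before training.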

Performance

Test Data

As shown in the class distribution table above, the test set contains the same identities as the query set. The goal is to identify, for each query, the test samples that share its identity.

Methodology and KPI

The key performance indicators are the ranked accuracy of re-identification and the mean average precision (mAP).

Rank-K accuracy: A method of computing accuracy in which the top-K highest-confidence labels are compared with the ground-truth label. If the ground-truth label is among these top-K labels, the prediction is counted as correct. This gives an overall accuracy measurement while being lenient on predictions when the classes are numerous and very similar. In our case, we compute rank-1, rank-5 and rank-10 accuracies. For rank-10, for example, a sample is counted as correct if any of its 10 highest-confidence predicted labels matches the ground-truth label.

Mean average precision (mAP): Precision measures how accurate the predictions are, in our case the predictions of an object's ID. In other words, it measures the percentage of the predictions that are correct. mAP is the mean of the average precision (AP), where AP is computed for each class, in our case each ID.
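
As an illustration of both metrics, here is a minimal sketch in plain Python. It rests on assumptions that are not stated in this card: the gallery is ranked by ascending embedding distance, AP is computed per query (a common convention in re-identification), and the same-camera filtering used by some Market-1501 evaluation protocols is omitted. The function name evaluate and the toy data are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def evaluate(
    distances: Sequence[Sequence[float]],  # distances[q][g]: query q to gallery g
    query_ids: Sequence[int],
    gallery_ids: Sequence[int],
    ks: Tuple[int, ...] = (1, 5, 10),
) -> Tuple[Dict[int, float], float]:
    """Return ({k: rank-k accuracy}, mAP) for a query/gallery distance matrix."""
    hits = {k: 0 for k in ks}
    ap_sum = 0.0
    valid_queries = 0
    for q, row in enumerate(distances):
        # Rank gallery samples from closest to farthest.
        order = sorted(range(len(row)), key=row.__getitem__)
        matches = [gallery_ids[g] == query_ids[q] for g in order]
        if not any(matches):
            continue  # skip queries whose ID never appears in the gallery
        valid_queries += 1
        # Rank-k: correct if any of the k closest samples has the right ID.
        for k in ks:
            if any(matches[:k]):
                hits[k] += 1
        # AP: mean of the precision at every rank where a true match occurs.
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        ap_sum += sum(precisions) / len(precisions)
    if valid_queries == 0:
        raise ValueError("no query has a matching gallery sample")
    rank_acc = {k: hits[k] / valid_queries for k in ks}
    return rank_acc, ap_sum / valid_queries


# Toy data: 2 queries, 4 gallery samples.
toy_distances = [[0.1, 0.9, 0.5, 0.7], [0.2, 0.6, 0.8, 0.3]]
rank_acc, mean_ap = evaluate(toy_distances, query_ids=[1, 2], gallery_ids=[1, 2, 1, 3])
print(rank_acc, round(mean_ap, 4))
```

In the toy data, the first query finds both of its matches at ranks 1 and 2 (AP of 1), while the second finds its only match at rank 3 (AP of 1/3), so rank-1 accuracy is 0.5 and mAP is 2/3.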

Model                 Feature dimension   mAP    Rank-1 accuracy   Rank-5 accuracy   Rank-10 accuracy
resnet50_market1501   64                  91.0   93.4%             96.7%             97.7%
resnet50_market1501   128                 92.1   94.5%             96.9%             97.9%
resnet50_market1501   256                 93.0   94.7%             97.3%             98.0%
resnet50_market1501   512                 93.4   95.1%             97.5%             98.1%
resnet50_market1501   1024                93.7   94.8%             97.5%             98.2%
resnet50_market1501   2048                93.9   95.3%             98.0%             98.4%

Real-time Inference Performance

Inference performance was measured with trtexec on NVIDIA Ampere and Jetson GPUs. End-to-end performance with image data might vary slightly depending on the application's use case.

Model      Device            Precision   Batch size   Latency (ms)   Images per second
ResNet50   A10               Mixed       1            0.49           2057.64
ResNet50   A10               Mixed       16           2.83           5725.13
ResNet50   A10               Mixed       64           10.64          6088.47
ResNet50   A30               Mixed       1            0.50           2004.44
ResNet50   A30               Mixed       16           2.25           7445.72
ResNet50   A30               Mixed       64           7.14           9103.93
ResNet50   Jetson AGX Orin   FP16        1            0.96           1043.62
ResNet50   Jetson AGX Orin   FP16        16           6.42           2492.26
ResNet50   Jetson AGX Orin   FP16        64           23.09          2771.60

How to use this model

This model must be used with NVIDIA hardware and software. On the hardware side, the model can run on any NVIDIA GPU, including NVIDIA Jetson devices. On the software side, it can only be used with the Train Adapt Optimize (TAO) Toolkit, DeepStream SDK or TensorRT.

The primary intended use case for this model is to generate embeddings for an object and then perform similarity matching.
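
A minimal sketch of the similarity-matching step follows. The card does not specify a similarity measure, so cosine similarity and the 0.5 acceptance threshold are assumptions chosen for illustration; the helper names and the 3-dimensional toy embeddings are hypothetical stand-ins for real model outputs.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cannot compare a zero-length embedding")
    return dot / (norm_a * norm_b)


def best_match(
    query: Sequence[float],
    gallery: Dict[str, Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off; tune it on your own data
) -> Tuple[Optional[str], float]:
    """Return the most similar gallery entry, or None if below the threshold."""
    if not gallery:
        raise ValueError("gallery is empty")
    score, name = max(
        (cosine_similarity(query, embedding), name)
        for name, embedding in gallery.items()
    )
    return (name if score >= threshold else None), score


# Toy embeddings standing in for the model's output vectors.
toy_gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
matched_name, matched_score = best_match([0.9, 0.1, 0.0], toy_gallery)
print(matched_name, round(matched_score, 3))
```

Returning None below the threshold lets an application treat a weak best match as a new, previously unseen object rather than forcing an assignment.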

A pre-trained model is provided:

  • resnet50_market1501

It is intended for training and fine-tuning with the Train Adapt Optimize (TAO) Toolkit and the user's own re-identification dataset. High-fidelity models can be trained for new use cases. The Jupyter notebook available as part of the TAO container can be used to re-train the model.

The model is also intended for easy deployment to the edge using DeepStream SDK or TensorRT. DeepStream provides facilities to create efficient video analytics pipelines that capture, decode and pre-process the data before running inference.

The model is encrypted and can be decrypted with the following key:

  • Model load key: nvidia_tao

Please make sure to use this as the key for all TAO commands that require a model load key.

Input

B X 3 X 256 X 128 (B C H W)

Output

The feature embedding

Instructions to use the model with TAO toolkit

To use the model as pre-trained weights for transfer learning, use the snippet below as a template for the model_config component of the experiment spec file when training a ReIdentificationNet. For more information on the experiment spec file, refer to the Train Adapt Optimize (TAO) Toolkit User Guide.

model_config:
  backbone: resnet50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: /path/to/pretrained_resnet50.pth
  input_channels: 3
  input_size: [256, 128]
  neck: bnneck
  feat_dim: 256
  num_classes: 751
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True
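
When fine-tuning on a custom dataset, the value of num_classes presumably has to match the number of unique IDs in the training folder (751 for Market-1501, as in the template above); this is an inference from this card, not a statement from the TAO documentation. The sketch below counts the IDs in a bounding_box_train-style folder. The function count_identities is hypothetical, and the demo uses a temporary directory with empty placeholder files.

```python
import tempfile
from pathlib import Path


def count_identities(train_dir: str) -> int:
    """Count the unique object IDs (file-name prefix before the first '_')."""
    identities = set()
    for image_path in Path(train_dir).iterdir():
        # Only consider image crops; the accepted extensions are an assumption.
        if image_path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
            continue
        identities.add(image_path.name.split("_", 1)[0])
    return len(identities)


# Demo on a throwaway folder with empty files named like the crops above.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "0001_c1s1_01_00.jpg",
        "0001_c1s1_02_00.jpg",
        "0002_c1s1_03_00.jpg",
        "0003_c1s1_05_00.jpg",
    ):
        (Path(tmp) / name).touch()
    demo_count = count_identities(tmp)

print(demo_count)  # candidate value for num_classes in this toy folder
```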

Instructions to deploy the model with Triton Inference Server

To create the entire end-to-end video analytic application, deploy this model with Triton Inference Server. NVIDIA Triton Inference Server is an open-source inference serving software that helps standardize model deployment and execution and delivers fast and scalable AI in production. Triton supports direct integration of this model into the server and inference from a client.

To deploy this model with Triton Inference Server and end-to-end inference from video, please refer to the TAO Triton apps.

Limitations

NVIDIA ReIdentificationNet is trained on the Market-1501 dataset with 751 unique person classes. The accuracy of the model on external images is therefore not expected to reach the numbers reported in the Performance section.

In general, to get better accuracy, more labeled data are needed to fine-tune the pre-trained model through TAO Toolkit.

Model versions:

  • trainable_v1.0 - Pre-trained model for re-identification.
  • deployable_v1.0 - Model for re-identification deployable to DeepStream or TensorRT.

Reference

Citations

  • H. Luo, Y. Gu, X. Liao, S. Lai and W. Jiang, "Bag of Tricks and a Strong Baseline for Deep Person Re-Identification," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 1487-1495, doi: 10.1109/CVPRW.2019.00190.
  • L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang and Q. Tian, "Scalable Person Re-identification: A Benchmark," 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1116-1124, doi: 10.1109/ICCV.2015.133.

License

License to use the model is covered by the Model EULA. By downloading the unpruned or pruned version of the model, you accept the terms and conditions of these licenses.

Ethical AI

NVIDIA ReIdentificationNet model creates embeddings for identifying objects captured in different scenes.

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.