ReIdentificationNet

Re-Identification network to generate embeddings for identifying persons in different scenes.

ReIdentificationNet Model Card

Description:

ReIdentificationNet generates embeddings for identifying people captured in different scenes.

This model is ready for commercial use.

References:

Citations

H. Luo, Y. Gu, X. Liao, S. Lai and W. Jiang, "Bag of Tricks and a Strong Baseline for Deep Person Re-Identification," 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 1487-1495, doi: 10.1109/CVPRW.2019.00190.
L. Zheng, L. Shen, L. Tian, S. Wang, J. Wang and Q. Tian, "Scalable Person Re-identification: A Benchmark," 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1116-1124, doi: 10.1109/ICCV.2015.133.
M. Naphade, S. Wang, D. C. Anastasiu, Z. Tang, M.-C. Chang, Y. Yao, L. Zheng, M. S. Rahman, M. S. Arya, A. Sharma, Q. Feng, V. Ablavsky, S. Sclaroff, P. Chakraborty, S. Prajapati, A. Li, S. Li, K. Kunadharaju, S. Jiang and R. Chellappa, "The 7th AI City Challenge," in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2023.

Using TAO Pre-trained Models

Model Architecture:

Architecture Type: Convolutional Neural Network (CNN)
Network Architecture: ResNet50

Input:

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: 2D
Other Properties Related to Input: Fixed Resolution: B x 3 x 256 x 128 (batch x channels x height x width); No minimum bit depth, alpha, or gamma
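
The fixed input shape above can be illustrated with a minimal sketch that rearranges one RGB image from rows of pixels (height x width x channels) into channel planes (channels x height x width). The helper name hwc_to_chw and the nested-list representation are assumptions made for illustration only. Resizing and any pixel normalization are not described in this model card and are not shown.

```python
from typing import List

# Dimensions taken from the fixed resolution B x 3 x 256 x 128.
CHANNELS, HEIGHT, WIDTH = 3, 256, 128


def hwc_to_chw(image: List[List[List[int]]]) -> List[List[List[int]]]:
    """Rearrange one H x W x 3 image into 3 x H x W channel planes.

    Hypothetical helper: the image is a list of rows, each row a list of
    [R, G, B] pixels.
    """
    if len(image) != HEIGHT or any(len(row) != WIDTH for row in image):
        raise ValueError(f"expected a {HEIGHT} x {WIDTH} image")
    if any(len(pixel) != CHANNELS for row in image for pixel in row):
        raise ValueError(f"expected {CHANNELS} values per pixel")
    return [
        [[pixel[c] for pixel in row] for row in image]
        for c in range(CHANNELS)
    ]


# A batch (the leading B dimension) is a list of such images.
image = [[[10, 20, 30] for _ in range(WIDTH)] for _ in range(HEIGHT)]
batch = [hwc_to_chw(image)]
print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))
```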

Output:

Output Type(s): Embeddings
Output Format: Numpy (Npy)
Other Properties Related to Output: Precision to billionths

Software Integration:

Runtime Engine(s):

TAO - 5.2
DeepStream 6.1 or later

Supported Hardware Architecture(s):

Ampere
Jetson
Hopper
Lovelace
Pascal
Turing
Volta

Supported Operating System(s):

Linux
Linux for Tegra (L4T)

Model Version(s):

deployable_v1.2 - ONNX model for re-identification deployable to DeepStream or TensorRT.
trainable_v1.1 - Pre-trained model for re-identification.
deployable_v1.1 - Encrypted model for re-identification deployable to DeepStream or TensorRT.

Training & Evaluation:

Training Dataset:

Data Collection Method by dataset:

Automatic/Sensors

Labeling Method by dataset:

Unknown

Properties: 14537 images from the Market-1501 dataset of 751 real people and 29533 images of 156 people (148 of which are synthetic) from the MTMC people tracking dataset from the 2023 AI City Challenge.

Class distribution

subset	no. total identities	no. total images	no. total cameras	no. real identities	no. real images	no. real cameras	no. synthetic identities	no. synthetic images	no. synthetic cameras
Train	907	44070	135	759	14537	13	148	29533	122
Test	907	28768	135	759	21163	13	148	7605	122
Query	906	4356	135	758	3539	13	148	817	122

Data Format

The data must be organized in the following directory structure.

/data
    /market1501
        /bounding_box_train
            0001_c1s1_01_00.jpg
            0001_c1s1_02_00.jpg
            0002_c1s1_03_00.jpg
            0002_c1s1_04_00.jpg
            0003_c1s1_05_00.jpg
            0003_c1s1_06_00.jpg
            ...
            ...
            ...
            N.jpg
        /bounding_box_test
            0001_c1s1_01_00.jpg
            0001_c1s1_02_00.jpg
            0002_c1s1_03_00.jpg
            0002_c1s1_04_00.jpg
            0003_c1s1_05_00.jpg
            0003_c1s1_06_00.jpg
            ...
            ...
            ...
            N.jpg
        /query
            0001_c1s1_01_00.jpg
            0001_c1s1_02_00.jpg
            0002_c1s1_03_00.jpg
            0002_c1s1_04_00.jpg
            0003_c1s1_05_00.jpg
            0003_c1s1_06_00.jpg
            ...
            ...
            ...
            N.jpg

The dataset should be split into separate train, test, and query directories. Each directory contains image crops that follow the naming scheme above.

For example, the image 0001_c1s1_01_00.jpg comes from the first sequence s1 of camera c1. 01 is the first frame in the sequence c1s1. 0001 is the unique ID assigned to the object. Data after the third _ are ignored.
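
The naming scheme can be sketched as a small parser. This is a minimal illustration, not part of TAO: the regular expression is inferred from the example name 0001_c1s1_01_00.jpg, and parse_crop_name and CropInfo are hypothetical names.

```python
import re
from typing import NamedTuple

# Pattern inferred from the example name 0001_c1s1_01_00.jpg:
# <object ID>_c<camera>s<sequence>_<frame>[_<ignored>].<extension>
_NAME_RE = re.compile(r"^(\d+)_c(\d+)s(\d+)_(\d+)(?:_.*)?\.\w+$")


class CropInfo(NamedTuple):
    object_id: int
    camera: int
    sequence: int
    frame: int


def parse_crop_name(filename: str) -> CropInfo:
    """Split an image-crop file name into ID, camera, sequence, and frame."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    object_id, camera, sequence, frame = (int(g) for g in match.groups())
    # Anything after the third underscore is matched but discarded.
    return CropInfo(object_id, camera, sequence, frame)


print(parse_crop_name("0001_c1s1_01_00.jpg"))
```

Crops that share the same object_id belong to the same identity, so the parsed ID can serve as the label when grouping images.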

Evaluation Dataset:

Data Collection Method by dataset:

Automatic/Sensors

Labeling Method by dataset:

Unknown

Properties: 21163 testing images from the Market-1501 dataset of 751 real people and 7605 testing images of 156 people (148 of which are synthetic) from the MTMC people tracking dataset from the 2023 AI City Challenge.

Methodology and KPI

The key performance indicators are the ranked accuracy of re-identification and the mean average precision (mAP).

Rank-K accuracy: A method of computing accuracy in which the top-K highest-confidence labels are compared with the ground-truth label. If the ground-truth label is among these top-K labels, the prediction is counted as accurate. This gives an overall accuracy measurement while being lenient when the classes are numerous and very similar. In our case, we compute rank-1, rank-5, and rank-10 accuracies. For rank-10, for example, a sample is counted as correct if any of its 10 highest-confidence predicted labels matches the ground-truth label.
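
The rank-K definition can be sketched in a few lines. This is a minimal illustration on hypothetical toy data, not the TAO evaluation code; rank_k_accuracy is an assumed name, and each ranked list is assumed to be ordered from highest to lowest confidence.

```python
from typing import Sequence


def rank_k_accuracy(ranked_labels: Sequence[Sequence[int]],
                    true_labels: Sequence[int],
                    k: int) -> float:
    """Fraction of samples whose true label appears among the top-k predictions.

    ranked_labels[i] lists the predicted IDs for sample i, highest confidence first.
    """
    if not true_labels or len(ranked_labels) != len(true_labels):
        raise ValueError("one ranked list is required per ground-truth label")
    hits = sum(
        1 for ranked, truth in zip(ranked_labels, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)


# Toy data (hypothetical): three samples, each with a ranked list of IDs.
ranked = [[7, 3, 9], [4, 2, 8], [5, 6, 1]]
truth = [7, 2, 1]
for k in (1, 2, 3):
    print(f"rank-{k} accuracy: {rank_k_accuracy(ranked, truth, k):.2f}")
```

On this toy data the accuracy grows with K because each larger cut-off admits one more correct ID.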

Mean average precision (mAP): Precision measures how accurate the predictions are, in our case the predicted ID of an object. In other words, it is the percentage of predictions that are correct. mAP is the mean of the average precision (AP) values, where AP is computed for each class, in our case each ID.
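
A minimal sketch of the mAP computation follows. It uses the common ranked-retrieval form of AP (the mean of the precision values at each rank where a correct ID appears), which is an assumption: the exact evaluation protocol used for the table below is not described here. The function names and the toy relevance lists are hypothetical.

```python
from typing import Sequence


def average_precision(relevant: Sequence[bool]) -> float:
    """AP for one ranked result list; relevant[i] marks a correct ID at rank i + 1."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this cut-off
    return precision_sum / hits if hits else 0.0


def mean_average_precision(all_relevant: Sequence[Sequence[bool]]) -> float:
    """Mean of the per-list AP values."""
    if not all_relevant:
        raise ValueError("at least one ranked list is required")
    return sum(average_precision(r) for r in all_relevant) / len(all_relevant)


# Toy data (hypothetical): two ranked lists with correct matches marked True.
results = [[True, False, True], [False, True]]
print(f"mAP: {mean_average_precision(results):.4f}")
```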

model	feature dimension	mAP (%)	rank-1 accuracy (%)	rank-5 accuracy (%)	rank-10 accuracy (%)
resnet50_market1501	64	91.0	93.4	96.7	97.7
resnet50_market1501	128	92.1	94.5	96.9	97.9
resnet50_market1501	256	93.0	94.7	97.3	98.0
resnet50_market1501	512	93.4	95.1	97.5	98.1
resnet50_market1501	1024	93.7	94.8	97.5	98.2
resnet50_market1501	2048	93.9	95.3	98.0	98.4

Inference:

Engine: TensorRT, Triton
Test Hardware:

Jetson AGX Xavier
Xavier NX
Orin
Orin NX
NVIDIA T4
Ampere GPU
A2
A30
L4
T4
DGX H100
DGX A100
L40
Jetson AGX Orin 64GB
Orin NX 16GB
Orin Nano 8GB

Inference performance is measured with trtexec on NVIDIA Ampere and Jetson GPUs. End-to-end performance with image data may vary slightly depending on the application.

Model	Device	Precision	Batch Size	Latency (ms)	Images per Second
ResNet50	A10	Mixed	1	0.49	2057.64
ResNet50	A10	Mixed	16	2.83	5725.13
ResNet50	A10	Mixed	64	10.64	6088.47
ResNet50	A30	Mixed	1	0.50	2004.44
ResNet50	A30	Mixed	16	2.25	7445.72
ResNet50	A30	Mixed	64	7.14	9103.93
ResNet50	Jetson AGX Orin	FP16	1	0.96	1043.62
ResNet50	Jetson AGX Orin	FP16	16	6.42	2492.26
ResNet50	Jetson AGX Orin	FP16	64	23.09	2771.60

How to use this model

This model needs to be used with NVIDIA Hardware and Software. For Hardware, the model can run on any NVIDIA GPU including NVIDIA Jetson devices. This model can only be used with Train Adapt Optimize (TAO) Toolkit, DeepStream SDK or TensorRT.

The primary intended use case for this model is to generate embeddings for an object and then perform similarity matching.
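
Similarity matching on embeddings can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the metric; the model card does not state which metric is used. The function names, the gallery dictionary, and the three-dimensional toy vectors are hypothetical (real embeddings have the dimension set by feat_dim).

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


def best_match(query: Sequence[float],
               gallery: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return the gallery entry most similar to the query and its score."""
    if not gallery:
        raise ValueError("gallery is empty")
    return max(
        ((name, cosine_similarity(query, emb)) for name, emb in gallery.items()),
        key=lambda item: item[1],
    )


# Toy embeddings (hypothetical values).
gallery = {"person_a": [1.0, 0.0, 0.0], "person_b": [0.0, 1.0, 0.0]}
name, score = best_match([0.9, 0.1, 0.0], gallery)
print(name, round(score, 4))
```

In practice a similarity threshold would decide whether the best match is accepted as the same identity; a suitable value depends on the data and is not given here.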

A pre-trained model is provided:

resnet50_market1501_aicity156

It is intended for training and fine-tuning with the Train Adapt Optimize (TAO) Toolkit and the user's own re-identification dataset. High-fidelity models can be trained for new use cases. The Jupyter notebook available as part of the TAO container can be used for re-training.

The model is also intended for easy deployment to the edge using DeepStream SDK or TensorRT. DeepStream provides facilities for creating efficient video analytics pipelines that capture, decode, and pre-process the data before running inference.

The model is encrypted and can be decrypted with the following key:

Model load key: nvidia_tao

Please make sure to use this as the key for all TAO commands that require a model load key.

Instructions to use the model with TAO toolkit

In order to use the model as pre-trained weights for transfer learning, please use the snippet below as a template for the model component of the experiment spec file to train a ReIdentificationNet. For more information on experiment spec file, please refer to the Train Adapt Optimize (TAO) Toolkit User Guide.

model:
  backbone: resnet_50
  last_stride: 1
  pretrain_choice: imagenet
  pretrained_model_path: /path/to/pretrained_resnet50.pth
  input_channels: 3
  input_width: 128
  input_height: 256
  neck: bnneck
  feat_dim: 256
  neck_feat: after
  metric_loss_type: triplet
  with_center_loss: False
  with_flip_feature: False
  label_smooth: True

Instructions to deploy the model with Triton Inference Server

To create the entire end-to-end video analytic application, deploy this model with Triton Inference Server. NVIDIA Triton Inference Server is an open-source inference serving software that helps standardize model deployment and execution and delivers fast and scalable AI in production. Triton supports direct integration of this model into the server and inference from a client.

To deploy this model with Triton Inference Server and end-to-end inference from video, please refer to the TAO Triton apps.

Technical blogs

Access the latest in Vision AI development workflows with NVIDIA TAO Toolkit 5.0
Improve accuracy and robustness of vision AI models with vision transformers and NVIDIA TAO
Train like a ‘pro’ without being an AI expert using TAO AutoML
Create Custom AI models using NVIDIA TAO Toolkit with Azure Machine Learning
Developing and Deploying AI-powered Robots with NVIDIA Isaac Sim and NVIDIA TAO
Learn endless ways to adapt and supercharge your AI workflows with TAO - Whitepaper
Customize Action Recognition with TAO and deploy with DeepStream
Read the 2 part blog on training and optimizing 2D body pose estimation model with TAO - Part 1 | Part 2
Learn how to train real-time License plate detection and recognition app with TAO and DeepStream.
Model accuracy is extremely important, learn how you can achieve state of the art accuracy for classification and object detection models using TAO

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards.

Latest Version: deployable_v1.2

Updated: November 27, 2024 UTC

Compressed Size: 91.93 MB

Labels: AI, Computer Vision, CV, Deep Learning, DeepStream, Metropolis, NSPECT-CQBD-MZW9, Smart City, Smart Infrastructure, TAO, TAO Toolkit, TLT, Transfer Learning, Transfer Learning Toolkit