Detect body pose from an image.
Latest version: March 13, 2024 (64.09 MB)

BodyPoseNet Model Card

Model Overview

The BodyPoseNet models described in this card perform multi-person human pose estimation: they predict a skeleton, consisting of keypoints and the connections between them, for every person in a given input image. The network follows a single-shot, bottom-up methodology, so no person detector is needed and compute does not scale linearly with the number of people in the scene. The pose/skeleton output is commonly used as input for applications such as activity/gesture recognition, fall detection, and posture analysis.

The default model predicts 18 keypoints: nose, neck, right_shoulder, right_elbow, right_wrist, left_shoulder, left_elbow, left_wrist, right_hip, right_knee, right_ankle, left_hip, left_knee, left_ankle, right_eye, left_eye, right_ear, left_ear.

Fig 1. Example illustration of BodyPoseNet output

Model Architecture

This is a fully convolutional model whose architecture consists of a backbone network (such as VGG), an initial estimation stage that makes a pixel-wise prediction of confidence maps (heatmaps) and part affinity fields, followed by multistage refinement (0 to N stages) of the initial predictions.

Training Algorithm

The training algorithm optimizes the network to minimize the loss on confidence maps (heatmaps) and part affinity fields for a given image and its ground-truth pose labels.
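The loss described above can be sketched as a per-stage L2 objective summed across all refinement stages (the intermediate-supervision scheme of the part-affinity-fields approach). This is a minimal illustration, not the TAO implementation: the function names are hypothetical, and the real training code may weight or mask the two loss terms differently.

```python
import numpy as np

def stage_loss(pred_cmap, gt_cmap, pred_paf, gt_paf):
    """L2 loss on confidence maps and part affinity fields for one stage.

    Illustrative sketch: a real pipeline may apply per-pixel masks for
    unlabeled regions and different weights per term.
    """
    cmap_loss = np.mean((pred_cmap - gt_cmap) ** 2)
    paf_loss = np.mean((pred_paf - gt_paf) ** 2)
    return cmap_loss + paf_loss

def total_loss(stage_preds, gt_cmap, gt_paf):
    """Sum the per-stage losses (intermediate supervision across stages)."""
    return sum(stage_loss(c, gt_cmap, p, gt_paf) for c, p in stage_preds)
```

Supervising every stage against the same ground truth is what lets later stages refine, rather than redo, the initial estimate.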

Training Data

The available pretrained model is trained on a subset of the Google OpenImages dataset.


Evaluation Dataset

The inference performance of the BodyPoseNet v1.0 model was measured against the COCO validation dataset.

Methodology and KPI

The KPIs for the evaluation data are reported in the table below.

Metric IoU Area Score
AP 0.50:0.95 all 56.2
AP 0.5 all 79.3
AP 0.50:0.95 medium 57.2
AP 0.50:0.95 large 54.9

Real-time Inference Performance

The inference performance is measured for INT8 precision and an input dimension of 288x384. Performance was measured with trtexec on Jetson Nano, AGX Xavier, Xavier NX, and an NVIDIA T4 GPU. The Jetson devices run at the Max-N configuration for maximum system performance. End-to-end performance with streaming video data might vary slightly depending on the application.

Device Precision Batch size FPS Latency
Nano INT8 8 5 200.0ms
NX INT8 8 93 10.71ms
Xavier INT8 8 160 6.25ms
T4 INT8 8 555 1.80ms

How to use this model

The models in this page can only be used with Train Adapt Optimize (TAO) Toolkit. TAO provides a simple command line interface to train a deep learning model for body pose estimation.

The primary use case for this model is detecting human poses in a given RGB image. BodyPoseNet is commonly used for activity/gesture recognition, fall detection, posture analysis, among others.

  1. Install the NGC CLI.

  2. Configure the NGC CLI using the following command:

ngc config set

  3. To view all the models that are supported in TAO:

ngc registry model list nvidia/tao/bodyposenet:*

  4. To download the model:

ngc registry model download-version nvidia/tao/bodyposenet:<template> --dest <path>


The network accepts an H x W x 3 input. Images are pre-processed to handle normalization and resizing while maintaining the aspect ratio.
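The pre-processing step above can be sketched as an aspect-ratio-preserving resize followed by padding and normalization. This is a minimal, dependency-free illustration: the 288x384 target, zero padding, and [-1, 1] normalization are assumptions, and the actual TAO pipeline may use a different resize kernel and value range.

```python
import numpy as np

def preprocess(image, target_h=288, target_w=384):
    """Resize an H x W x 3 image keeping its aspect ratio, pad, normalize.

    Assumed details (not from the model card): nearest-neighbor resize,
    zero padding at the bottom/right, and [-1, 1] normalization.
    Returns the padded canvas and the scale needed to map keypoints back.
    """
    h, w = image.shape[:2]
    scale = min(target_h / h, target_w / w)
    new_h, new_w = int(h * scale), int(w * scale)
    # Nearest-neighbor resize via index maps (no external dependencies).
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = image[rows][:, cols]
    canvas = np.zeros((target_h, target_w, 3), dtype=np.float32)
    canvas[:new_h, :new_w] = resized
    return canvas / 127.5 - 1.0, scale
```

Returning the scale factor matters in practice: predicted keypoints live in network coordinates and must be divided by it to land back on the original image.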


The network outputs two tensors: confidence maps (H1' x W1' x C) and part affinity fields (H2' x W2' x P). After non-maximum suppression (NMS) and bipartite graph matching, the final result is an M x N x 3 tensor, where:


  • M is the number of humans detected in the image.
  • N is the number of keypoints.
  • C is the number of confidence map channels, corresponding to the number of keypoints + background.
  • P is the number of part affinity field channels, corresponding to 2 x the number of edges in the skeleton.
  • H1', W1' are the height and width of the output confidence maps, respectively.
  • H2', W2' are the height and width of the output part affinity fields, respectively.
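The NMS step mentioned above amounts to a per-channel local-maximum search on the confidence maps. The sketch below covers only that step; the threshold value is an illustrative assumption, and a real pipeline would follow it with part-affinity-field scoring and bipartite matching to group the peaks into the M x N x 3 per-person output.

```python
import numpy as np

def find_peaks(cmap, threshold=0.1):
    """Per-channel local-maximum search on float confidence maps (H x W x C).

    Returns one list per keypoint channel of (x, y, score) peaks. The 0.1
    threshold and 4-connected neighborhood are illustrative assumptions.
    """
    h, w, c = cmap.shape
    # Pad with -inf so border pixels can still qualify as local maxima.
    padded = np.pad(cmap, ((1, 1), (1, 1), (0, 0)), constant_values=-np.inf)
    center = padded[1:-1, 1:-1]
    is_peak = center >= threshold
    # Require the center to beat each 4-connected neighbor.
    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        is_peak &= center > padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    peaks = []
    for ch in range(c):
        ys, xs = np.nonzero(is_peak[:, :, ch])
        peaks.append([(int(x), int(y), float(cmap[y, x, ch]))
                      for y, x in zip(ys, xs)])
    return peaks
```

Each surviving peak is a candidate keypoint; the matching stage then scores candidate limb connections by integrating the part affinity fields between peak pairs.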


Limitations

Crowded scenes

The BodyPoseNet model does not give good results in very crowded scenes, especially when detecting the pose of small-scale people in the image.


Occlusion

The network may have difficulty estimating the poses of people who are occluded by other objects or persons.


Low contrast with the background

The network may have difficulty estimating the poses of people when there is no distinction from the background (for example, estimation may fail for a person wearing a black sweater against a dark background).

Model versions:

  • trainable_v1.0 - this pretrained model is intended to be used for finetuning on custom datasets using TAO.
  • deployable_v1.0 - this deployable model is intended to run on the inference pipeline. INT8 calibration files are provided for three resolutions: 224x320, 288x384, and 320x448. These calibration files are generated for TensorRT 7.
  • deployable_v1.0.1 - this deployable model is intended to run on the inference pipeline. INT8 calibration files are provided for three resolutions: 224x320, 288x384, and 320x448. These calibration files are generated for TensorRT 8.

The trainable and deployable models are encrypted and will only operate with the following key:

  • Model load key: nvidia_tlt

Please make sure to use this as the key for all TAO commands that require a model load key.


References

  • Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh (2017). Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.



License

This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, please visit this link, or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

Ethical Considerations

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.