BodyPoseNet

Description: Detect body pose from an image.
Publisher: NVIDIA
Latest Version: deployable_onnx_v1.0.1
Modified: November 12, 2024
Size: 64.09 MB

BodyPoseNet Model Card

Model Overview

Description:

BodyPoseNet predicts the skeleton for every person in a given input image. The model predicts 18 keypoints:

  • nose
  • neck
  • right shoulder
  • right elbow
  • right wrist
  • left shoulder
  • left elbow
  • left wrist
  • right hip
  • right knee
  • right ankle
  • left hip
  • left knee
  • left ankle
  • right eye
  • left eye
  • right ear
  • left ear

This model is ready for commercial use.

References:

  • Zhe Cao, Gines Hidalgo, Tomas Simon, Shih-En Wei, Yaser Sheikh (2017). Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

Using TAO Pre-trained Models

  • Get TAO Container
  • Get other purpose-built models from the NGC model registry:
    • TrafficCamNet
    • PeopleNet
    • PeopleNet-Transformer
    • DashCamNet
    • FaceDetectIR
    • VehicleMakeNet
    • VehicleTypeNet
    • PeopleSegNet
    • PeopleSemSegNet
    • License Plate Detection
    • License Plate Recognition
    • PoseClassificationNet
    • Facial Landmark
    • FaceDetect
    • 2D Body Pose Estimation
    • ActionRecognitionNet
    • People ReIdentification
    • PointPillarNet
    • CitySegFormer
    • Retail Object Detection
    • Retail Object Embedding
    • Optical Inspection
    • Optical Character Detection
    • Optical Character Recognition
    • PCB Classification
    • PeopleSemSegFormer

Model Architecture:

Architecture Type: Convolutional Neural Network (CNN)
Network Architecture: Visual Geometry Group (VGG)

The BodyPoseNet model described in this card is a multi-person human pose estimation network: it predicts the skeleton, consisting of keypoints and the connections between them, for every person in a given input image. The network follows a single-shot, bottom-up methodology, so no person detector is needed and compute does not scale linearly with the number of people in the scene. The pose / skeleton output is commonly used as input for applications such as activity/gesture recognition, fall detection, and posture analysis.

The default model predicts 18 keypoints including nose, neck, right_shoulder, right_elbow, right_wrist, left_shoulder, left_elbow, left_wrist, right_hip, right_knee, right_ankle, left_hip, left_knee, left_ankle, right_eye, left_eye, right_ear, left_ear.

Fig 1. Example illustration of BodyPoseNet output

Input:

Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: 3D
Other Properties Related to Input: RGB format, H x W x 3; no minimum bit depth, alpha, or gamma

Input images are pre-processed with normalization and resizing that maintains the aspect ratio, as sketched below.
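
A minimal pre-processing sketch under stated assumptions: a 288x384 network input (one of the documented calibration resolutions), zero-padding after an aspect-ratio-preserving resize, and a simple scaling to [0, 1]. The exact normalization constants and padding strategy used by TAO may differ.

import cv2          # OpenCV, used here for resizing and color conversion
import numpy as np

def preprocess(image_bgr: np.ndarray, net_h: int = 288, net_w: int = 384) -> np.ndarray:
    """Resize with preserved aspect ratio, pad, and scale to [0, 1]."""
    h, w = image_bgr.shape[:2]
    scale = min(net_h / h, net_w / w)              # fit inside the network input
    new_h, new_w = int(h * scale), int(w * scale)
    resized = cv2.resize(image_bgr, (new_w, new_h))

    # Zero-pad the remainder so the tensor is exactly net_h x net_w x 3 (assumed strategy).
    padded = np.zeros((net_h, net_w, 3), dtype=np.uint8)
    padded[:new_h, :new_w] = resized

    rgb = cv2.cvtColor(padded, cv2.COLOR_BGR2RGB)  # the model expects RGB input
    return rgb.astype(np.float32) / 255.0          # assumed [0, 1] scaling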

Output:

Output Type(s): Tensors
Output Format: N-Dimensional array
Output Parameters: 3D
Other Properties Related to Output:
The network outputs two tensors: confidence maps (H1' x W1' x C) and part affinity fields (H2' x W2' x P). After non-maximum suppression (NMS) and bipartite graph matching, the final result is an M x N x 3 array, where

  • N is the number of keypoints.
  • M is the number of humans detected in the image.
  • C is the number of confidence map channels, corresponding to the number of keypoints + background
  • P is the number of part affinity field channels, corresponding to 2 x the number of edges used in the skeleton
  • H1', W1' are the height and width of the output confidence maps respectively
  • H2', W2' are the height and width of the output part affinity fields respectively
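
As a concrete check of these shape relationships, the following sketch uses the default 18-keypoint skeleton; the edge count and output stride are illustrative assumptions, since the actual skeleton topology and stride come from the model configuration.

NUM_KEYPOINTS = 18            # default skeleton listed above
C = NUM_KEYPOINTS + 1         # confidence map channels: keypoints + background = 19
NUM_EDGES = 19                # assumed number of connections in the skeleton
P = 2 * NUM_EDGES             # PAF channels: an (x, y) vector field per edge = 38

# For a hypothetical 288x384 input with an assumed output stride of 8:
H1 = H2 = 288 // 8            # 36
W1 = W2 = 384 // 8            # 48
print(f"confidence maps:      {H1} x {W1} x {C}")   # 36 x 48 x 19
print(f"part affinity fields: {H2} x {W2} x {P}")   # 36 x 48 x 38

# After NMS and bipartite matching, detections collapse to M x N x 3:
# M detected people, N = 18 keypoints, and (x, y, score) for each keypoint.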

Software Integration:

Runtime Engine(s):

  • TAO - 5.2

Supported Hardware Architecture(s):

  • Ampere
  • Jetson
  • Hopper
  • Lovelace
  • Pascal
  • Turing
  • Volta

Supported Operating System(s):

  • Linux
  • Linux 4 Tegra

Model Version(s):

  • trainable_v1.0 - this pretrained model is intended to be used for fine-tuning on custom datasets using TAO.
  • deployable_v1.0 - this deployable model is intended to run on the inference pipeline. INT8 calibration files are provided for three resolutions: 224x320, 288x384, and 320x448. These calibration files are generated for TensorRT 7.
  • deployable_v1.0.1 - this deployable model is intended to run on the inference pipeline. INT8 calibration files are provided for three resolutions: 224x320, 288x384, and 320x448. These calibration files are generated for TensorRT 8.

Training & Evaluation:

Training Dataset:

Link: https://storage.googleapis.com/openimages/web/download_v7.html
Data Collection Method by dataset:

  • Unknown

Labeling Method by dataset:

  • Unknown

Properties:
Roughly 400,000 images and 7,000 validation images across thousands of classes, as defined by the Google OpenImages Version 3 dataset. Most of the human verifications were done by in-house annotators at Google; a smaller part was done with crowd-sourced verification from the Image Labeler: Crowdsource app, g.co/imagelabeler.

Evaluation Dataset:

Link: https://cocodataset.org/#download
Data Collection Method by dataset:

  • Unknown

Labeling Method by dataset:

  • Unknown

Properties:
COCO validation sets, containing more than 200,000 images and 250,000 person instances labeled with keypoints.

Methodology and KPI

The KPIs for the evaluation data are reported in the table below.

Metric   IoU         Area     Score
AP       0.50:0.95   all      56.2
AP       0.50        all      79.3
AP       0.50:0.95   medium   57.2
AP       0.50:0.95   large    54.9

Inference:

Engine: TensorRT
Test Hardware:

  • Jetson AGX Xavier
  • Xavier NX
  • Orin
  • Orin NX
  • NVIDIA T4
  • Ampere GPU
  • A2
  • A30
  • L4
  • DGX H100
  • DGX A100
  • L40
  • JAO 64GB
  • Orin NX 16GB
  • Orin Nano 8GB

The inference performance is measured for INT8 precision and an input dimension of 288x384. Performance is measured with trtexec on Jetson Nano, AGX Xavier, Xavier NX, and NVIDIA T4 GPUs. The Jetson devices run in the Max-N configuration for maximum system performance. End-to-end performance with streaming video data may vary slightly depending on the application use case.

Device   Precision   Batch size   FPS   Latency
Nano     INT8        8            5     200.0 ms
NX       INT8        8            93    10.71 ms
Xavier   INT8        8            160   6.25 ms
T4       INT8        8            555   1.80 ms
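
For reference, a comparable measurement could be reproduced with trtexec. The file names and the input tensor name below are hypothetical, and the calibration cache stands in for the 288x384 file shipped with the deployable model:

# Hypothetical invocation; actual tensor and file names depend on the downloaded model.
trtexec --onnx=bodyposenet.onnx \
        --int8 \
        --calib=calibration_288x384.bin \
        --shapes=input:8x288x384x3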

How to use this model

The models on this page can only be used with the Train Adapt Optimize (TAO) Toolkit. TAO provides a simple command-line interface to train a deep learning model for body pose estimation.

The primary use case for this model is detecting human poses in a given RGB image. BodyPoseNet is commonly used for activity/gesture recognition, fall detection, posture analysis, and similar applications.

  1. Install the NGC CLI from ngc.nvidia.com

  2. Configure the NGC CLI using the following command:

ngc config set

  3. To view all the models that are supported in TAO:

ngc registry model list nvidia/tao/bodyposenet:*

  4. To download the model:

ngc registry model download-version nvidia/tao/bodyposenet:<template> --dest <path>
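
For example, to download the latest deployable version listed at the top of this page into a local directory (the destination path here is arbitrary):

ngc registry model download-version nvidia/tao/bodyposenet:deployable_onnx_v1.0.1 --dest ./bodyposenet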

Technical blogs

  • Read the two-part blog on training and optimizing the 2D body pose estimation model with TAO - Part 1 | Part 2
  • Read the technical tutorial on how the PeopleNet model can be trained with custom data using the Transfer Learning Toolkit

Suggested reading

  • More information about TAO Toolkit and pre-trained models can be found at the NVIDIA Developer Zone
  • Read the TAO getting started guide and release notes.
  • If you have any questions or feedback, please refer to the discussions on the TAO Toolkit Developer Forums

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards.