The BodyPose3DNet models described in this card perform 3D human pose estimation: given an input image, they predict a skeleton, consisting of keypoints and the connections between them, for every person in the image. 3D body pose tracking has many practical applications, including action understanding, surveillance, human-robot interaction, motion capture and CGI, augmented and virtual reality, assisted living, advanced driver assistance systems (ADAS), sports analysis and AI-powered coaching, workplace activity monitoring, and crowd counting and tracking; it also enables character animation that does not rely on markers or specialized suits.
Given an RGB image, the network tracks the 2D and 3D poses of the body using 34 joints: pelvis, left_hip, right_hip, torso, left_knee, right_knee, neck, left_ankle, right_ankle, left_big_toe, right_big_toe, left_small_toe, right_small_toe, left_heel, right_heel, nose, left_eye, right_eye, left_ear, right_ear, left_shoulder, right_shoulder, left_elbow, right_elbow, left_wrist, right_wrist, left_pinky_knuckle, right_pinky_knuckle, left_middle_tip, right_middle_tip, left_index_knuckle, right_index_knuckle, left_thumb_tip, right_thumb_tip. A typical visualization overlays the predicted skeleton, constructed from these 34 keypoints, on the original video frame.
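For reference, a minimal Python sketch that maps the 34 keypoint names to indices, assuming the network emits keypoints in the order listed above (this ordering is an assumption and should be confirmed against the TAO BodyPose3DNet documentation):

```python
# Assumption: keypoints are emitted in the order listed in this card;
# verify the actual index order against the TAO documentation.
KEYPOINT_NAMES = [
    "pelvis", "left_hip", "right_hip", "torso", "left_knee", "right_knee",
    "neck", "left_ankle", "right_ankle", "left_big_toe", "right_big_toe",
    "left_small_toe", "right_small_toe", "left_heel", "right_heel",
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_pinky_knuckle", "right_pinky_knuckle",
    "left_middle_tip", "right_middle_tip", "left_index_knuckle",
    "right_index_knuckle", "left_thumb_tip", "right_thumb_tip",
]
assert len(KEYPOINT_NAMES) == 34

# Name -> index lookup, e.g. KEYPOINT_INDEX["left_wrist"] == 24.
KEYPOINT_INDEX = {name: i for i, name in enumerate(KEYPOINT_NAMES)}
```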
Architecture Type: Deep Convolutional Neural Network.
Network Architecture: HRNet.
The models on this page can only be used with the Train Adapt Optimize (TAO) Toolkit, which provides a simple command-line interface to train a deep learning model for 3D body pose estimation.
The primary use case for this model is detecting human poses in a given RGB image. BodyPose3DNet is commonly used for activity/gesture recognition, fall detection, posture analysis, and similar tasks.
Install the NGC CLI from ngc.nvidia.com.
Configure the NGC CLI using the following command:
ngc config set
List the available versions of the model:
ngc registry model list nvidia/tao/bodypose3dnet:*
Download the desired version:
ngc registry model download-version nvidia/tao/bodypose3dnet:<version> --dest <path>
The network takes five input tensors:
Input tensor 0: name: input0, elem_type: float32, shape: -1 x 3 x 256 x 192
Input tensor 1: name: k_inv, elem_type: float32, shape: -1 x 3 x 3
Input tensor 2: name: t_form_inv, elem_type: float32, shape: -1 x 3 x 3
Input tensor 3: name: scale_normalized_mean_limb_lengths, elem_type: float32, shape: -1 x 36
Input tensor 4: name: mean_limb_lengths, elem_type: float32, shape: -1 x 36
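As an illustration of how these inputs might be assembled, here is a hedged NumPy sketch. The interpretations below are assumptions: k_inv is taken to be the inverse of the 3x3 camera intrinsic matrix, t_form_inv the inverse of the affine transform that maps the original image to the 256x192 crop, and the limb-length vectors are filled with placeholder values rather than the real statistics shipped with the model:

```python
import numpy as np

batch = 1

# Cropped person image as the 256x192 (H x W) network input; a real
# pipeline would fill this with preprocessed pixel data.
input0 = np.zeros((batch, 3, 256, 192), dtype=np.float32)

# Assumed: inverse of the 3x3 camera intrinsic matrix K
# (focal lengths and principal point here are illustrative).
K = np.array([[1000.0,    0.0, 960.0],
              [   0.0, 1000.0, 540.0],
              [   0.0,    0.0,   1.0]], dtype=np.float32)
k_inv = np.linalg.inv(K)[None, ...]                   # shape (1, 3, 3)

# Assumed: inverse of the affine transform mapping the original image
# to the 256x192 crop (identity used as a placeholder).
t_form_inv = np.eye(3, dtype=np.float32)[None, ...]   # shape (1, 3, 3)

# 36 limb lengths; placeholder values -- the real vectors come from the
# TAO BodyPose3DNet assets/documentation.
mean_limb_lengths = np.ones((batch, 36), dtype=np.float32)
scale_normalized_mean_limb_lengths = np.ones((batch, 36), dtype=np.float32)
```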
The network outputs four tensors:
Output tensor 0: name: pose2d, elem_type: float32, shape: -1 x 34 x 3 ([x, y, confidence])
Output tensor 1: name: pose2d_org_img, elem_type: float32, shape: -1 x 34 x 3 ([x, y, confidence])
Output tensor 2: name: pose25d, elem_type: float32, shape: -1 x 34 x 4 ([x, y, depth, confidence])
Output tensor 3: name: pose3d, elem_type: float32, shape: -1 x 34 x 3 ([x, y, z])
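Continuing from the input sketch above, here is a minimal inference sketch using ONNX Runtime, assuming the model has been exported to ONNX (the file name bodypose3dnet.onnx is hypothetical; actual deployment typically goes through TAO, DeepStream, or TensorRT):

```python
import onnxruntime as ort

# Hypothetical file name; export/conversion is done with TAO tooling.
session = ort.InferenceSession("bodypose3dnet.onnx")

# Feed the five input tensors built in the sketch above and request
# the four output tensors by name.
pose2d, pose2d_org_img, pose25d, pose3d = session.run(
    ["pose2d", "pose2d_org_img", "pose25d", "pose3d"],
    {
        "input0": input0,
        "k_inv": k_inv,
        "t_form_inv": t_form_inv,
        "scale_normalized_mean_limb_lengths": scale_normalized_mean_limb_lengths,
        "mean_limb_lengths": mean_limb_lengths,
    },
)

# pose3d has shape (batch, 34, 3): one [x, y, z] triple per keypoint.
# Index 0 corresponds to the pelvis in the keypoint list above.
print("3D pelvis position:", pose3d[0, 0])
```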
Umar Iqbal, Pavlo Molchanov, Thomas Breuel, Juergen Gall, Jan Kautz (2018). Hand Pose Estimation via Latent 2.5D Heatmap Regression. In Proceedings of the European Conference on Computer Vision (ECCV).
Umar Iqbal, Pavlo Molchanov, Jan Kautz (2020). Weakly-Supervised 3D Human Pose Learning via Multi-view Images in the Wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao (2020). Deep High-Resolution Representation Learning for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI).
This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/, or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.