vision-ai-movenet | NVIDIA NGC

Description

Container for Tokkio vision AI movenet

Publisher

NVIDIA

Latest Tag

0.1.82

Modified

May 2, 2025

Compressed Size

13.11 GB

0.1.82 (Latest) Security Scan Results

Linux / amd64

What is the MoveNet Vision AI Container?

The Tokkio Vision AI container is a video inference solution that extracts facial bounding boxes and body poses from user video streams. It is implemented using the GXF and DeepStream frameworks and includes the MoveNet model.

Model Overview

Description

This model predicts the joint locations of a single person. It is designed to run in real time in the browser using TensorFlow.js or on devices using TF Lite, targeting movement and fitness activities. This variant, MoveNet.SinglePose.Thunder, is a higher-capacity model than MoveNet.SinglePose.Lightning: it delivers better prediction quality while still achieving real-time (>30 FPS) speed, though it runs slower than Lightning. This model was not developed by NVIDIA; it is provided by a third party, as is, within the container.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case. See the MoveNet Model Card for more information: https://www.kaggle.com/models/google/movenet.

Terms of use

This model is licensed and released under the Apache License 2.0.

You are responsible for ensuring that your use of models complies with all applicable laws.

Model Architecture

MobileNetV2 image feature extractor with a Feature Pyramid Network decoder (to a stride of 4), followed by CenterNet prediction heads with custom post-processing logic. Lightning uses a depth multiplier of 1.0, while Thunder uses a depth multiplier of 1.75.
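
As a rough illustration of what "to a stride of 4" implies, the sketch below computes the spatial size of the decoder's output feature map for each variant. It assumes the input side lengths listed under Input (192 for Lightning, 256 for Thunder) and that the stride divides the input evenly; the names `INPUT_SIZE`, `OUTPUT_STRIDE`, and `feature_map_size` are hypothetical and not part of the container.

```python
# Input side lengths per variant, as listed in the Input section of this page.
INPUT_SIZE = {"lightning": 192, "thunder": 256}

# The decoder output stride stated in the architecture description.
OUTPUT_STRIDE = 4


def feature_map_size(variant):
    """Return the side length of the stride-4 feature map for a variant.

    Assumption of this sketch: the feature map side is simply the input
    side divided by the output stride.
    """
    size = INPUT_SIZE[variant]
    if size % OUTPUT_STRIDE != 0:
        raise ValueError("input size is not a multiple of the output stride")
    return size // OUTPUT_STRIDE


if __name__ == "__main__":
    for name in INPUT_SIZE:
        side = feature_map_size(name)
        print(f"{name}: {side}x{side} feature map")
```

Under these assumptions the prediction heads operate on a grid that is one quarter of the input resolution along each axis.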

Input

Input Type(s): Image

Input Format(s): Red, Green, Blue (RGB)

Input Parameters: Two Dimensional (2D)

Other Properties Related to Input: A frame of video or an image, represented as an int32 tensor of shape 192x192x3 (Lightning) or 256x256x3 (Thunder). Channel order: RGB, with values in [0, 255].
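
Video frames are rarely square, so they need to be fitted to the square input shape before inference. The sketch below computes the geometry for an aspect-preserving resize with centered padding. Padding is an assumption of this example; this page does not say whether the container pads, crops, or stretches frames. The names `INPUT_SIZE` and `letterbox_geometry` are hypothetical.

```python
# Input side lengths per variant, taken from the input description above.
INPUT_SIZE = {"lightning": 192, "thunder": 256}


def letterbox_geometry(frame_width, frame_height, variant="thunder"):
    """Return (new_w, new_h, pad_left, pad_top) for fitting a frame.

    The frame is scaled so its longer side matches the model input side,
    and the shorter side is centered with padding. This is an illustrative
    choice, not the container's documented behavior.
    """
    if frame_width <= 0 or frame_height <= 0:
        raise ValueError("frame dimensions must be positive")
    size = INPUT_SIZE[variant]
    scale = min(size / frame_width, size / frame_height)
    # Clamp to [1, size] to guard against floating-point rounding.
    new_w = min(size, max(1, round(frame_width * scale)))
    new_h = min(size, max(1, round(frame_height * scale)))
    pad_left = (size - new_w) // 2
    pad_top = (size - new_h) // 2
    return new_w, new_h, pad_left, pad_top


if __name__ == "__main__":
    # A 1280x720 frame fitted to the Thunder input.
    print(letterbox_geometry(1280, 720, "thunder"))
```

After resizing and padding, the pixel data would still need to be arranged as an RGB int32 tensor with values in [0, 255], as described above.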

Output

Output Type(s): Tensor

Output Format: float32

Output Parameters: Shape [1,1,17,3]

Other Properties Related to Output: The first two channels of the last dimension represent the yx coordinates (normalized to the image frame, i.e. in the range [0.0, 1.0]) of the 17 keypoints, in the order: [nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle]. The third channel of the last dimension represents the prediction confidence score of each keypoint, also in the range [0.0, 1.0].
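
The output layout described above can be unpacked as in the following sketch, which maps each of the 17 rows to a named keypoint and converts the normalized (y, x) values to pixel coordinates. It assumes the output is available as a nested list of shape [1][1][17][3] and that the coordinates are relative to the full frame (no padding). The function name `decode_keypoints` and the `min_score` threshold of 0.3 are hypothetical choices for illustration.

```python
# Keypoint order as listed in the output description above.
KEYPOINT_NAMES = [
    "nose", "left eye", "right eye", "left ear", "right ear",
    "left shoulder", "right shoulder", "left elbow", "right elbow",
    "left wrist", "right wrist", "left hip", "right hip",
    "left knee", "right knee", "left ankle", "right ankle",
]


def decode_keypoints(output, frame_width, frame_height, min_score=0.3):
    """Map a [1][1][17][3] output to {name: (x_px, y_px, score)}.

    Each row holds (y, x, score) with y and x normalized to [0.0, 1.0].
    Keypoints scoring below min_score (an illustrative threshold) are
    dropped from the result.
    """
    keypoints = output[0][0]
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 17 keypoints")
    decoded = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, keypoints):
        if score < min_score:
            continue
        # Note the order swap: the model emits y first, then x.
        decoded[name] = (x * frame_width, y * frame_height, score)
    return decoded


if __name__ == "__main__":
    # Synthetic output: a confident nose, everything else low confidence.
    rows = [[0.5, 0.25, 0.9]] + [[0.0, 0.0, 0.1] for _ in range(16)]
    print(decode_keypoints([[rows]], frame_width=640, frame_height=480))
```

If the frame was padded before inference, the padding offsets would need to be removed before scaling back to frame pixels.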

Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Lovelace, NVIDIA Tesla

Supported Operating System(s):

Linux

Model Version(s):

Variation: singlepose-thunder Version 1

Training Dataset:

The models are not trained by NVIDIA.

COCO Keypoint Dataset Training Set 2017: In-the-wild images with diverse scenes, instance sizes, and occlusions. The original training set contains 64k images (images, annotations). The images with three or more people were filtered out, resulting in a 28k final training set.
Active Dataset Training Set: Images sampled from YouTube fitness videos that capture people exercising (e.g. HIIT, weight-lifting), stretching, or dancing. It contains diverse poses and motion, with more motion blur and self-occlusions. The set of images with a single person contains 23.5k images.

Inference:

Engine: TensorRT

Test Hardware: Tested on all supported hardware listed in the compatibility section

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Documentation

Refer to the Vision AI and analytics section of the Tokkio documentation for more details.

License

By downloading and using this software, you accept the terms and conditions of this license.

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.