The Tokkio Vision AI container is a video inference solution designed to extract facial bounding boxes and body poses from user video streams. It is implemented using the GXF and DeepStream frameworks. It also includes the MoveNet model.
This model predicts the joint locations of a single person. It is designed to run in real time in the browser using TensorFlow.js or on devices using TF Lite, targeting movement and fitness activities. This variant, MoveNet.SinglePose.Thunder, is a higher-capacity model than MoveNet.SinglePose.Lightning and delivers better prediction quality while still achieving real-time (>30 FPS) speed. Naturally, Thunder will lag behind Lightning, but it will pack a punch. This model is not developed by NVIDIA and is provided by a third party, as is, within the container.
This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case. See the MoveNet Model Card for more information: https://www.kaggle.com/models/google/movenet.
Terms of use
This model is licensed and released under the Apache License 2.0.
You are responsible for ensuring that your use of models complies with all applicable laws.
MobileNetV2 image feature extractor with a Feature Pyramid Network decoder (to a stride of 4), followed by CenterNet prediction heads with custom post-processing logic. Lightning uses a depth multiplier of 1.0, while Thunder uses a depth multiplier of 1.75.
Input Type(s): Image
Input Format(s): Red, Green, Blue (RGB)
Input Parameters: Two Dimensional (2D)
Other Properties Related to Input: A frame of video or an image, represented as an int32 tensor of shape 192x192x3 (Lightning) or 256x256x3 (Thunder). Channel order: RGB with values in [0, 255].
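As an illustration of the input shape described above, the following minimal Python sketch computes how an arbitrary video frame could be fitted into the square model input. Scaling with even padding to preserve the aspect ratio is an assumption made here; this card only states the final tensor shape. The names INPUT_SIZE and letterbox_geometry are hypothetical and not part of the container.

```python
# Input side lengths as stated in this card; the dictionary name is hypothetical.
INPUT_SIZE = {"lightning": 192, "thunder": 256}


def letterbox_geometry(frame_width, frame_height, variant="thunder"):
    """Return (new_width, new_height, pad_left, pad_top) for an aspect-preserving fit.

    Assumption: the frame is scaled to fit inside the square input and the
    remaining area is padded evenly. The card only specifies the final shape.
    """
    if frame_width <= 0 or frame_height <= 0:
        raise ValueError("frame dimensions must be positive")
    size = INPUT_SIZE[variant]
    # Scale so that the longer side exactly matches the input side length.
    scale = size / max(frame_width, frame_height)
    new_width = max(1, round(frame_width * scale))
    new_height = max(1, round(frame_height * scale))
    # Split the leftover space evenly; any odd pixel goes to the right/bottom.
    pad_left = (size - new_width) // 2
    pad_top = (size - new_height) // 2
    return new_width, new_height, pad_left, pad_top
```

For example, a 1280x720 frame prepared for Thunder would be resized to 256x144 and padded with 56 rows at the top. The resized and padded pixels would then be stored as int32 RGB values in [0, 255].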
Output Type(s): Tensor
Output Format: float32
Output Parameters: Shape [1,1,17,3]
Other Properties Related to Output: The first two channels of the last dimension represent the yx coordinates (normalized to the image frame, i.e. in the range [0.0, 1.0]) of the 17 keypoints (in the order: [nose, left eye, right eye, left ear, right ear, left shoulder, right shoulder, left elbow, right elbow, left wrist, right wrist, left hip, right hip, left knee, right knee, left ankle, right ankle]). The third channel of the last dimension represents the prediction confidence score of each keypoint, also in the range [0.0, 1.0].
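The output layout described above can be decoded as in the following minimal Python sketch. It assumes the output is indexable as a [1, 1, 17, 3] nested sequence (a nested list or a NumPy array both work) and that the normalized coordinates refer to the full frame with no padding. The function name decode_keypoints and the default threshold of 0.3 are hypothetical choices, not values taken from the container.

```python
# Keypoint order as listed in this card.
KEYPOINT_NAMES = [
    "nose", "left eye", "right eye", "left ear", "right ear",
    "left shoulder", "right shoulder", "left elbow", "right elbow",
    "left wrist", "right wrist", "left hip", "right hip",
    "left knee", "right knee", "left ankle", "right ankle",
]


def decode_keypoints(output, frame_width, frame_height, min_score=0.3):
    """Map a [1, 1, 17, 3] output to {name: (x_pixels, y_pixels, score)}.

    Assumption: min_score is an arbitrary illustrative threshold; keypoints
    scoring below it are dropped.
    """
    keypoints = output[0][0]  # strip the two leading singleton dimensions
    if len(keypoints) != len(KEYPOINT_NAMES):
        raise ValueError("expected 17 keypoints")
    decoded = {}
    for name, (y, x, score) in zip(KEYPOINT_NAMES, keypoints):
        if score >= min_score:
            # Channels are ordered (y, x, score); swap to (x, y) in pixels.
            decoded[name] = (float(x) * frame_width,
                             float(y) * frame_height,
                             float(score))
    return decoded
```

Note that the model emits y before x, so swapping the two is an easy mistake to make when drawing keypoints on a frame.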
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Lovelace, NVIDIA Tesla
Linux
Variation: singlepose-thunder Version 1
The models are not trained by NVIDIA.
Engine: TensorRT
Test Hardware: Tested on all supported hardware listed in compatibility section
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.
Refer to the Vision AI and analytics in the Tokkio documentation for more details.
By downloading and using this software, you accept the terms and conditions of this license.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.