3D Fusion Model Card
The model described in this card detects people in paired camera-image and LiDAR input and localizes each person with a 3D bounding box. It also provides a category label for each detected object.
This model is based on BEVFusion, which unifies feature representations from different modalities (LiDAR and image) in a shared bird's-eye-view space. The unified features can be used for 3D object detection tasks.
The BEVFusion codebase from mmdet3d was used to train this model. The code was modified to accommodate three rotation angles in 3D space (roll, pitch, yaw), whereas only yaw was supported in the original codebase. The training algorithm optimizes the network to minimize the Gaussian focal loss and L1 loss.
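For intuition, the Gaussian focal loss used by center-heatmap detection heads is a penalty-reduced focal loss evaluated against a ground-truth Gaussian heatmap. The following is a minimal NumPy sketch of one common formulation (CornerNet-style, with alpha=2 and gamma=4); the exact implementation and hyperparameters in the trained model may differ.

```python
import numpy as np

def gaussian_focal_loss(pred, target, alpha=2.0, gamma=4.0, eps=1e-12):
    """Penalty-reduced focal loss on a Gaussian heatmap (CornerNet-style sketch).

    pred:   predicted heatmap, values in (0, 1)
    target: ground-truth Gaussian heatmap; exactly 1.0 at object centers
    """
    pos_mask = (target == 1.0)
    # Pixels near a center (target close to 1) are penalized less as negatives.
    neg_weights = np.power(1.0 - target, gamma)
    pos_loss = -np.log(pred + eps) * np.power(1.0 - pred, alpha) * pos_mask
    neg_loss = -np.log(1.0 - pred + eps) * np.power(pred, alpha) * neg_weights * (~pos_mask)
    num_pos = max(pos_mask.sum(), 1)  # normalize by number of positives
    return (pos_loss.sum() + neg_loss.sum()) / num_pos
```

A prediction that matches the heatmap closely yields a loss near zero, while a uniform prediction is penalized on both the positive and the many negative locations.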
A proprietary synthetic dataset generated with Omniverse and Isaac Sim was used for training. The training dataset contains 465,373 image/point-cloud pairs and 1,510,704 objects for the person class. It consists of images, LiDAR point clouds, and calibration matrices from nine different indoor scenes. The content was captured from a height of two feet with the sensors oriented perpendicular to the surface. Each scene has three different lighting conditions: Normal, Dark, and Very Dark.
| Number of Images | Number of Objects (person) |
|---|---|
| 465,373 | 1,510,704 |
Methodology and KPI
The KPIs for the evaluation data are reported in the following table. The evaluation data were also generated with Omniverse. The model was evaluated with the KITTI 3D metric, which measures object detection performance as mean Average Precision (mAP) at an IoU threshold of 0.5.
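To make the IoU threshold concrete, the sketch below computes 3D IoU for axis-aligned boxes given as (center_x, center_y, center_z, scale_x, scale_y, scale_z). This is a simplification: the actual KITTI metric uses rotated boxes, so this helper and its name are illustrative only.

```python
import numpy as np

def iou_3d_axis_aligned(box_a, box_b):
    """3D IoU between axis-aligned boxes (cx, cy, cz, sx, sy, sz).

    Rotation is ignored for simplicity; the KITTI metric evaluates
    rotated 3D boxes.
    """
    a_min = np.array(box_a[:3]) - np.array(box_a[3:]) / 2
    a_max = np.array(box_a[:3]) + np.array(box_a[3:]) / 2
    b_min = np.array(box_b[:3]) - np.array(box_b[3:]) / 2
    b_max = np.array(box_b[:3]) + np.array(box_b[3:]) / 2
    # Overlap extent along each axis, clipped at zero for disjoint boxes.
    inter = np.prod(np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None))
    vol_a = np.prod(box_a[3:])
    vol_b = np.prod(box_b[3:])
    return inter / (vol_a + vol_b - inter)
```

A detection counts as a true positive at this threshold when its IoU with a ground-truth box is at least 0.5.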
RGB image: 1920 × 1080 × 3
LiDAR point cloud: N × 4 (N: number of points; 4: xyz + intensity)
Calibration matrices: LiDAR-to-camera projection matrix, camera intrinsic matrix
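The calibration matrices tie the two modalities together: LiDAR points are transformed into the camera frame and then projected onto the image plane. The helper below is a minimal sketch of that projection; the function name and the assumed matrix layouts (a 4 × 4 extrinsic and a 3 × 3 intrinsic) are illustrative, not the model's actual API.

```python
import numpy as np

def project_lidar_to_image(points, lidar2cam, intrinsic):
    """Project N x 4 LiDAR points (x, y, z, intensity) into pixel coordinates.

    lidar2cam: 4 x 4 LiDAR-to-camera extrinsic matrix (assumed layout)
    intrinsic: 3 x 3 camera intrinsic matrix
    Returns M x 2 pixel coordinates for points in front of the camera.
    """
    xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])  # homogeneous
    cam = (lidar2cam @ xyz1.T).T[:, :3]   # transform into the camera frame
    front = cam[:, 2] > 0                 # keep only points with positive depth
    uvw = (intrinsic @ cam[front].T).T
    return uvw[:, :2] / uvw[:, 2:3]       # perspective divide
```

With an identity extrinsic, a point on the optical axis projects to the principal point (cx, cy) of the intrinsic matrix, which is a quick sanity check for a calibration pipeline.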
Category labels (person) and 3D bounding-box coordinates in a nine-dimensional representation (center_x, center_y, center_z, scale_x, scale_y, scale_z, rotation_x, rotation_y, rotation_z) for each detected person in the input.
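A consumer of this output typically converts each nine-dimensional box into its eight corner points, e.g. for visualization. The sketch below assumes the three rotations are intrinsic x-y-z Euler angles (roll, pitch, yaw) applied as Rz·Ry·Rx; the model's actual angle convention may differ, so treat this as illustrative.

```python
import numpy as np

def box9d_to_corners(box):
    """Convert a 9-dim box (center_x..z, scale_x..z, rotation_x..z)
    to its 8 corner points, assuming an x-y-z Euler convention."""
    cx, cy, cz, sx, sy, sz, rx, ry, rz = box
    # Corners of the unrotated box, centered at the origin.
    corners = np.array([[x, y, z] for x in (-sx / 2, sx / 2)
                                  for y in (-sy / 2, sy / 2)
                                  for z in (-sz / 2, sz / 2)])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(rx), -np.sin(rx)],
                   [0, np.sin(rx), np.cos(rx)]])
    Ry = np.array([[np.cos(ry), 0, np.sin(ry)],
                   [0, 1, 0],
                   [-np.sin(ry), 0, np.cos(ry)]])
    Rz = np.array([[np.cos(rz), -np.sin(rz), 0],
                   [np.sin(rz), np.cos(rz), 0],
                   [0, 0, 1]])
    R = Rz @ Ry @ Rx                       # roll, then pitch, then yaw
    return (R @ corners.T).T + np.array([cx, cy, cz])
```

With all three rotations at zero, the corners reduce to center ± scale/2 along each axis.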
- Liu, Zhijian; Tang, Haotian; Amini, Alexander; Yang, Xinyu; Mao, Huizi; Rus, Daniela; Han, Song: BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation. In: ICRA (2023)
- Access the latest in Vision AI development workflows with NVIDIA TAO Toolkit 5.0
- Improve accuracy and robustness of vision AI models with vision transformers and NVIDIA TAO
- Developing and Deploying AI-powered Robots with NVIDIA Isaac Sim and NVIDIA TAO
- Learn endless ways to adapt and supercharge your AI workflows with TAO - Whitepaper
- More information about TAO Toolkit and pre-trained models can be found at the NVIDIA Developer Zone
- TAO documentation
- Read the TAO Getting Started guide and release notes.
- If you have any questions or feedback, see the discussions on TAO Toolkit Developer Forums
- Learn more about Isaac Sim
License to use these models is covered by the Model EULA. By downloading the model, you accept the terms and conditions of these licenses.
The NVIDIA 3D Fusion model detects people.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.