The models described in this card detect one or more objects in a LIDAR point cloud file and return a 3D bounding box around each object. These pre-trained PointPillars models were trained on a point cloud dataset collected by a solid-state LIDAR. They are based on the PointPillars architecture in the NVIDIA TAO Toolkit.
The training algorithm optimizes the network to minimize the localization and confidence loss for the objects.
The PointPillars models were trained on a proprietary LIDAR point cloud dataset.
The evaluation dataset for the PointPillars models was obtained in the same way as the training dataset.
The key performance indicator is the mean average precision (mAP) of object detection in 3D or Bird's-Eye View (BEV). The KPIs for the evaluation data are reported in the table below.
Model | Dataset | mAP BEV/3D |
---|---|---|
pointpillars_trainable.tlt | proprietary dataset | 65.2167% / 51.7159% |
pointpillars_deployable.etlt | proprietary dataset | 66.6860% / 52.8530% |
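For reference, mAP averages the per-class average precision (AP), where a detection counts as a true positive when it matches a ground-truth box (e.g. by BEV or 3D IoU). Below is a minimal sketch of the standard AP computation; the function name and the toy data are illustrative, not part of the TAO evaluation code.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP as the area under the precision-recall curve for one class.

    scores: detection confidences; is_tp: whether each detection matched
    a ground-truth box; num_gt: number of ground-truth objects.
    """
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(tp)) + 1)
    recall = cum_tp / num_gt
    # Integrate precision over the recall steps (simple Riemann sum).
    return float(np.sum(np.diff(np.concatenate(([0.0], recall))) * precision))

# Toy example: 3 detections, 2 ground-truth objects.
ap = average_precision([0.9, 0.8, 0.3], [True, False, True], num_gt=2)  # ≈ 0.833
```

mAP is then the mean of this value across the classes (here Vehicle, Pedestrian, Cyclist).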
Inference is run on the provided deployable model at FP16 precision. The Jetson devices run in the Max-N configuration for maximum system performance. The performance shown below is only for inference of the deployable (pruned) model. As a comparison, we also show the inference performance of the unpruned model (not available here).
Model | Device | Precision | Batch_size | FPS |
---|---|---|---|---|
Pruned | Xavier | FP16 | 1 | 39 |
Unpruned | Xavier | FP16 | 1 | 31 |
These models need to be used with NVIDIA Hardware and Software. For Hardware, the models can run on any NVIDIA GPU including NVIDIA Jetson devices. These models can only be used with Train Adapt Optimize (TAO) Toolkit, or TensorRT.
Primary use case intended for these models is detecting objects in a point cloud file.
In total, two models are provided:
pointpillars_trainable.tlt
pointpillars_deployable.etlt
The trainable model is intended for training and fine-tuning with the TAO Toolkit along with the user's point cloud dataset. High-fidelity models can be trained and adapted to the use case.
The deployable model is intended for easy deployment to the edge using TensorRT.
The trainable and deployable models are encrypted and can be decrypted with the following key:
tlt_encode
Please make sure to use this as the key for all TAO commands that require a model load key.
The models have two inputs: points (the point cloud data) and num_points (the number of points in the cloud). The outputs are category labels (Vehicle, Pedestrian, Cyclist) and 3D bounding-box coordinates for each detected object in the input point cloud file.
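Point cloud files such as the .bin samples used later in this card are commonly stored as a flat array of float32 values, four per point (x, y, z, intensity). A hedged sketch of loading one with NumPy follows; the exact layout of this proprietary dataset is an assumption based on the common KITTI-style convention.

```python
import numpy as np

def load_point_cloud(path):
    """Load a KITTI-style .bin point cloud: flat float32, 4 values per point.

    The 4-value layout (x, y, z, intensity) is an assumption; verify it
    against your own data before relying on it.
    """
    pts = np.fromfile(path, dtype=np.float32)
    return pts.reshape(-1, 4)  # columns: x, y, z, intensity
```

The returned array has one row per point, which matches the points/num_points input split described above.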
In order to use these models as a pre-trained model for transfer learning, please use the snippet below as a template for the PRETRAINED_MODEL_PATH parameter in the OPTIMIZATION component of the config file used to train a PointPillars model. For more information on the config file, please refer to the TAO Toolkit User Guide.
PRETRAINED_MODEL_PATH: "/path/to/the/model.tlt"
The PointPillars model can be deployed in TensorRT with the TensorRT C++ sample, using TensorRT 8.2. As a dependency, the TensorRT sample requires TensorRT OSS 22.02 to be installed. Detailed steps are shown below.
Install TensorRT 8.2, or use the pre-installed copy if it is already present.
Install TensorRT OSS 22.02.
git clone -b 22.02 https://github.com/NVIDIA/TensorRT.git TensorRT
cd TensorRT
git submodule update --init --recursive
mkdir -p build && cd build
cmake .. -DCUDA_VERSION=$CUDA_VERSION -DGPU_ARCHS=$GPU_ARCHS
make nvinfer_plugin -j$(nproc)
cp libnvinfer_plugin.so.8.2.* /usr/lib/x86_64-linux-gnu/libnvinfer_plugin.so.8.2.3
cp libnvinfer_plugin_static.a /usr/lib/x86_64-linux-gnu/libnvinfer_plugin_static.a
Train the model in TAO Toolkit and export it to the .etlt model.
Generate a TensorRT engine on the target device with tao-converter.
tao-converter -k $KEY \
-e $USER_EXPERIMENT_DIR/trt.fp16.engine \
-p points,1x204800x4,1x204800x4,1x204800x4 \
-p num_points,1,1,1 \
-t fp16 \
pointpillars_deployable.etlt
cd ~
git clone https://github.com/NVIDIA-AI-IOT/tao_toolkit_recipes.git
cd tao_toolkit_recipes
git lfs pull
cd tao_pointpillars/tensorrt_sample/test
mkdir build
cd build
cmake .. -DCUDA_VERSION=<CUDA_VERSION>
make -j8
./pointpillars -e /path/to/tensorrt/engine -l ../../data/102.bin -t 0.01 -c Vehicle,Pedestrian,Cyclist -n 4096 -p -d fp16
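If you want to smoke-test the sample binary without the bundled data, a synthetic point cloud in the same flat float32 layout can be generated as below. The file name, coordinate ranges, and point count are arbitrary choices for illustration, not values required by the sample.

```python
import numpy as np

def write_dummy_cloud(path, num_points=4096, seed=0):
    """Write a random point cloud: float32 x, y, z, intensity per point."""
    rng = np.random.default_rng(seed)
    xyz = rng.uniform(-50.0, 50.0, size=(num_points, 3)).astype(np.float32)
    intensity = rng.uniform(0.0, 1.0, size=(num_points, 1)).astype(np.float32)
    np.concatenate([xyz, intensity], axis=1).tofile(path)

write_dummy_cloud("dummy.bin")
```

The resulting dummy.bin can then be passed to the sample via the -l flag in place of ../../data/102.bin.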
Currently the TensorRT engine of PointPillars model can only run at batch size 1.
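Note also that the engine's points input shape is fixed at 1x204800x4 by the tao-converter command above, so shorter clouds must be zero-padded and longer ones truncated before inference. A hedged sketch of that preprocessing (the helper name is illustrative; the length comes from the -p flag shown earlier):

```python
import numpy as np

MAX_POINTS = 204800  # fixed input length from the tao-converter -p flag above

def pad_or_truncate(points):
    """Return (array of shape (MAX_POINTS, 4), number of valid points).

    The valid-point count is what would be fed to the num_points input.
    """
    n = min(len(points), MAX_POINTS)
    out = np.zeros((MAX_POINTS, 4), dtype=np.float32)
    out[:n] = points[:n]
    return out, n
```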
License to use these models is covered by the Model EULA. By downloading the trainable or deployable version of the model, you accept the terms and conditions of this license.
NVIDIA PointPillars model detects 3D objects in a point cloud file.
NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.