The core of NVIDIA TensorRT is a C++ library that facilitates high-performance inference on NVIDIA graphics processing units (GPUs). TensorRT takes a trained network, which consists of a network definition and a set of trained parameters, and produces a highly optimized runtime engine that performs inference for that network.
You can describe a TensorRT network using a C++ or Python API, or you can import an existing Caffe, ONNX, or TensorFlow model using one of the provided parsers.
TensorRT provides C++ and Python APIs that let you express deep learning models through the Network Definition API or load a predefined model through the parsers, so that TensorRT can optimize and run them on an NVIDIA GPU. TensorRT applies graph optimizations and layer fusion, among other optimizations, and also finds the fastest implementation of the model by drawing on a diverse collection of highly optimized kernels. TensorRT also supplies a runtime that you can use to execute this network on all of NVIDIA's GPUs from the Kepler generation onwards.
TensorRT also includes optional high-speed mixed-precision capabilities introduced in the Tegra X1 and extended with the Pascal, Volta, and Turing architectures.
Currently, only the TensorRT runtime container is provided. The TensorRT runtime container image is intended to be used as a base image to containerize and deploy AI applications on Jetson. This container uses the l4t-cuda runtime container as its base image. The container itself includes the TensorRT runtime components as well as the CUDA runtime and CUDA math libraries; these components are not mounted from the host by the NVIDIA Container Runtime. The NVIDIA Container Runtime still mounts platform-specific libraries and select device nodes into the container.
The image is tagged with the version of the corresponding TensorRT release. For example, the l4t-tensorrt:r8.0.1-runtime container is intended to run on devices running JetPack 4.6, which supports TensorRT version 8.0.1.
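The tagging convention can be illustrated with a short shell sketch that extracts the TensorRT version from the image reference. It assumes tags follow the r<version>-runtime pattern of the one example given here; other tags may be shaped differently.

```shell
# Image reference as used in the run command of this document.
IMAGE="nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime"

# Keep only the part after the last colon: r8.0.1-runtime
TAG="${IMAGE##*:}"

# Drop the leading "r" and everything from the first "-" onwards.
# Assumption: tags follow the r<version>-runtime pattern.
VERSION="${TAG#r}"
VERSION="${VERSION%%-*}"

echo "TensorRT version: ${VERSION}"
```

For the image above this prints "TensorRT version: 8.0.1", which is the TensorRT release supported by JetPack 4.6.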
Ensure that NVIDIA Container Runtime is running on your Jetson device.
Note that NVIDIA Container Runtime is available for installation as part of NVIDIA JetPack.
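One possible way to check this prerequisite is to look for the nvidia runtime in the output of docker info. The helper below is a sketch, not an official check: has_nvidia_runtime is a name introduced here, and it assumes that docker info prints a "Runtimes:" line listing the registered runtimes, which may vary between Docker versions.

```shell
# Hypothetical helper for this sketch. It reads text on stdin (expected: the
# output of `docker info`) and succeeds when a "Runtimes" line lists nvidia.
has_nvidia_runtime() {
    grep -i 'runtimes' | grep -qw 'nvidia'
}
```

On the Jetson device it could be used as: sudo docker info | has_nvidia_runtime && echo "nvidia runtime registered"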
Before running the l4t-tensorrt runtime container, use docker pull to ensure an up-to-date image is installed. Once the pull is complete, you can run the container image.
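A minimal sketch of the pull step follows. The image reference is the one used in the run command of this document. DRY_RUN is a switch introduced only for this sketch, so that the command is printed by default and executed only when requested.

```shell
# Image reference copied from the run command in this document.
IMAGE="nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime"

# DRY_RUN is a hypothetical switch for this sketch: with the default of 1 the
# command is only printed; set DRY_RUN=0 on the Jetson device to execute it.
DRY_RUN="${DRY_RUN:-1}"

PULL_CMD="sudo docker pull ${IMAGE}"
echo "${PULL_CMD}"
if [ "${DRY_RUN}" = "0" ]; then
    ${PULL_CMD}
fi
```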
Procedure
To run the container:
xhost +
sudo docker run -it --rm --net=host --runtime nvidia -e DISPLAY=$DISPLAY -v /tmp/.X11-unix/:/tmp/.X11-unix nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime
Options explained:
By default, a limited set of device nodes and associated functionality is exposed within the cuda-runtime containers using the mount plugin capability. This list is documented here.
Users can expose additional devices using the --device option provided by Docker.
Directories and files can be bind mounted using the -v option.
Note that usage of some devices might need associated libraries to be available inside the container.
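As an illustration of the --device option, the sketch below adds one extra device node to the run command. The device path /dev/video0 is a hypothetical example, and run_or_print is a helper defined only for this sketch; it prints the command and executes it only when DRY_RUN=0 is set.

```shell
IMAGE="nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime"

# Hypothetical example value; replace it with the device node you need.
EXTRA_DEVICE="/dev/video0"

# run_or_print is a helper defined only for this sketch. It prints the command
# and executes it only when DRY_RUN=0 is set in the environment.
run_or_print() {
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run_or_print sudo docker run -it --rm --net=host --runtime nvidia \
    --device "${EXTRA_DEVICE}" \
    "${IMAGE}"
```

Whether the device is usable inside the container may still depend on the associated libraries being present there, as noted above.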
Once you have successfully launched the l4t-tensorrt container, you can run TensorRT samples inside it. For example, you can mount the TensorRT samples into the container using the -v option (-v ) during "docker run" and then run the samples from within the container.
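The mount described above might look like the following sketch. Both paths are placeholders introduced here, not values from the original text; substitute the directory that holds the samples on your host and the location where you want them to appear inside the container.

```shell
IMAGE="nvcr.io/nvidia/l4t-tensorrt:r8.0.1-runtime"

# Placeholders for this sketch; neither path comes from the original text.
HOST_SAMPLES_DIR="/path/to/tensorrt/samples"
CONTAINER_SAMPLES_DIR="/workspace/samples"

# The -v option takes <host path>:<container path>.
MOUNT_OPT="-v ${HOST_SAMPLES_DIR}:${CONTAINER_SAMPLES_DIR}"
RUN_CMD="sudo docker run -it --rm --net=host --runtime nvidia ${MOUNT_OPT} ${IMAGE}"

# Print the command for review; paste it into a terminal on the device to run it.
echo "${RUN_CMD}"
```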
For the latest TensorRT container Release Notes, see the TensorRT Container Release Notes website.
For a full list of the supported software and specific versions that come packaged with this framework based on the container image, see the Frameworks Support Matrix.
For the latest TensorRT product Release Notes, Developer and Installation Guides, see the TensorRT Product Documentation website.
By pulling and using the container, you accept the terms and conditions of this End User License Agreement.