Supported architectures: Linux / amd64, Linux / arm64
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
Visit the official GitHub repository for more information: https://github.com/NVIDIA/TensorRT-LLM.
If you have Docker 19.03 or later, a typical command to launch the container is:
docker run --gpus all -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorrt-llm/release:x.xx.x
If you have Docker 19.02 or earlier, a typical command to launch the container is:
nvidia-docker run -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 nvcr.io/nvidia/tensorrt-llm/release:x.xx.x
where x.xx.x is the container version, for example, 0.20.0.

TensorRT-LLM binaries are installed at /app/tensorrt_llm, and users can import it as a Python module:
$ python
>>> import tensorrt_llm
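Beyond importing the module, the high-level LLM API can be used to run generation end to end. The following is a minimal sketch modeled on the TensorRT-LLM quick start; the model name is only an illustrative example, and the exact API surface may vary between releases.

from tensorrt_llm import LLM, SamplingParams

# Example prompts; the checkpoint below is an illustrative Hugging Face model name
prompts = ["Hello, my name is", "The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Engine build and model loading happen behind this call
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

# Run batched generation and print the completions
outputs = llm.generate(prompts, sampling_params)
for output in outputs:
    print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")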
To develop TensorRT-LLM using Docker, users can mount the code directory into the development container.
# Clone the repository
git clone git@github.com:NVIDIA/TensorRT-LLM.git <repo>
# Launch the docker container, where ${path} is the absolute path to the cloned repository
nvidia-docker run -it --rm --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
--mount type=bind,source=${path},target=${path} \
nvcr.io/nvidia/tensorrt-llm/release:x.xx.x
After getting into the container, users can build the wheel from the mounted repository so that their modifications take effect.
# Build the TensorRT-LLM wheel (run from the root of the mounted repository)
python3 ./scripts/build_wheel.py
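Once the build finishes, the resulting wheel can be installed into the container's Python environment. The path below assumes the default build output directory used by build_wheel.py and may differ between releases:

# Install the freshly built wheel (assumes the default build output directory)
pip install ./build/tensorrt_llm*.whl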
Refer to the documentation https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html#build-tensorrt-llm for more information.
Refer to the documentation at https://nvidia.github.io/TensorRT-LLM/latest/installation/build-from-source-linux.html#building-a-tensorrt-llm-docker-image for detailed instructions on building a TensorRT-LLM Docker image.

Suggested Reading

For the latest release notes, see the TensorRT-LLM Release Notes.
For a full list of supported models and hardware, see the Support Matrix.
For more information about TensorRT-LLM, including documentation, source code, and examples, see https://github.com/NVIDIA/TensorRT-LLM and https://nvidia.github.io/TensorRT-LLM/latest/index.html.
To review known CVEs on this image, refer to the Security Scanning tab on this page.
By pulling and using the container, you accept the terms and conditions of this End User License Agreement and Product-Specific Terms.