The Dynamo TensorRT-LLM (tensorrtllm) runtime container is a pre-built, Docker-based environment designed to run NVIDIA Dynamo with the TensorRT-LLM backend for high-performance, distributed large language model (LLM) inference. It packages all necessary dependencies, runtime components, and optimizations to streamline deployment and ensure consistency across development and production environments.
Key Components
TensorRT-LLM Backend: TensorRT-LLM is an open-source library for optimizing LLM inference. It provides state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
Dynamo Core Services: Includes the HTTP API server, request router, and worker processes for prefill and decode phases.
Supporting Services: Integrates with etcd and NATS for distributed coordination and messaging.
OpenAI-Compatible Frontend: Exposes an HTTP API compatible with OpenAI's endpoints for easy integration (see the example request below).
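As an illustration, once a model is deployed the frontend can be queried like any OpenAI-compatible server. A minimal sketch, assuming the frontend listens on port 8000 and a model named <model-name> is being served (both are placeholders, not defaults of this container):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model-name>", "messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 64}'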
For more information about Dynamo features, please refer to the GitHub repository.
Select the Tags tab and locate the container image release that you want to run.
In the Pull Tag column, click the icon to copy the docker pull command.
Open a terminal and paste the pull command. The container image download begins; ensure the pull completes successfully before proceeding to the next step.
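For reference, the pull command follows this shape (the image path and tag below are illustrative assumptions; always copy the exact command from the Tags tab):

docker pull nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:<tag>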
Start required services (etcd and NATS) using Docker Compose:
docker compose -f deploy/docker-compose.yml up -d
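To confirm that both services came up, you can list the Compose services (assuming the same compose file path from a checkout of the repository):

docker compose -f deploy/docker-compose.yml ps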
Run the container image and verify the Dynamo installation:
dynamo --version
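For example, a one-shot run that checks the version inside the container might look like this (the image path and tag are illustrative assumptions; use the image you pulled above):

docker run --rm --gpus all nvcr.io/nvidia/ai-dynamo/tensorrtllm-runtime:<tag> dynamo --version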
For more examples, please refer to the examples directory in the repository.
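As a rough sketch of what those examples cover, a deployment typically starts the OpenAI-compatible frontend and a TensorRT-LLM worker as separate processes. The module names and flags below are assumptions drawn from the repository's examples, not guaranteed entry points of this container; consult the examples directory for the exact commands:

# start the OpenAI-compatible frontend (assumed entry point)
python -m dynamo.frontend --http-port 8000
# start a TensorRT-LLM worker for a model (assumed entry point and flag)
python -m dynamo.trtllm --model-path <huggingface-model-id>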
Please refer to the support matrix to learn more about current hardware and architecture support. Dynamo currently provides only pre-built x86_64 containers.
NVIDIA Dynamo is released under the open-source Apache-2.0 license, making it freely available for development, research, and deployment.
GitHub Issues: report bugs and ask questions via Dynamo GitHub Issues.