Linux / amd64
The Dynamo SGLang runtime container is a pre-packaged, Docker-based environment tailored for running NVIDIA Dynamo with the SGLang backend for high-performance, modular large language model (LLM) inference and serving. It packages all necessary dependencies, runtime components, and optimizations to streamline deployment and ensure consistency across development and production environments.
Key Components
SGLang Backend: SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language.
Dynamo Core Services: Includes the HTTP API server, request router, and worker processes for prefill and decode phases.
Supporting Services: Integrates with etcd and NATS for distributed coordination and messaging.
OpenAI-Compatible Frontend: Exposes an HTTP API compatible with OpenAI’s endpoints for easy integration.
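Because the frontend mirrors OpenAI's chat-completions API, requests can be built with any OpenAI-compatible client. The sketch below constructs a request body using only the standard library; the model name, URL, and port are placeholder assumptions, not values from this page — check your deployment for the actual ones.

```python
import json

# Minimal OpenAI-style chat-completions request body.
# "your-model-name" is a placeholder for whatever model Dynamo is serving.
payload = {
    "model": "your-model-name",
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    "max_tokens": 64,
    "stream": False,
}

body = json.dumps(payload)

# To send it to a running frontend (commonly http://localhost:8000/v1/chat/completions,
# but verify the host and port for your deployment):
#
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8000/v1/chat/completions",
#       data=body.encode("utf-8"),
#       headers={"Content-Type": "application/json"},
#   )
#   resp = urllib.request.urlopen(req)
#   print(resp.read().decode("utf-8"))
```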
For more information about Dynamo features, please refer to the GitHub repository.
Running the Container
Select the Tags tab and locate the container image release that you want to run.
In the Pull Tag column, click the icon to copy the docker pull command.
Open a command prompt and paste the pull command. Docker begins pulling the container image; ensure the pull completes successfully before proceeding to the next step.
Start required services (etcd and NATS) using Docker Compose:
docker compose -f deploy/docker-compose.yml up -d
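The two services that this Compose file brings up look roughly like the sketch below. This is illustrative only, under the assumption of stock etcd and NATS images with default ports; prefer the `deploy/docker-compose.yml` shipped in the repository.

```yaml
# Illustrative sketch -- use deploy/docker-compose.yml from the Dynamo repository.
services:
  etcd:
    image: bitnami/etcd:latest          # assumption: any recent etcd image
    environment:
      - ALLOW_NONE_AUTHENTICATION=yes   # dev-only: no auth
    ports:
      - "2379:2379"                     # etcd client port
  nats:
    image: nats:latest
    command: ["-js"]                    # enable JetStream
    ports:
      - "4222:4222"                     # NATS client port
```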
Run the container image and verify the Dynamo installation:
dynamo --version
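A typical invocation looks like the following sketch. The image reference is a placeholder for the pull command copied from the Tags tab, and the flags assume a GPU host with the NVIDIA Container Toolkit installed; `--network host` lets the container reach the etcd and NATS services started above.

```shell
# <image>:<tag> is a placeholder -- substitute the release you pulled from NGC.
docker run --rm -it --gpus all --network host \
  <image>:<tag> \
  dynamo --version
```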
For more examples, please refer to the examples directory in the repository.
Please refer to the following support matrix to learn more about current hardware and architecture support. Dynamo currently provides pre-built containers for x86_64 only.
NVIDIA Dynamo is released under the open-source Apache-2.0 license, making it freely available for development, research, and deployment.
GitHub Issues: Dynamo GitHub Issues