NGC Catalog
Dynamo vLLM Runtime

Description
The Dynamo vLLM runtime image is a containerized build of Dynamo + vLLM that serves as the base runtime environment for vLLM-based inference with Dynamo's distributed inference framework.
Publisher
NVIDIA
Latest Tag
0.3.1
Modified
July 2, 2025
Compressed Size
9.21 GB
Multinode Support
No
Multi-Arch Support
No

Overview

The Dynamo vLLM runtime container is a pre-built, Docker-based environment designed to run NVIDIA Dynamo with the vLLM backend for high-performance, distributed large language model (LLM) inference. It packages all necessary dependencies, runtime components, and optimizations to streamline deployment and ensure consistency across development and production environments.

Key Components

  • vLLM Backend: Provides fast, efficient LLM inference, leveraging vLLM’s optimized attention and KV cache management.

  • Dynamo Core Services: Includes the HTTP API server, request router, and worker processes for prefill and decode phases.

  • Supporting Services: Integrates with etcd and NATS for distributed coordination and messaging.

  • OpenAI-Compatible Frontend: Exposes an HTTP API compatible with OpenAI’s endpoints for easy integration.

For more information about Dynamo features, please refer to the GitHub repository.
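
As an illustration of the OpenAI-compatible frontend described above, the request below sketches a chat-completion call. The port (8000) and the model name are assumptions for this example, not values confirmed by this page; substitute whatever your deployment actually uses.

```shell
# Hypothetical example: query the OpenAI-compatible frontend.
# Port 8000 and the model name are assumptions; adjust for your deployment.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64
  }'
```

Because the frontend follows the OpenAI API shape, existing OpenAI client libraries can typically be pointed at this endpoint by overriding their base URL.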

Getting Started

  • Select the Tags tab and locate the container image release that you want to run.

  • In the Pull Tag column, click the icon to copy the docker pull command.

  • Open a command prompt and paste the pull command. The container image begins downloading; ensure the pull completes successfully before proceeding to the next step.

  • Start required services (etcd and NATS) using Docker Compose:

    docker compose -f deploy/docker-compose.yml up -d

  • Run the container image and start Dynamo via:

    dynamo run out=vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B

    For more examples, please refer to the examples directory in the repository.
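
Putting the steps above together, a minimal end-to-end sketch might look like the following. The image path is a placeholder (copy the real pull command from the Pull Tag column), and the GPU flags assume the NVIDIA Container Toolkit is installed on the host.

```shell
# Placeholder image path -- copy the actual docker pull command
# from the Pull Tag column on the Tags tab.
IMAGE="nvcr.io/<org>/<team>/dynamo-vllm-runtime:0.3.1"

# 1. Pull the container image.
docker pull "$IMAGE"

# 2. Start required services (etcd and NATS).
docker compose -f deploy/docker-compose.yml up -d

# 3. Run the container with GPU access (assumes the NVIDIA Container
#    Toolkit is installed) and start Dynamo with the vLLM backend.
docker run --rm -it --gpus all --network host "$IMAGE" \
  dynamo run out=vllm deepseek-ai/DeepSeek-R1-Distill-Llama-8B
```

Using `--network host` lets the containerized worker reach the etcd and NATS services started by Docker Compose on the same host; in multi-host or Kubernetes deployments the service endpoints would be configured explicitly instead.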

Support Matrix

Please refer to the following support matrix to learn more about current hardware and architecture support. Dynamo currently provides pre-built containers for x86_64 only.

License

NVIDIA Dynamo is released under the Apache-2.0 open-source license, making it freely available for development, research, and deployment.

Technical Support

GitHub Issues: Dynamo GitHub Issues