NVIDIA
Dynamo Frontend
Container

The Dynamo frontend image is a framework-less image that contains the core Dynamo components along with the Endpoint Picker (EPP) for the Gateway API Inference Extension (GAIE).

Overview

The Dynamo Frontend container is a lightweight, framework-less image designed to deploy and run CPU-bound frontend components without requiring CUDA or backend engine dependencies (vLLM, SGLang, TensorRT-LLM). It enables flexible deployment topologies by separating the frontend from inference backends, and integrates with the Gateway API Inference Extension (GAIE) for Kubernetes-native request routing.

Key Components

  • OpenAI-Compatible Frontend: HTTP API server compatible with OpenAI's chat completions and completions endpoints, handling request preprocessing, validation, and response formatting.
  • Endpoint Picker (EPP): InferenceScheduler with routing, flow control, and request management for intelligent backend selection. Integrates with Gateway API Inference Extension (GAIE) for Kubernetes-native load balancing.
  • Request Router: Routes requests to appropriate backend workers based on prefix matching, load, and KV cache state.
  • Mock Workers: Test Dynamo components without GPU backends for development, CI/CD, and validation workflows.
  • Kubernetes-Native Infrastructure: Service discovery via EndpointSlices and a transport-agnostic request plane (TCP by default) enable deployment without etcd or NATS dependencies.

For more information about the Dynamo frontend and GAIE, refer to the GitHub repository and the GAIE documentation.
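
The Request Router entry above mentions prefix matching, load, and KV cache state as routing inputs. The following is a toy sketch of how such signals could be combined into a single score. It is not Dynamo's actual routing algorithm: the Worker class, the scoring formula, and the load_penalty weight are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical view of one backend worker as seen by a router."""
    name: str
    active_requests: int  # current load on this worker
    # Token prefixes assumed to be resident in this worker's KV cache.
    cached_prefixes: list = field(default_factory=list)


def shared_prefix_len(a, b):
    """Count the leading tokens that two token sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_worker(workers, tokens, load_penalty=2.0):
    """Pick the worker with the highest (cache overlap - load penalty) score."""
    def score(w):
        overlap = max(
            (shared_prefix_len(tokens, p) for p in w.cached_prefixes),
            default=0,
        )
        return overlap - load_penalty * w.active_requests
    return max(workers, key=score)


workers = [
    Worker("worker-a", active_requests=1, cached_prefixes=[(1, 2, 3, 4, 5, 6)]),
    Worker("worker-b", active_requests=0, cached_prefixes=[(1, 2)]),
    Worker("worker-c", active_requests=4, cached_prefixes=[(1, 2, 3, 4, 5, 6, 7, 8)]),
]
request_tokens = (1, 2, 3, 4, 5, 6, 7, 8, 9)
chosen = pick_worker(workers, request_tokens)
print(chosen.name)
```

In this sketch, worker-c holds the longest matching prefix but is penalized for its load, so a moderately loaded worker with a shorter cached prefix wins. A real router would weigh these signals with its own cost model.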

Release Info

For the complete release history including architecture details, see the Release Artifacts page.
Pre-built containers are available for both x86_64 (AMD64) and ARM64 architectures.

Getting Started

  1. Select the Tags tab and locate the container image release that you want to run.
  2. In the Pull Tag column, click the icon to copy the docker pull command.
  3. Open a command prompt and paste the pull command. Ensure the pull completes successfully.
  4. Run the container:
     docker run -it nvcr.io/nvidia/ai-dynamo/dynamo-frontend:<version>

For next steps, including deployment options and examples, please refer to the Dynamo README.

Use Cases

  • Separated Frontend Deployment: Run frontend on CPU nodes while backends run on GPU nodes
  • Gateway API Integration: Use with GAIE for Kubernetes-native inference routing
  • Development & Testing: Test Dynamo pipelines with mock workers without GPU resources
  • CI/CD Validation: Validate configurations and routing logic in automated pipelines

Support Matrix

Please refer to the support matrix for detailed hardware and architecture support.

License

NVIDIA Dynamo is released under the Apache-2.0 open-source license, making it freely available for development, research, and deployment.

Publisher: NVIDIA
Latest Tag: 1.2.1
Updated: June 13, 2026 UTC
Compressed Size: 3.33 GB
Multinode Support: No
Multi-Arch Support: Yes
