NGC Catalog

NVIDIA Dynamo

Description
NVIDIA Dynamo is a high-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments.
Curator
NVIDIA
Modified
July 2, 2025
Collection contents: Containers, Helm Charts, Models, Resources

Overview

The NVIDIA Dynamo Platform is a high-performance, low-latency inference platform designed to serve all AI models across any framework, architecture, or deployment scale. Whether you're running image recognition on a single entry-level GPU or deploying billion-parameter reasoning large language models (LLMs) across hundreds of thousands of data center GPUs, the NVIDIA Dynamo Platform delivers scalable, efficient AI inference.

The NVIDIA Dynamo Collection includes:

  • Dynamo vLLM Runtime, a pre-built, Docker-based environment designed to run NVIDIA Dynamo with the vLLM inference engine for high-performance, distributed large language model (LLM) inference
  • Dynamo Operator, a Kubernetes operator designed to automate and simplify the deployment, configuration, and lifecycle management of NVIDIA Dynamo inference graphs (also called pipelines) in cloud-native environments
  • Dynamo Deployment API, a container that provides the Dynamo API store, a service that stores and manages service configurations, metadata, and artifacts
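
As a sketch of how these pieces fit together, the Dynamo Operator manages inference graphs declared as Kubernetes custom resources. The resource kind, API group, and every field name below are illustrative assumptions, not the authoritative schema; consult the Dynamo documentation for the exact custom resource definition.

```yaml
# Hypothetical sketch of a Dynamo inference graph managed by the Dynamo
# Operator. apiVersion, kind, and all field names are assumptions made for
# illustration only; the real CRD schema is defined in the Dynamo docs.
apiVersion: nvidia.com/v1alpha1    # assumed API group/version
kind: DynamoGraphDeployment        # assumed resource kind
metadata:
  name: llm-pipeline-example       # hypothetical deployment name
spec:
  services:
    Frontend:                      # hypothetical HTTP frontend of the graph
      replicas: 1
    VllmWorker:                    # hypothetical vLLM runtime worker
      replicas: 2
      resources:
        limits:
          gpu: "1"                 # one GPU per worker (illustrative)
```

Applied with `kubectl apply -f`, a manifest of this shape would hand the operator a declarative description of the pipeline, leaving deployment, configuration, and lifecycle management to the operator's reconciliation loop, as described above.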

Getting Started with NVIDIA Dynamo

To get started with NVIDIA Dynamo, please refer to our documentation.

License

NVIDIA Dynamo is released under the open-source Apache-2.0 license, making it freely available for development, research, and deployment.