Dynamo Snapshot-Agent

NVIDIA

Dynamo Snapshot Agent enables CRIU-based checkpoint and restore for GPU inference workloads running on NVIDIA Dynamo.

Overview

The Dynamo Snapshot Agent container is a pre-built, Docker-based Kubernetes DaemonSet image that enables CRIU-based checkpoint and restore for GPU inference workloads running on NVIDIA Dynamo. It reduces cold-start times for large models from minutes to seconds by capturing initialized application state (with the model already loaded on the GPU) and restoring it on demand into new pods.

Experimental Feature: Dynamo Snapshot is currently in preview. The DaemonSet runs in privileged mode to perform CRIU operations. See Limitations for details.

Key Components

CRIU (Checkpoint/Restore in User-space): Process-level checkpoint and restore engine (v4.2) that captures full application state, including memory, file descriptors, and process trees.
NVIDIA cuda-checkpoint: GPU state checkpoint and restore utility that works alongside CRIU to capture and restore CUDA contexts, allocations, and device state.
Snapshot Agent: Go-based DaemonSet binary that watches for checkpoint-source and restore-target pods via Kubernetes labels, orchestrates the CRIU dump and cuda-checkpoint workflows, and writes checkpoint tars to shared storage.
nsrestore: Companion binary that runs inside placeholder containers via nsenter to apply rootfs overlays and execute CRIU and CUDA restore operations.
Kubernetes-Native Workflow: Integrates with the Dynamo Operator via DynamoCheckpoint Custom Resources and pod labels (nvidia.com/snapshot-is-checkpoint-source, nvidia.com/snapshot-is-restore-target) for fully automated checkpoint lifecycle management.
Helm Chart: Namespace-scoped Helm chart installs the DaemonSet, checkpoint storage PVC, RBAC, and seccomp profile.

For more information about Dynamo Snapshot, please refer to the Snapshot documentation and the GitHub repository.
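
To make the label-based discovery described above more concrete, here is a minimal sketch that builds kubectl queries for the two pod labels. This page does not state the label values, so the selector matches on the presence of the label key only. The helper name snapshot_pods_cmd is hypothetical, and it prints the command rather than running it so it can be reviewed first.

```shell
#!/bin/sh
# Label keys taken from the Kubernetes-Native Workflow description above.
SOURCE_LABEL="nvidia.com/snapshot-is-checkpoint-source"
TARGET_LABEL="nvidia.com/snapshot-is-restore-target"

# snapshot_pods_cmd (hypothetical helper): prints, but does not run, a kubectl
# command that lists pods carrying the given label key in a namespace.
# A selector consisting of only a key matches any pod that has that label set.
snapshot_pods_cmd() {
  printf 'kubectl get pods --namespace %s --selector %s\n' "$1" "$2"
}

snapshot_pods_cmd "${NAMESPACE:-default}" "$SOURCE_LABEL"
snapshot_pods_cmd "${NAMESPACE:-default}" "$TARGET_LABEL"
```

Copying a printed line into a shell with cluster access should list the checkpoint-source or restore-target pods, assuming the labels are applied as described.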

Release Info

For the complete release history including CUDA support and architecture details, see the Release Artifacts page. The snapshot agent container is available for x86_64 (AMD64) architecture only (cuda-checkpoint does not have an ARM64 binary).

Getting Started

Select the Tags tab and locate the container image release that you want to run.
In the Pull Tag column, click the icon to copy the docker pull command.
Open a command prompt and paste the pull command. Ensure the pull completes successfully.
Deploy the snapshot agent on your Kubernetes cluster using the Helm chart:

helm upgrade --install snapshot oci://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-snapshot \
  --namespace ${NAMESPACE} \
  --create-namespace \
  --set storage.pvc.create=true

For next steps, including checkpoint configuration, DynamoCheckpoint CRD usage, and end-to-end restore workflows, please refer to the Snapshot guide.
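
The Helm command above expects NAMESPACE to be set. As a sketch only, the wrapper below refuses to call Helm when the namespace is empty or is not a valid Kubernetes name (lowercase letters, digits, and hyphens, starting and ending with a letter or digit, at most 63 characters). The function names valid_namespace and deploy_snapshot are hypothetical; the Helm arguments are the ones shown above.

```shell
#!/bin/sh
# valid_namespace (hypothetical helper): succeeds when $1 is a valid
# Kubernetes namespace name (an RFC 1123 label of at most 63 characters).
valid_namespace() {
  [ "${#1}" -le 63 ] || return 1
  printf '%s' "$1" | grep -Eq '^[a-z0-9]([-a-z0-9]*[a-z0-9])?$'
}

# deploy_snapshot (hypothetical wrapper): validates the namespace, then runs
# the same helm command as in the Getting Started steps.
deploy_snapshot() {
  if ! valid_namespace "$1"; then
    echo "invalid or empty namespace: '$1'" >&2
    return 1
  fi
  helm upgrade --install snapshot oci://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-snapshot \
    --namespace "$1" \
    --create-namespace \
    --set storage.pvc.create=true
}
```

With the functions loaded, deploy_snapshot "${NAMESPACE}" behaves like the original command but fails fast with a message when the variable is unset.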

Prerequisites

Dynamo Platform/Operator installed on a Kubernetes cluster with x86_64 (AMD64) GPU nodes
NVIDIA driver 580.xx or newer
containerd runtime
vLLM or SGLang backend (TensorRT-LLM is not supported yet)
ReadWriteMany storage for cross-node restore
Permission to run a privileged DaemonSet with hostPID, hostIPC, and hostNetwork
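
Two of the prerequisites above (x86_64 architecture and driver 580.xx or newer) can be checked mechanically. The following is a rough preflight sketch, not an official tool: the function names are hypothetical, it compares only the leading component of the driver version, and it should be run on a GPU node rather than a workstation.

```shell
#!/bin/sh
# arch_supported (hypothetical helper): succeeds when the machine string,
# as printed by `uname -m`, is x86_64 (also accepts the alias amd64).
arch_supported() {
  case "$1" in
    x86_64|amd64) return 0 ;;
    *) return 1 ;;
  esac
}

# driver_supported (hypothetical helper): succeeds when the leading component
# of a driver version such as 580.65.06 is 580 or higher.
driver_supported() {
  major="${1%%.*}"
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # empty or non-numeric input is rejected
  esac
  [ "$major" -ge 580 ]
}

if arch_supported "$(uname -m)"; then
  echo "architecture: ok"
else
  echo "architecture: unsupported"
fi

# Query the driver only when nvidia-smi is installed on this machine.
if command -v nvidia-smi >/dev/null 2>&1; then
  driver="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
  if driver_supported "$driver"; then
    echo "driver $driver: ok"
  else
    echo "driver $driver: older than 580 or unreadable"
  fi
fi
```

The remaining prerequisites (containerd, backend, ReadWriteMany storage, and privileged DaemonSet permission) depend on cluster configuration and are not covered by this sketch.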

Limitations

LLM workers only: Checkpoint/restore supports LLM decode and prefill workers. Specialized workers (multimodal, embedding, diffusion) are not supported.
Single-GPU only: Multi-GPU configurations may work in basic hardware configurations but are not officially supported yet.
Network state: Active TCP connections cannot be checkpointed.
Architecture: x86_64 (AMD64) only — cuda-checkpoint does not have an ARM64 binary.
Security: Runs as a privileged DaemonSet (required for CRIU and cuda-checkpoint). Workload pods do not need to be privileged.

Support Matrix

Please refer to the support matrix and feature matrix for detailed hardware, architecture, and backend support information.

Backend	Snapshot Support
vLLM	Supported
SGLang	Supported
TensorRT-LLM	Not yet supported

Related Containers

vLLM Runtime: Broadest model and feature coverage
SGLang Runtime: High-throughput optimized backend
TensorRT-LLM Runtime: Maximum inference performance
Dynamo Frontend: Standalone frontend with EndpointPicker (EPP)
Kubernetes Operator: K8s deployment automation

License

NVIDIA Dynamo is released under the Apache-2.0 open-source license, making it freely available for development, research, and deployment.

Technical Support

Documentation: Dynamo Documentation
Snapshot Guide: Snapshot Documentation
GitHub Issues: Dynamo GitHub Issues
Release Notes: GitHub Releases

Publisher

NVIDIA

Latest Tag: 1.3.0
Updated: July 22, 2026 UTC
Compressed Size: 4.95 GB
Multinode Support: No
Multi-Arch Support: Yes

System

signed images

Labels

AI Inference Kubernetes Infrastructure NSPECT-9EST-K1WZ
