Dynamo Snapshot Agent enables CRIU-based checkpoint and restore for GPU inference workloads running on NVIDIA Dynamo
Overview
The Dynamo Snapshot Agent container is a pre-built, Docker-based Kubernetes DaemonSet image that enables CRIU-based checkpoint and restore for GPU inference workloads running on NVIDIA Dynamo. It reduces cold-start times for large models from minutes to seconds by capturing initialized application state (with the model already loaded on the GPU) and restoring it on demand into new pods.
Experimental Feature: Dynamo Snapshot is currently in preview. The DaemonSet runs in privileged mode to perform CRIU operations. See Limitations for details.
Key Components
- CRIU (Checkpoint/Restore in User-space): Process-level checkpoint and restore engine (v4.2) that captures full application state, including memory, file descriptors, and process trees.
- NVIDIA cuda-checkpoint: GPU state checkpoint and restore utility that works alongside CRIU to capture and restore CUDA contexts, allocations, and device state.
- Snapshot Agent: Go-based DaemonSet binary that watches for checkpoint-source and restore-target pods via Kubernetes labels, orchestrates the CRIU dump and cuda-checkpoint workflows, and writes checkpoint tars to shared storage.
- nsrestore: Companion binary that runs inside placeholder containers via nsenter to apply rootfs overlays and execute CRIU and CUDA restore operations.
- Kubernetes-Native Workflow: Integrates with the Dynamo Operator via DynamoCheckpoint Custom Resources and pod labels (nvidia.com/snapshot-is-checkpoint-source, nvidia.com/snapshot-is-restore-target) for fully automated checkpoint lifecycle management.
- Helm Chart: Namespace-scoped Helm chart installs the DaemonSet, checkpoint storage PVC, RBAC, and seccomp profile.
For more information about Dynamo Snapshot, please refer to the Snapshot documentation and the GitHub repository.
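The label-based discovery described for the Snapshot Agent can be illustrated with a small sketch. This is not the agent's actual code (the agent is a Go binary); it only shows how a pod's labels might be mapped to a snapshot role. The two label keys come from the text above. The function name `snapshot_role`, the assumption that a label is "set" when its value is the string "true", and the decision to reject pods carrying both labels are all hypothetical.

```python
from typing import Optional

# Label keys as named in the component list above.
CHECKPOINT_SOURCE_LABEL = "nvidia.com/snapshot-is-checkpoint-source"
RESTORE_TARGET_LABEL = "nvidia.com/snapshot-is-restore-target"


def snapshot_role(labels: dict[str, str]) -> Optional[str]:
    """Map a pod's labels to 'checkpoint-source', 'restore-target', or None.

    Assumption: a label counts as set only when its value is "true".
    The real agent may accept other values.
    """
    is_source = labels.get(CHECKPOINT_SOURCE_LABEL) == "true"
    is_target = labels.get(RESTORE_TARGET_LABEL) == "true"
    if is_source and is_target:
        # Assumption: a pod with both labels is ambiguous, so reject it.
        raise ValueError("pod is labeled as both checkpoint source and restore target")
    if is_source:
        return "checkpoint-source"
    if is_target:
        return "restore-target"
    return None  # pod is not involved in snapshotting
```

A watcher built on a Kubernetes client would call something like `snapshot_role(pod.metadata.labels or {})` for each pod event and dispatch to a dump or restore workflow based on the result.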
Release Info
For the complete release history including CUDA support and architecture details, see the Release Artifacts page. The snapshot agent container is available for x86_64 (AMD64) architecture only (cuda-checkpoint does not have an ARM64 binary).
Getting Started
- Select the Tags tab and locate the container image release that you want to run.
- In the Pull Tag column, click the icon to copy the docker pull command.
- Open a command prompt and paste the pull command. Ensure the pull completes successfully.
- Deploy the snapshot agent on your Kubernetes cluster using the Helm chart:
For next steps, including checkpoint configuration, DynamoCheckpoint CRD usage, and end-to-end restore workflows, please refer to the Snapshot guide.
Prerequisites
- Dynamo Platform/Operator installed on a Kubernetes cluster with x86_64 (AMD64) GPU nodes
- NVIDIA driver 580.xx or newer
- containerd runtime
- vLLM or SGLang backend (TensorRT-LLM is not supported yet)
- ReadWriteMany storage for cross-node restore
- Security clearance to run a privileged DaemonSet with hostPID, hostIPC, and hostNetwork
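Several of these prerequisites can be checked mechanically before installing the chart. The sketch below is a hypothetical pre-flight helper, not part of Dynamo: the function names, the input format, and the lowercase backend identifiers ("vllm", "sglang") are assumptions. The caller is expected to gather the driver version, machine architecture, container runtime, and backend name by whatever means the cluster provides.

```python
import re

MIN_DRIVER_MAJOR = 580                   # from "NVIDIA driver 580.xx or newer"
SUPPORTED_ARCHES = {"x86_64", "amd64"}   # x86_64 (AMD64) GPU nodes only
SUPPORTED_BACKENDS = {"vllm", "sglang"}  # assumed spellings of the backend names


def driver_major(version: str) -> int:
    """Extract the major component of a version string such as '580.65.06'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized driver version: {version!r}")
    return int(match.group(1))


def check_prerequisites(driver_version: str, machine: str,
                        runtime: str, backend: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if driver_major(driver_version) < MIN_DRIVER_MAJOR:
        problems.append(f"driver {driver_version} is older than {MIN_DRIVER_MAJOR}.xx")
    if machine.lower() not in SUPPORTED_ARCHES:
        problems.append(f"architecture {machine} is not x86_64 (AMD64)")
    if runtime.lower() != "containerd":
        problems.append(f"container runtime {runtime} is not containerd")
    if backend.lower() not in SUPPORTED_BACKENDS:
        problems.append(f"backend {backend} is not vLLM or SGLang")
    return problems
```

The helper deliberately covers only the checks that reduce to string comparison. Storage access mode and permission to run a privileged DaemonSet depend on cluster policy and are left to the operator.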
Limitations
- LLM workers only: Checkpoint/restore supports LLM decode and prefill workers. Specialized workers (multimodal, embedding, diffusion) are not supported.
- Single-GPU only: Multi-GPU setups may work on basic hardware configurations but are not officially supported yet.
- Network state: Active TCP connections cannot be checkpointed.
- Architecture: x86_64 (AMD64) only — cuda-checkpoint does not have an ARM64 binary.
- Security: Runs as a privileged DaemonSet (required for CRIU and cuda-checkpoint). Workload pods do not need to be privileged.
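The worker-type and GPU-count limitations above amount to a small decision rule, sketched here for illustration. The `Worker` type, its field names, the worker kind strings, and the three result labels are hypothetical and do not correspond to any Dynamo API. Treating a worker with no GPU as unsupported is also an assumption.

```python
from dataclasses import dataclass

# Worker kinds the limitations list names as supported (spellings assumed).
SUPPORTED_WORKER_KINDS = {"decode", "prefill"}


@dataclass(frozen=True)
class Worker:
    kind: str       # hypothetical values: "decode", "prefill", "multimodal", ...
    gpu_count: int  # number of GPUs assigned to the worker


def snapshot_support(worker: Worker) -> str:
    """Classify a worker as 'supported', 'unofficial', or 'unsupported'."""
    if worker.kind not in SUPPORTED_WORKER_KINDS:
        return "unsupported"  # multimodal, embedding, diffusion, and so on
    if worker.gpu_count == 1:
        return "supported"
    if worker.gpu_count > 1:
        return "unofficial"   # may work, but not officially supported yet
    return "unsupported"      # assumption: no GPU means nothing to snapshot
```

For example, `snapshot_support(Worker("decode", 1))` yields "supported", while a two-GPU prefill worker is reported as "unofficial".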
Support Matrix
Please refer to the support matrix and feature matrix for detailed hardware, architecture, and backend support information.
| Backend | Snapshot Support |
|---|---|
| vLLM | Supported |
| SGLang | Supported |
| TensorRT-LLM | Not yet supported |
Related Containers
- vLLM Runtime — Broadest model and feature coverage
- SGLang Runtime — High-throughput optimized backend
- TensorRT-LLM Runtime — Maximum inference performance
- Dynamo Frontend — Standalone frontend with EndpointPicker (EPP)
- Kubernetes Operator — K8s deployment automation
License
NVIDIA Dynamo is released under the Apache-2.0 open-source license, making it freely available for development, research, and deployment.
Technical Support
- Documentation: Dynamo Documentation
- Snapshot Guide: Snapshot Documentation
- GitHub Issues: Dynamo GitHub Issues
- Release Notes: GitHub Releases