Cosmos Predict2 Container

Description: Container image for running inference and post-training on Cosmos-Predict2.
Publisher: NVIDIA
Latest Tag: 1.0
Modified: June 11, 2025
Compressed Size: 12.32 GB
Multinode Support: No
Multi-Arch Support: No

Linux / amd64


NVIDIA Cosmos

Cosmos World Foundation Models come in three model types, cosmos-predict, cosmos-transfer, and cosmos-reason, all of which can be customized in post-training:

|           | Predict | Transfer | Reason |
|-----------|---------|----------|--------|
| Type      | World Generation | Multi-Controlnet | Reasoning VLM |
| Function  | Predict novel future frames given initial frames | Transfer existing control frames into photoreal frames within a video clip | Reason against frames within a video clip |
| Use Cases | Data Generation & Policy Evaluation | Data Augmentation | Data Curation |
| Inputs    | Text, Image, Video | Multiple Video Modalities such as RGB, Depth, Segmentation, and more | Video & Text |
| Outputs   | Video | Video | Text |

Product Website | Hugging Face | Paper | Paper Website

Cosmos-Predict2 is the branch of Cosmos World Foundation Models (WFMs) specialized for future state prediction; models of this kind are often referred to as world models. It sits alongside the other two main branches of Cosmos WFMs, cosmos-transfer and cosmos-reason. We visualize the architecture of Cosmos-Predict2 in the following figure.

Key Features

Cosmos-Predict2 includes the following:

  • Diffusion-based world foundation models for Text2Image and Video2World generation, with which a user can generate visual simulations from text prompts or video prompts.

System Requirements

Cosmos-Predict2 has the following system requirements:

  • NVIDIA GPUs with Ampere architecture (RTX 30 Series, A100) or newer architectures. For detailed hardware requirements and recommendations, please refer to our performance benchmarks.
  • Linux operating system (Ubuntu 20.04, 22.04, or 24.04 LTS)
  • CUDA version 12.4 or later
  • Python version 3.10 or later
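
The version thresholds above can be checked programmatically before installing anything. The following is a minimal sketch, not part of the Cosmos-Predict2 tooling: the function names are hypothetical, and the compute capability threshold of 8.0 is an assumption derived from the Ampere requirement (A100 reports 8.0, RTX 30 Series reports 8.6). How you obtain the CUDA version string and the compute capability on your system (for example from your driver tools or deep learning framework) is left open.

```python
import re
import sys

# Thresholds taken from the system requirements listed above.
MIN_PYTHON = (3, 10)
MIN_CUDA = (12, 4)
# Assumption: "Ampere or newer" is expressed as compute capability >= 8.0.
MIN_COMPUTE_CAPABILITY = (8, 0)


def parse_version(text):
    """Return (major, minor) from a string such as '12.4' or '12.4.1'."""
    match = re.match(r"\s*(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"cannot parse a version from {text!r}")
    return int(match.group(1)), int(match.group(2))


def python_ok(version_info=None):
    """Check the running interpreter (or a supplied version tuple)."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= MIN_PYTHON


def cuda_ok(cuda_version):
    """Check a CUDA version string against the 12.4 minimum."""
    return parse_version(cuda_version) >= MIN_CUDA


def gpu_ok(compute_capability):
    """Check a (major, minor) compute capability against Ampere (8.0)."""
    return tuple(compute_capability) >= MIN_COMPUTE_CAPABILITY


if __name__ == "__main__":
    print("Python OK:", python_ok())
    print("CUDA 12.4 OK:", cuda_ok("12.4"))
    print("Compute capability 8.6 OK:", gpu_ok((8, 6)))
```

Tuple comparison keeps the checks correct for two-digit minor versions, which a plain string comparison would get wrong (for example "3.9" versus "3.10").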

Download checkpoints

  1. Generate a Hugging Face access token (if you haven't done so already). Set the access token to Read permission (default is Fine-grained).

  2. Log in to Hugging Face with the access token:

    huggingface-cli login
    
  3. Accept the Llama-Guard-3-8B terms.

  4. Download the Cosmos model weights from Hugging Face:

    CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_diffusion_checkpoints.py --model_sizes 2B 14B --model_types Text2Image --checkpoint_dir checkpoints
    CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_diffusion_checkpoints.py --model_sizes 2B 14B --model_types Video2World --checkpoint_dir checkpoints
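
The two download commands above differ only in the value passed to --model_types, so they can be driven from a small wrapper. The sketch below is an illustration under assumptions: it only reuses the script path and flags shown in the commands above, the helper names (build_download_command, build_env, download_all) are hypothetical, and it assumes it is run from the root of a Cosmos-Predict2 checkout after the Hugging Face login step.

```python
import os
import subprocess
import sys

# Script path and flags are copied from the commands shown above.
DOWNLOAD_SCRIPT = "scripts/download_diffusion_checkpoints.py"


def build_download_command(model_type, model_sizes=("2B", "14B"),
                           checkpoint_dir="checkpoints"):
    """Build the argument list for one invocation of the download script."""
    return [
        sys.executable, DOWNLOAD_SCRIPT,
        "--model_sizes", *model_sizes,
        "--model_types", model_type,
        "--checkpoint_dir", checkpoint_dir,
    ]


def build_env(repo_root, base_env=None):
    """Mirror the CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) prefix."""
    env = dict(os.environ if base_env is None else base_env)
    env["PYTHONPATH"] = repo_root
    conda_prefix = env.get("CONDA_PREFIX")
    if conda_prefix:  # only set CUDA_HOME when a conda environment is active
        env["CUDA_HOME"] = conda_prefix
    return env


def download_all(repo_root=".", model_types=("Text2Image", "Video2World")):
    """Run the download script once per model type; raises on failure."""
    root = os.path.abspath(repo_root)
    env = build_env(root)
    for model_type in model_types:
        subprocess.run(build_download_command(model_type),
                       cwd=root, env=env, check=True)
```

Calling download_all() from the repository root would run both downloads in sequence. Passing model_sizes=("2B",) to build_download_command is one way to fetch only the smaller checkpoints, assuming the script accepts a single size.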
    

Inference with pre-trained Cosmos-Predict2 models

  • Inference with diffusion-based Text2Image models
  • Inference with diffusion-based Video2World models

Models

Cosmos-Predict2 includes the following models:

  • Cosmos-Predict2-2B-Text2Image: Text to image generation
  • Cosmos-Predict2-14B-Text2Image: Text to image generation
  • Cosmos-Predict2-2B-Video2World: Video- and text-based future visual world generation
  • Cosmos-Predict2-14B-Video2World: Video- and text-based future visual world generation
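
The model names above follow a regular pattern of size and type, which can be handy when scripting over all variants. The sketch below simply encodes that pattern; the helper names are hypothetical, and it makes no claim about where each model is hosted or how it is loaded.

```python
# Sizes and types as they appear in the model list above.
MODEL_SIZES = ("2B", "14B")
MODEL_TYPES = {
    "Text2Image": "Text to image generation",
    "Video2World": "Video- and text-based future visual world generation",
}


def model_name(size, model_type):
    """Compose a model name such as 'Cosmos-Predict2-2B-Text2Image'."""
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if model_type not in MODEL_TYPES:
        raise ValueError(f"unknown model type: {model_type!r}")
    return f"Cosmos-Predict2-{size}-{model_type}"


def all_model_names():
    """List every size and type combination, grouped by model type."""
    return [model_name(size, model_type)
            for model_type in MODEL_TYPES
            for size in MODEL_SIZES]
```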

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

This model includes safety and content moderation features powered by Llama Guard 3. Llama Guard 3 is used solely as a content input filter and is subject to its own license.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.