Linux / amd64
Cosmos World Foundation Models come in three model types, all of which can be customized in post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.
| | Predict | Transfer | Reason |
|---|---|---|---|
| Type | World Generation | Multi-Controlnet | Reasoning VLM |
| Function | Predict novel future frames given initial frames | Transfer existing control frames into photoreal frames within a video clip | Reason against frames within a video clip |
| Use Cases | Data Generation & Policy Evaluation | Data Augmentation | Data Curation |
| Inputs | Text, Image, Video | Multiple video modalities such as RGB, Depth, Segmentation, and more | Video & Text |
| Outputs | Video | Video | Text |
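As a rough illustration of how the comparison above could be used to pick a model type in tooling, the following sketch encodes the table as a small lookup structure. The `ModelFamily` class, the `FAMILIES` tuple, and both helper functions are hypothetical names introduced here; they are not part of any Cosmos package. Only the values are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelFamily:
    """Summary of one column of the comparison table (illustrative only)."""
    name: str
    model_type: str
    use_cases: tuple
    inputs: tuple
    output: str


# Values are copied from the table; the structure itself is hypothetical.
FAMILIES = (
    ModelFamily("cosmos-predict", "World Generation",
                ("Data Generation", "Policy Evaluation"),
                ("Text", "Image", "Video"), "Video"),
    # The table lists RGB, Depth, and Segmentation "and more",
    # so this input tuple is not exhaustive.
    ModelFamily("cosmos-transfer", "Multi-Controlnet",
                ("Data Augmentation",),
                ("RGB", "Depth", "Segmentation"), "Video"),
    ModelFamily("cosmos-reason", "Reasoning VLM",
                ("Data Curation",),
                ("Video", "Text"), "Text"),
)


def families_for_use_case(use_case):
    """Return names of families whose listed use cases include use_case."""
    wanted = use_case.strip().lower()
    return [f.name for f in FAMILIES
            if wanted in (u.lower() for u in f.use_cases)]


def families_producing(output):
    """Return names of families whose output modality matches output."""
    wanted = output.strip().lower()
    return [f.name for f in FAMILIES if f.output.lower() == wanted]
```

For example, `families_for_use_case("Data Curation")` selects cosmos-reason, while `families_producing("video")` selects both cosmos-predict and cosmos-transfer.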
Cosmos-Predict2 is a key branch of Cosmos World Foundation Models (WFMs), specialized for future state prediction; models of this kind are often referred to as world models. We visualize the architecture of Cosmos-Predict2 in the following figure.
Cosmos-Predict2 includes the following:
Cosmos-Predict2 has the following system requirements:
1. Generate a Hugging Face access token (if you haven't done so already). Set the access token to Read permission (default is Fine-grained).
2. Log in to Hugging Face with the access token:

       huggingface-cli login

3. Accept the Llama-Guard-3-8B terms.
4. Download the Cosmos model weights from Hugging Face:

       CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_diffusion_checkpoints.py --model_sizes 2B 14B --model_types Text2Image --checkpoint_dir checkpoints
       CUDA_HOME=$CONDA_PREFIX PYTHONPATH=$(pwd) python scripts/download_diffusion_checkpoints.py --model_sizes 2B 14B --model_types Video2World --checkpoint_dir checkpoints
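The two download commands differ only in the `--model_types` value, so they can be driven from a small wrapper. The following is a minimal sketch, assuming it is run from the repository root after the Hugging Face login step. The helper names `build_download_command`, `build_environment`, and `download_all` are hypothetical; the script path and flags are copied from the commands above. Note that the sketch uses the current interpreter (`sys.executable`) instead of whatever `python` resolves to on the PATH.

```python
import os
import subprocess
import sys


def build_download_command(model_type, model_sizes=("2B", "14B"),
                           checkpoint_dir="checkpoints"):
    """Build the argument list for one download invocation."""
    return [
        sys.executable, "scripts/download_diffusion_checkpoints.py",
        "--model_sizes", *model_sizes,
        "--model_types", model_type,
        "--checkpoint_dir", checkpoint_dir,
    ]


def build_environment(base_env=None):
    """Copy the environment and add the variables set in the shell commands."""
    env = dict(os.environ if base_env is None else base_env)
    # Mirrors CUDA_HOME=$CONDA_PREFIX. Unlike the shell form, CUDA_HOME is
    # left untouched when no conda environment is active.
    if "CONDA_PREFIX" in env:
        env["CUDA_HOME"] = env["CONDA_PREFIX"]
    # Mirrors PYTHONPATH=$(pwd).
    env["PYTHONPATH"] = os.getcwd()
    return env


def download_all(model_types=("Text2Image", "Video2World"), dry_run=True):
    """Return the download commands; run them unless dry_run is True."""
    commands = [build_download_command(t) for t in model_types]
    if not dry_run:
        env = build_environment()
        for cmd in commands:
            # check=True stops at the first failed download.
            subprocess.run(cmd, env=env, check=True)
    return commands


if __name__ == "__main__":
    for cmd in download_all(dry_run=True):
        print(" ".join(cmd))
```

With `dry_run=True` the sketch only prints the commands it would run; passing `dry_run=False` executes them in order.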
Cosmos-Predict2 includes the following models:
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
This model includes safety and content moderation features powered by Llama Guard 3. Llama Guard 3 is used solely as a content input filter and is subject to its own license.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.