A family of highly performant pre-trained models purpose-built for generating physics-aware videos and world states for physical AI development.
Cosmos Autoregressive: The Cosmos autoregressive models are a collection of pre-trained models suited to predicting and rapidly generating video sequences from video or image inputs for physical AI. They can serve as building blocks for applications and research related to video generation. The models are ready for commercial use under the NVIDIA Open Model License Agreement.
Cosmos Diffusion: The Cosmos diffusion models are a collection of diffusion-based world foundation models that generate dynamic, high-quality videos from text, image, or video inputs. They can serve as building blocks for applications and research related to generating video data for training physical AI systems. The models are ready for commercial use under the NVIDIA Open Model License Agreement.
Model Developer: NVIDIA
These models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.
Under the NVIDIA Open Model License, NVIDIA confirms:
Important Note: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under the NVIDIA Open Model License Agreement will automatically terminate.
Autoregressive Models:
Cosmos-1.0-Autoregressive-4B
4-billion-parameter autoregressive model that generates high-fidelity, physics-aware videos from video inputs alone.
Cosmos-1.0-Autoregressive-5B-Video2World
5-billion-parameter autoregressive model that generates and predicts detailed video states from video + text inputs.
Cosmos-1.0-Autoregressive-12B
12-billion-parameter autoregressive model that generates high-fidelity, physics-aware videos from video inputs alone.
Cosmos-1.0-Autoregressive-13B-Video2World
13-billion-parameter autoregressive model that generates and predicts detailed video states from video + text inputs.
Diffusion Models:
Cosmos-1.0-Diffusion-7B-Text2World
7-billion-parameter diffusion model that generates physics-aware videos from text prompts.
Cosmos-1.0-Diffusion-7B-Video2World
7-billion-parameter diffusion model that converts video inputs into real-world simulation outputs.
Cosmos-1.0-Diffusion-14B-Text2World
14-billion-parameter diffusion model that generates physics-aware videos from text prompts.
Cosmos-1.0-Diffusion-14B-Video2World
14-billion-parameter diffusion model that converts video inputs into real-world simulation outputs.
Cosmos-1.0-Tokenizer-CV8x8x8
A continuous video tokenizer that compresses video 8x temporally and 8x8 spatially, with a temporal context of 121 frames.
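To make the tokenizer's compression figures concrete, here is a small worked calculation. It assumes the tokenizer is causal, encoding the first frame on its own and every subsequent group of 8 frames into one latent frame; this is consistent with the 121-frame context (121 = 1 + 15 × 8) but is an assumption, not something this card states. The 1280x704 input resolution is only an example, and the helper name `cv8x8x8_latent_shape` is hypothetical.

```python
def cv8x8x8_latent_shape(num_frames, height, width, t_factor=8, s_factor=8):
    """Hypothetical helper: latent grid (T', H', W') for an 8x8x8 tokenizer.

    Assumes causal temporal compression: the first frame is kept as its own
    latent frame and each later group of t_factor frames becomes one latent
    frame, so T' = 1 + (T - 1) / t_factor. Height and width are divided by
    s_factor.
    """
    if (num_frames - 1) % t_factor != 0:
        raise ValueError("num_frames - 1 must be a multiple of t_factor")
    if height % s_factor or width % s_factor:
        raise ValueError("height and width must be multiples of s_factor")
    return (1 + (num_frames - 1) // t_factor, height // s_factor, width // s_factor)


# Example input: a 121-frame clip at an assumed 1280x704 resolution.
T, H, W = 121, 704, 1280
latent = cv8x8x8_latent_shape(T, H, W)
print(latent)  # (16, 88, 160)

# Reduction in the number of spatiotemporal positions (pixels vs. latents).
ratio = (T * H * W) / (latent[0] * latent[1] * latent[2])
print(ratio)  # 484.0
```

Under these assumptions a 121-frame clip becomes 16 latent frames, so the effective reduction in positions is 484x rather than the nominal 8 × 8 × 8 = 512x, because the first frame is not compressed temporally.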
Cosmos Supporting Models:
Cosmos-1.0-Guardrail
Small, state-of-the-art guardrail models that help ensure safety and consistency in world models.
Cosmos-1.0-PromptUpsampler-12B-Text2World
12-billion-parameter model that automatically enriches the descriptions and details of text prompts to improve the quality of prompt-driven generation.
Cosmos-1.0-Diffusion-7B-Decoder-DV8x16x16ToCV8x8x8
7-billion-parameter diffusion decoder for autoregressive (AR) video sequences, mapping discrete DV8x16x16 tokens into the continuous CV8x8x8 latent space.
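As a rough sizing aid, the sketch below reads the parameter count from each checkpoint name in this list and estimates how much memory the weights alone would occupy. It assumes the weights are held in bfloat16 (2 bytes per parameter), which this card does not state, and it ignores activations, the tokenizer, the guardrail models, and any framework overhead, so treat the numbers as lower bounds rather than hardware requirements. The helper names `params_billion` and `weight_gb` are hypothetical.

```python
import re

# Checkpoint names copied from the list above (those with a size in the name).
MODELS = [
    "Cosmos-1.0-Autoregressive-4B",
    "Cosmos-1.0-Autoregressive-5B-Video2World",
    "Cosmos-1.0-Autoregressive-12B",
    "Cosmos-1.0-Autoregressive-13B-Video2World",
    "Cosmos-1.0-Diffusion-7B-Text2World",
    "Cosmos-1.0-Diffusion-7B-Video2World",
    "Cosmos-1.0-Diffusion-14B-Text2World",
    "Cosmos-1.0-Diffusion-14B-Video2World",
    "Cosmos-1.0-PromptUpsampler-12B-Text2World",
    "Cosmos-1.0-Diffusion-7B-Decoder-DV8x16x16ToCV8x8x8",
]

BYTES_PER_PARAM = 2  # assumption: bfloat16 weights, 2 bytes per parameter


def params_billion(name):
    """Hypothetical helper: return N from a '-<N>B' size token, or None."""
    match = re.search(r"-(\d+)B(?=-|$)", name)
    return int(match.group(1)) if match else None


def weight_gb(name, bytes_per_param=BYTES_PER_PARAM):
    """Approximate weight memory in decimal GB (1 GB = 1e9 bytes)."""
    n = params_billion(name)
    if n is None:
        return None
    # N billion params * bytes_per_param bytes / 1e9 bytes per GB
    return n * bytes_per_param


for name in MODELS:
    print(f"{name}: ~{weight_gb(name)} GB of weights")
```

Names without a size token, such as Cosmos-1.0-Guardrail or Cosmos-1.0-Tokenizer-CV8x8x8, return None and are left out of the list.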