Cosmos-1.0-Diffusion-7B-Decoder-DV8x16x16ToCV8x8x8

Description
Cosmos Diffusion Decoder is a diffusion decoder model that can improve outputs of Cosmos-1.0-Autoregressive models with more fine-grained details.
Publisher
NVIDIA
Latest Version
1.0
Modified
January 7, 2025
Size
13.48 GB

Cosmos-1.0 Diffusion Decoder

Cosmos | Code | Paper

Model Overview

Description:

Cosmos Diffusion Decoder is a diffusion decoder model that can improve outputs of Cosmos-1.0-Autoregressive models with more fine-grained details. This model is ready for commercial use.

Model Developer: NVIDIA

Model Versions

In the Cosmos 1.0 release, the Cosmos Diffusion Decoder includes the following model:

  • Cosmos-1.0-Diffusion-7B-Decoder-DV8x16x16ToCV8x8x8

License:

This model is released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.

Under the NVIDIA Open Model License, NVIDIA confirms:

  • Models are commercially usable.
  • You are free to create and distribute Derivative Models.
  • NVIDIA does not claim ownership to any outputs generated using the Models or Derivative Models.

Important Note: If you bypass, disable, reduce the efficacy of, or circumvent any technical limitation, safety guardrail or associated safety guardrail hyperparameter, encryption, security, digital rights management, or authentication mechanism contained in the Model, your rights under NVIDIA Open Model License Agreement will automatically terminate.

  • Cosmos-1.0-Guardrail is the safety guardrail for this model.

Model Architecture:

Cosmos-1.0-Diffusion-7B-Decoder-DV8x16x16ToCV8x8x8 is a diffusion transformer model designed for video denoising in the latent space. The network is built from interleaved self-attention, cross-attention, and feedforward layers. The cross-attention layers allow the model to condition on input text throughout the denoising process. Before each layer, adaptive layer normalization is applied to embed the time information for denoising.
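The adaptive layer normalization step described above can be illustrated with a minimal NumPy sketch. This is not the Cosmos implementation: the function names, the projection weights `w_scale` and `w_shift`, and the `(1 + scale)` modulation form (a convention common in diffusion transformers) are assumptions made for illustration only.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    """Normalize the last axis to zero mean and unit variance (no learned affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def adaptive_layer_norm(x, t_emb, w_scale, w_shift):
    """Modulate normalized tokens with a scale and shift derived from a timestep embedding.

    x:       (tokens, dim) latent tokens
    t_emb:   (emb_dim,) timestep embedding
    w_scale: (emb_dim, dim) hypothetical projection weights
    w_shift: (emb_dim, dim) hypothetical projection weights
    """
    scale = t_emb @ w_scale  # (dim,), broadcast over tokens
    shift = t_emb @ w_shift  # (dim,), broadcast over tokens
    # Assumed modulation form: zero projections reduce to plain layer norm.
    return layer_norm(x) * (1.0 + scale) + shift

# Toy example with random data; all shapes are illustrative.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
t_emb = rng.normal(size=(6,))
w_scale = np.zeros((6, 8))
w_shift = np.zeros((6, 8))
out = adaptive_layer_norm(x, t_emb, w_scale, w_shift)
```

In this sketch the timestep enters the block only through the predicted scale and shift, so the same normalization behaves differently at each denoising step.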

Input/Output Specifications

  • Input
    • Input Type(s): Tokens
    • Input Format(s): Integer Tensor
    • Input Parameters: Three-dimensional (3D)
    • Other Properties Related to Input:
      • Integer indices ranging from 0 to 63,999
      • Should be the tokens generated by Cosmos-1.0-Tokenizer-DV8x16x16 or Cosmos-1.0-Autoregressive models
  • Output
    • Output Type(s): Tokens
    • Output Format(s): Float Tensor
    • Output Parameters: Three-dimensional (3D)
    • Other Properties Related to Output:
      • Continuous-valued feature vectors with a dimensionality of 16
      • The output tokens can be used as input for the decoder of Cosmos-1.0-Tokenizer-CV8x8x8
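The input constraints listed above (a 3D integer tensor with indices from 0 to 63,999) can be checked before tokens are passed to the model. The following is a minimal sketch using NumPy; `validate_input_tokens` and `VOCAB_SIZE` are hypothetical names for illustration and are not part of the Cosmos codebase.

```python
import numpy as np

VOCAB_SIZE = 64_000  # valid indices are 0..63,999, per the input specification

def validate_input_tokens(tokens):
    """Check that `tokens` is a 3D integer array with indices in [0, VOCAB_SIZE).

    Returns the input as a NumPy array; raises TypeError or ValueError otherwise.
    """
    arr = np.asarray(tokens)
    if not np.issubdtype(arr.dtype, np.integer):
        raise TypeError(f"expected an integer tensor, got dtype {arr.dtype}")
    if arr.ndim != 3:
        raise ValueError(f"expected a 3D tensor, got {arr.ndim} dimension(s)")
    if arr.size and (arr.min() < 0 or arr.max() >= VOCAB_SIZE):
        raise ValueError(f"token indices must lie in [0, {VOCAB_SIZE - 1}]")
    return arr

# Toy example: random token indices in an illustrative 3D shape.
rng = np.random.default_rng(0)
tokens = rng.integers(0, VOCAB_SIZE, size=(2, 4, 4))
checked = validate_input_tokens(tokens)
```

A check like this fails early on tokens that did not come from Cosmos-1.0-Tokenizer-DV8x16x16 or a Cosmos-1.0-Autoregressive model, for example indices from a different vocabulary.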

Software Integration

Runtime Engine(s):

  • Cosmos

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Blackwell
  • NVIDIA Hopper
  • NVIDIA Ampere

Note: We have only tested inference with BF16 precision.

Operating System(s):

  • Linux (We have not tested on other operating systems.)

Usage

  • See Cosmos for details.

Evaluation

Please see our technical paper for detailed evaluations.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure that it meets the requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the subcards of Explainability, Bias, Safety & Security, and Privacy below. Please report security vulnerabilities or NVIDIA AI Concerns here.