Model
Cosmos Tokenizer is a suite of visual tokenizers for images and videos that delivers various compression rates while maintaining high reconstruction quality.
| Field | Response |
|---|---|
| Intended Application & Domain: | Tokenization of images and videos |
| Model Type: | Auto-Encoder |
| Intended Users: | Generative AI developers for image and video generation models |
| Output: | Images/Videos and Latent Tokens |
| Describe how the model works: | Compresses visual input (image/video) into latent tokens and decompresses those tokens back into images/videos. |
| Technical Limitations: | Due to tokenizer compression limitations, some visual information (such as small text and other structured fine details) may not be reconstructed accurately. Reconstruction quality may also be lower for low-resolution videos, e.g. below 320p. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Peak Signal-to-Noise Ratio (PSNR), Structural Similarity (SSIM), Reconstruction Fréchet Video Distance (rFVD), Reconstruction Fréchet Inception Distance (rFID), Latency |
| Potential Known Risks: | The tokenizer can process all forms of input, so its output may include content considered toxic, offensive, or indecent. |
| Licensing: | NVIDIA Open Model License |
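
As an illustration of the first performance metric listed above, here is a minimal sketch of how PSNR is conventionally computed between an original signal and its reconstruction: PSNR = 10 · log10(MAX² / MSE), where MSE is the mean squared error and MAX is the largest possible pixel value. The function name `psnr`, the flat-list pixel layout, and the 8-bit default of 255 are assumptions made for this example. It is not the evaluation code used for Cosmos Tokenizer.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """Return the peak signal-to-noise ratio in dB for two flat pixel sequences.

    Hypothetical helper for illustration; assumes both inputs hold the same
    number of pixel values in the range [0, max_value].
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must contain the same number of pixels")
    if len(original) == 0:
        raise ValueError("inputs must not be empty")

    # Mean squared error between the original and the reconstruction.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)

    # A perfect reconstruction has zero error, so PSNR is unbounded.
    if mse == 0:
        return math.inf

    return 10.0 * math.log10(max_value ** 2 / mse)


# Toy 8-bit example: every pixel is off by 2, so the MSE is 4.
original = [0, 64, 128, 255]
reconstructed = [2, 62, 130, 253]
score = psnr(original, reconstructed)
print(f"PSNR: {score:.2f} dB")
```

Higher values indicate a reconstruction closer to the original. For video, a common convention is to compute the score per frame and average, though the exact protocol depends on the benchmark.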