VILA is a family of visual language models (VLMs) that supports single-image, multi-image, and video reasoning. The VILA series has been extended with checkpoints that use enhanced vision encoders and large language models (LLMs). NVILA ("New VILA") is a family of models that pairs an enhanced vision encoder with an enhanced LLM to improve on the performance of the previous VILA models.
NVILA-Lite-15B-HighRes is a variant of NVILA-Lite-15B that can process high-resolution images and videos at resolutions up to 1.8K. Common use cases include captioning, visual question answering (Q&A), search, and summarization.
NVILA Finetuning Microservices (FTMS) is a VLM finetuning microservice that lets customers finetune the pretrained NVILA-Lite-15B-HighRes video model at scale on video-text and image-text data. For details, see the container card for this offering.
The model is for research and non-commercial use.
This model is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. Additional licensing terms apply to the base pretrained models: the Gemma Terms of Use and the Gemma Prohibited Use Policy (Google AI for Developers) for PaliGemma 2, and the Apache License, Version 2.0 for Qwen2.5.
Architecture Type: Transformer
Network Architecture: SigLIP, Qwen2.5
Input Type: Image, Video, Text
Input Format:
Image: Red, Green, Blue (RGB)
Video: MP4
Text: String
Input Parameters:
Image: 2D
Video: 3D
Text: 1D
Output Type: Text
Output Format: String
Output Parameters: 1D
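The input parameters above describe each modality by its dimensionality: an image is a 2D grid of RGB pixels, a video is a 3D stack of such frames, and text is a 1D string. As a minimal sketch, assuming an image is held in memory as a height-by-width list of (R, G, B) tuples and a video as a list of equally sized frames, the hypothetical helpers `image_shape` and `video_shape` below check those layouts in plain Python. They are not part of any NVILA or FTMS API and do not reproduce the model's actual preprocessing.

```python
from typing import List, Tuple

# Hypothetical in-memory layouts for illustration; not an NVILA or FTMS API.
Pixel = Tuple[int, int, int]   # one (R, G, B) value, each channel 0-255
Image = List[List[Pixel]]      # 2D: height x width grid of RGB pixels
Video = List[Image]            # 3D: frames x height x width


def image_shape(img: Image) -> Tuple[int, int]:
    """Return (height, width) after checking that img is a rectangular RGB grid."""
    if not img or not img[0]:
        raise ValueError("image must have at least one row and one column")
    width = len(img[0])
    for row in img:
        if len(row) != width:
            raise ValueError("all image rows must have the same width")
        for px in row:
            if len(px) != 3 or not all(0 <= c <= 255 for c in px):
                raise ValueError("each pixel must be an (R, G, B) triple in 0-255")
    return len(img), width


def video_shape(video: Video) -> Tuple[int, int, int]:
    """Return (frames, height, width); every frame must have the same size."""
    if not video:
        raise ValueError("video must contain at least one frame")
    sizes = {image_shape(frame) for frame in video}
    if len(sizes) != 1:
        raise ValueError("all frames must share one height and width")
    height, width = sizes.pop()
    return len(video), height, width


# Usage: a 2x2 RGB frame repeated three times forms a tiny 3-frame video.
frame = [[(255, 0, 0), (0, 255, 0)],
         [(0, 0, 255), (255, 255, 255)]]
print(image_shape(frame))                  # (2, 2)
print(video_shape([frame, frame, frame]))  # (3, 2, 2)
```

In a real pipeline these shapes would usually be carried by an array library rather than nested lists; plain lists are used here only to keep the sketch dependency-free.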
Runtime Engine: HF Trainer 4.46.0
Supported Hardware Microarchitecture Compatibility:
[Preferred/Supported] Operating System(s):
Linux
Model Version(s): NVILA-Lite-15B-HighRes
Data Collection Method by dataset:
Labeling Method by dataset:
Properties:
60 million image-text pairs or interleaved image-text content.
| Benchmark | Accuracy (%) |
| --- | --- |
| VideoMME w/o Sub @32f | 64.78 |
| VideoMME w/ Sub @32f | 67.41 |
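In the benchmark names, "w/o Sub" and "w/ Sub" mean evaluation without and with subtitles, and "@32f" indicates that 32 frames were taken from each video. The card does not say how those frames were chosen. A common approach, assumed here, is uniform sampling: split the video into 32 equal segments and take the frame at the center of each. The function `uniform_frame_indices` below is a hypothetical sketch of that idea, not the evaluation code used to produce these results.

```python
from typing import List


def uniform_frame_indices(num_frames: int, num_samples: int = 32) -> List[int]:
    """Pick num_samples frame indices spread evenly across a video.

    The video is split into num_samples equal-length segments and the index at
    the center of each segment is taken. If the video has no more frames than
    requested, every frame is used once.
    """
    if num_frames <= 0:
        raise ValueError("video must contain at least one frame")
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    if num_frames <= num_samples:
        return list(range(num_frames))
    segment = num_frames / num_samples          # frames per segment (> 1 here)
    return [int(segment * i + segment / 2) for i in range(num_samples)]


# Usage: a 320-frame clip yields indices 5, 15, 25, ..., 315.
indices = uniform_frame_indices(320)
print(len(indices), indices[:4], indices[-1])   # 32 [5, 15, 25, 35] 315
```

Taking segment centers rather than segment starts keeps the first and last sampled frames away from the very edges of the clip, which often contain fades or blank frames; other samplers (random per-segment, keyframe-based) are equally plausible.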
Engine:
Test Hardware:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.