NVILA Finetuning Microservice Getting Started - Early Access

Description
Resource to help get started with NVILA Finetuning Early Access features.
Publisher
NVIDIA
Latest Version
0.3.0-SOP
Modified
July 8, 2025
Compressed Size
2.44 GB

NVILA Finetuning Microservice - Early Access

NVILA is a Vision Language Model developed by NVIDIA that achieves state-of-the-art image and video understanding. Several follow-up works build on VILA, including VILA^2, LongVILA, and NVILA. This container card walks through the tools required to finetune the high-resolution video NVILA model using two popular approaches: LoRA (Low-Rank Adaptation) and full finetuning.
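
To make the difference between the two approaches concrete, the sketch below compares the number of trainable weights for a single linear layer under full finetuning and under LoRA, which freezes the original weight matrix W and trains only a low-rank update B·A. The layer size and rank are hypothetical values chosen for illustration; they are not NVILA's actual dimensions or the microservice's default settings.

```python
def full_finetune_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights for a LoRA update W + B @ A,
    where B is d_out x rank and A is rank x d_in (W stays frozen)."""
    return rank * (d_in + d_out)


# Hypothetical layer size and rank, for illustration only.
d_in, d_out, rank = 4096, 4096, 16

full = full_finetune_params(d_in, d_out)
lora = lora_params(d_in, d_out, rank)

print(f"full finetuning: {full:,} trainable weights")
print(f"LoRA (rank {rank}): {lora:,} trainable weights")
print(f"LoRA / full: {lora / full:.2%}")
```

Under these assumptions LoRA trains well under 1% of the layer's weights, which is why it typically needs less GPU memory than full finetuning, while full finetuning keeps every weight adjustable.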

NVILA Finetuning MS EA

NVILA FTMS is a visual language model (VLM) finetuning microservice that lets customers finetune a pre-trained NVILA-Lite-15B high-resolution video model on video-text and image-text data at scale, enabling multi-image and video VLMs for user-specific downstream use cases.

The NVILA FTMS EA package consists of:

  • A Finetuning Microservice container with scripts and APIs to finetune an NVILA-Lite-15B High Res LITA model
  • A sample tutorial notebook that walks through the end-to-end finetuning workflow for the NVILA-Lite-15B High Res LITA model
  • A pre-trained NVILA-Lite-15B High Res VLM with LITA

Containers

All containers needed to run the finetuning microservice can be pulled from this location. See the list below for all available containers in this registry.

Container Type | container_name:tag
NVILA Finetuning Microservice - Early Access | nvcr.io/nvidia/tao/vlm-finetuning-ea:0.2.0-ea
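
As a minimal sketch of pulling the container listed above, the snippet below builds and optionally runs a `docker pull` command. It assumes Docker is installed and that you have already authenticated against `nvcr.io` with an NGC API key that has access to this Early Access registry. The helper names `build_pull_command` and `pull_image` are hypothetical and are not part of the microservice.

```python
import shutil
import subprocess

# Image reference copied from the Containers table.
IMAGE = "nvcr.io/nvidia/tao/vlm-finetuning-ea:0.2.0-ea"


def build_pull_command(image: str) -> list[str]:
    """Return the argument list for pulling a container image with Docker."""
    return ["docker", "pull", image]


def pull_image(image: str, dry_run: bool = True) -> list[str]:
    """Print the pull command, and run it only when dry_run is False
    and a docker executable is found on PATH."""
    cmd = build_pull_command(image)
    if dry_run or shutil.which("docker") is None:
        print("Would run:", " ".join(cmd))
        return cmd
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


pull_image(IMAGE)  # dry run by default; pass dry_run=False to actually pull
```

Keeping the default as a dry run lets you confirm the image tag before starting a multi-gigabyte download.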

Pre-trained Models

Model Name | Link
NVILA-Lite-15B-HighRes | nvidia/tao/nvila:nvila-lite-15b-highres-lita
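
The model link above appears to follow the `org/team/name:version` pattern used by NGC registry references. The sketch below, under that assumption, splits the reference into its parts and builds an NGC CLI `download-version` command for it. The helpers `parse_ngc_ref` and `download_command` are hypothetical, and the snippet only constructs the command; it assumes the NGC CLI is installed and configured with an API key that has Early Access entitlement before you run the result.

```python
from typing import NamedTuple


class NgcRef(NamedTuple):
    org: str
    team: str
    name: str
    version: str


def parse_ngc_ref(ref: str) -> NgcRef:
    """Split 'org/team/name:version' into its four components."""
    path, sep, version = ref.partition(":")
    parts = path.split("/")
    if not sep or len(parts) != 3 or not all(parts) or not version:
        raise ValueError(f"expected org/team/name:version, got {ref!r}")
    return NgcRef(*parts, version)


def download_command(ref: str, dest: str = ".") -> list[str]:
    """Build (but do not run) an NGC CLI command to download a model version."""
    parse_ngc_ref(ref)  # validate the reference before building the command
    return ["ngc", "registry", "model", "download-version", ref, "--dest", dest]


# Reference copied from the Pre-trained Models table.
MODEL_REF = "nvidia/tao/nvila:nvila-lite-15b-highres-lita"

print(parse_ngc_ref(MODEL_REF))
print(" ".join(download_command(MODEL_REF, dest="./models")))
```

The same parsing applies to the getting-started resource reference in the Resources table, with `ngc registry resource download-version` in place of the model subcommand.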

Resources

NGC Resource | Link
VLM Getting Started - Early Access | nvidia/tao/vlm-getting-started-ea:0.2.0-ea

Technical blogs

  • Access the latest in Vision AI development workflows with NVIDIA TAO Toolkit 5.0
  • NVIDIA Announces Nemotron Model Families to Advance Agentic AI
  • Visual Language Models on NVIDIA Hardware with VILA
  • Vision Language Model Prompt Engineering Guide for Image and Video Understanding
  • Build Multimodal Visual AI Agents Powered by NVIDIA NIM

Suggested reading

More information about TAO Toolkit and pre-trained models can be found at the NVIDIA Developer Zone.

  • Vision Language Models
  • NVILA: Efficient Frontier Visual Language Models

Ethical Considerations

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended. Please report security vulnerabilities or NVIDIA AI Concerns here.

Security

Security Vulnerabilities in Open Source Packages

Please review the Security Scanning tab to view the latest security scan results. For certain open-source vulnerabilities listed in the scan results, NVIDIA provides a response in the form of a Vulnerability Exploitability eXchange (VEX) document. The VEX information can be reviewed and downloaded from the Security Scanning tab.