Linux / arm64
Vision language models (VLMs) are multi-modal models that support image, video, and text input by combining a large language model (LLM) with a vision transformer (ViT). This allows them to answer text prompts about videos and images, enabling capabilities such as chatting with a video and defining natural-language alerts.
The VLM AI service enables quick deployment of VLMs with Jetson Platform Services for video-insight applications. The service exposes REST API endpoints to configure the video stream input, set alerts, and ask questions in natural language about the input video stream.
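As an illustration of how such REST endpoints might be driven from a client, here is a minimal Python sketch using the requests library. The host/port, endpoint paths, and payload fields shown are assumptions made for the example, not the service's documented API; refer to the documentation linked below for the actual interface.

```python
# Hypothetical sketch of configuring the VLM service over REST.
# The base URL, endpoint paths, and payload fields are assumptions for
# illustration only; see the VLM service documentation for the real API.
import requests

BASE_URL = "http://localhost:5010"  # assumed host/port of the VLM service

# 1) Register an RTSP stream as the video input (assumed endpoint and payload).
resp = requests.post(
    f"{BASE_URL}/api/v1/live-stream",
    json={"liveStreamUrl": "rtsp://camera.local:8554/stream"},
    timeout=10,
)
resp.raise_for_status()
stream_id = resp.json().get("id")

# 2) Define a natural-language alert to evaluate on the stream (assumed payload).
requests.post(
    f"{BASE_URL}/api/v1/alerts",
    json={"alerts": ["Is there a fire?"], "id": stream_id},
    timeout=10,
).raise_for_status()

# 3) Ask a one-off question about the stream (assumed chat-style endpoint).
answer = requests.post(
    f"{BASE_URL}/api/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Describe what is happening in the video."}
        ]
    },
    timeout=60,
)
print(answer.json())
```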
Additionally, the VLM output can be viewed as an RTSP stream, and alert states are stored by the jetson-monitoring service and sent over a WebSocket for integration with other services.
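The sketch below shows one way a client could consume those alert-state updates over a WebSocket, using the third-party websockets package. The URL and JSON message format are assumptions for illustration; check the VLM and jetson-monitoring documentation for the actual endpoint and schema.

```python
# Hypothetical sketch of listening for alert-state updates over a WebSocket.
# The URL and message format are assumptions; consult the service docs for
# the actual interface exposed by the VLM / jetson-monitoring services.
import asyncio
import json

import websockets

ALERT_WS_URL = "ws://localhost:5016/api/v1/alert-stream"  # assumed endpoint

async def watch_alerts() -> None:
    async with websockets.connect(ALERT_WS_URL) as ws:
        async for message in ws:
            # Assume each message is a JSON object describing the alert state.
            state = json.loads(message)
            print("alert update:", state)

if __name__ == "__main__":
    asyncio.run(watch_alerts())
```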
For more information on the VLM inference service and using it in an application, refer to https://docs.nvidia.com/jetson/jps/inference-services/vlm.html
By downloading or using the software and materials, you agree to the License Agreement for JetPack.