Linux / arm64
Vision language models (VLMs) are multi-modal models that support image, video, and text input by combining a large language model (LLM) with a vision transformer (ViT). This allows them to answer text prompts about videos and images, enabling capabilities such as chatting with a video and defining natural-language alerts.
The VLM AI service enables quick deployment of VLMs with Jetson Platform Services for video-insight applications. The service exposes REST API endpoints to configure the video stream input, set alerts, and ask questions in natural language about the input video stream.
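As an illustration of how such REST endpoints might be driven from a client, here is a minimal Python sketch using the requests library. The host/port, endpoint paths, and payload fields shown are assumptions made for the example, not the service's documented API; refer to the documentation linked below for the actual interface.

```python
# Hypothetical sketch of configuring the VLM service over REST.
# The base URL, endpoint paths, and payload fields are assumptions for
# illustration only; see the VLM service documentation for the real API.
import requests

BASE_URL = "http://localhost:5010"  # assumed host/port of the VLM service

# 1) Register an RTSP stream as the video input (assumed endpoint and payload).
resp = requests.post(
    f"{BASE_URL}/api/v1/live-stream",
    json={"liveStreamUrl": "rtsp://camera.local:8554/stream"},
    timeout=10,
)
resp.raise_for_status()
stream_id = resp.json().get("id")

# 2) Define a natural-language alert to evaluate on the stream (assumed payload).
requests.post(
    f"{BASE_URL}/api/v1/alerts",
    json={"alerts": ["Is there a fire?"], "id": stream_id},
    timeout=10,
).raise_for_status()

# 3) Ask a one-off question about the stream (assumed chat-style endpoint).
answer = requests.post(
    f"{BASE_URL}/api/v1/chat/completions",
    json={
        "messages": [
            {"role": "user", "content": "Describe what is happening in the video."}
        ]
    },
    timeout=60,
)
print(answer.json())
```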
Additionally, the VLM output can be viewed as an RTSP stream, and alert states are stored by the jetson-monitoring service and sent over a WebSocket for integration with other services.
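The sketch below shows one way a client could consume those alert-state updates over a WebSocket, using the third-party websockets package. The URL and JSON message format are assumptions for illustration; check the VLM and jetson-monitoring documentation for the actual endpoint and schema.

```python
# Hypothetical sketch of listening for alert-state updates over a WebSocket.
# The URL and message format are assumptions; consult the service docs for
# the actual interface exposed by the VLM / jetson-monitoring services.
import asyncio
import json

import websockets

ALERT_WS_URL = "ws://localhost:5016/api/v1/alert-stream"  # assumed endpoint

async def watch_alerts() -> None:
    async with websockets.connect(ALERT_WS_URL) as ws:
        async for message in ws:
            # Assume each message is a JSON object describing the alert state.
            state = json.loads(message)
            print("alert update:", state)

if __name__ == "__main__":
    asyncio.run(watch_alerts())
```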
For more information on the VLM inference service and using it in an application, refer to https://docs.nvidia.com/jetson/jps/inference-services/vlm.html
By downloading or using the software and materials, you agree to the License Agreement for JetPack.