AI Blueprint for Video Search and Summarization

Description: Blueprint for the Video Search and Summarization Agent
Publisher: NVIDIA
Latest Version: 2.3.0
Compressed Size: 820.6 KB
Modified: April 28, 2025

Introduction

Advances in AI video understanding and interaction have the potential to revolutionize how we access, analyze, and interact with video content in various domains. These AI models are capable of:

  • Video captioning: generating text descriptions or summaries of videos.
  • Question answering: answering questions about a video's content.
  • Video retrieval: finding specific videos (highlights) based on text queries.
  • Action recognition: identifying actions happening in the video.

The current release of Video Search and Summarization Agent (VSS) demonstrates Video Summarization, Q&A and alerts with accelerated performance on NVIDIA hardware.

Features

VSS supports video upload, live streams, and summarization of video files, image files, and live streams, with various configuration options. Features:

  • Fast processing of long videos
  • Image / Multi-Image support
  • Live Stream (RTSP) support
  • Supported file formats: mp4, mkv, jpg, png
  • Supported codecs: h264/h265 video and Opus/Vorbis audio
  • Summarization for videos, images, and live streams
  • Q&A for files, images, and live streams
  • Events and alerts
  • TRT-LLM acceleration for VILA-1.5 and NVILA
  • Multi-node, multi-GPU support
  • Context-aware RAG support for enhanced accuracy and Q&A
    • Graph RAG
    • Vector RAG
  • Support for GPT-4o as the VLM and LLM
  • Support for OpenAI-compatible hosted VLMs
  • Drop-in support for custom VLMs
  • Guardrails support
  • OpenAI-compatible REST API
  • Multi-stream support
  • Riva ASR-based audio transcription in summarization, Q&A, and alerts
  • CV pipeline to generate CV metadata and Set of Marks (SOM) prompting for videos and live streams
  • Support for fine-tuned NVILA: recipe to fuse a LoRA checkpoint with the base NVILA model
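
As an illustration of the supported file formats listed above, the following is a minimal sketch of a client-side check that could run before a file is submitted. The function name `classify_media` and the grouping of extensions into "video" and "image" are assumptions made for this example and are not part of VSS. The check looks only at the file extension and does not verify that the codecs inside the container are supported.

```python
from pathlib import Path

# Extensions come from the feature list above. Splitting them into
# "video" and "image" groups is an assumption made for this sketch.
VIDEO_EXTENSIONS = {".mp4", ".mkv"}
IMAGE_EXTENSIONS = {".jpg", ".png"}


def classify_media(filename):
    """Return 'video', 'image', or None when the extension is not listed.

    Hypothetical helper: it inspects only the final extension and does
    not open the file or check its codecs.
    """
    suffix = Path(filename).suffix.lower()
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    return None


if __name__ == "__main__":
    for name in ("warehouse.mp4", "frame_001.PNG", "notes.txt"):
        print(name, "->", classify_media(name))
```

A check like this only filters out obviously unsupported inputs early. The service itself remains the authority on whether a given file can be processed.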

Architecture

[Figure: VSS Architecture]

User Guide

The User Guide is available at: https://docs.nvidia.com/vss/index.html

NOTE: This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

Deployment Note

The Video Search and Summarization Blueprint is shared as a reference and is provided "as is". Security in the production environment is the responsibility of the end users deploying it. When deploying in a production environment, please have security experts review any potential risks and threats, define the trust boundaries, implement logging and monitoring capabilities, secure the communication channels, integrate AuthN and AuthZ with appropriate access controls, keep the deployment up to date, and ensure the containers and source code are secure and free of known vulnerabilities. End users are also responsible for ensuring the integrity and authenticity of the models and containers.
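
One common way to approach the integrity check mentioned above is to compare a downloaded artifact against a checksum published by a trusted source. The sketch below assumes such a SHA-256 value is available. The helper names `sha256_of_file` and `matches_expected` are hypothetical and are not part of VSS. A checksum covers integrity only; authenticity additionally requires that the expected value itself comes from a trusted channel.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large model files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string.

    The expected value is assumed to come from a trusted source.
    Surrounding whitespace and letter case are ignored.
    """
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

In a deployment script, a failed comparison would typically abort before the artifact is loaded or mounted.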

Known CVEs

VSS Engine 2.3.0 Container has the following known CVEs:

CVE: Description
CVE-2024-8966: This impacts the gradio <= 5.22.0 Python package. It affects the file upload functionality of the Gradio UI, where an attacker can cause a denial-of-service (DoS) attack by appending a large number of characters to the end of a multipart boundary. This affects the Gradio UI of VSS.
CVE-2025-32434: This impacts the torch v2.51.0 Python package. It affects loading of saved model weights from a tar file using the torch.load() API, which can result in remote code execution if the weights are malicious. The default weights for the models used by VSS are in safetensors format and are not affected by this vulnerability, since torch.load() is not used. However, users must ensure the safety of the weights if using other formats.
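
To see whether a custom environment falls within a version range such as the gradio <= 5.22.0 range noted above, the installed version can be compared against the ceiling. The following is a minimal sketch that uses only the standard library. The helpers `parse_version`, `is_at_or_below`, and `installed_version` are hypothetical names introduced for this example, and the parser handles only plain dotted release numbers (a full implementation would more likely rely on a dedicated version-parsing library).

```python
from importlib import metadata


def parse_version(text):
    """Parse a dotted release such as '5.22.0' into a tuple of ints.

    Simplified on purpose: any non-digit suffix (for example 'rc1') is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_at_or_below(installed, ceiling):
    """Return True when `installed` is less than or equal to `ceiling`."""
    a, b = parse_version(installed), parse_version(ceiling)
    width = max(len(a), len(b))
    # Pad with zeros so that '5.22' and '5.22.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a <= b


def installed_version(package):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("gradio")
    if found is None:
        print("gradio is not installed in this environment")
    else:
        print("gradio", found, "in affected range:", is_at_or_below(found, "5.22.0"))
```

A result of True only indicates that the installed version lies within the range named in the table. Whether the vulnerability is reachable still depends on how the package is used.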

VSS Engine 2.3.0 Source Code has the following known CVEs:

CVE: Description
CVE-2024-7246: This affects the gRPC Python package. It is possible for a gRPC client communicating with an HTTP/2 proxy to poison the HPACK table between the proxy and the backend such that other clients see failed requests. By default, VSS does not use an HTTP/2 proxy.
CVE-2024-27444: This issue is reported because langchain-milvus 0.1.5 depends on the older langchain version 0.1.5. However, VSS explicitly uses langchain 0.3.3, so this CVE is not applicable.
CVE-2024-28088: This issue is reported because langchain-milvus 0.1.5 depends on the older langchain version 0.1.5. However, VSS explicitly uses langchain 0.3.3, so this CVE is not applicable.
CVE-2024-38459: This issue is reported because langchain-milvus 0.1.5 depends on the older langchain version 0.1.5. However, VSS explicitly uses langchain 0.3.3, so this CVE is not applicable.

VSS 2.2.0 (Previous Release) has the following known CVEs:

CVE: Description
CVE-2024-11393: This impacts the transformers v4.47.0 Python package. It affects the Hugging Face Transformers MaskFormer model deserialization and allows remote attackers to execute arbitrary code. User interaction is required to exploit this vulnerability, in that the target must visit a malicious page or open a malicious file. However, this does not affect VSS, since the MaskFormer model is not used in VSS.
CVE-2024-11392: This impacts the transformers v4.47.0 Python package. It affects the Hugging Face Transformers MobileViTV2 model deserialization and allows remote attackers to execute arbitrary code. User interaction is required to exploit this vulnerability, in that the target must visit a malicious page or open a malicious file. However, this does not affect VSS, since the MobileViTV2 model is not used in VSS.
CVE-2024-11394: This impacts the transformers v4.47.0 Python package. It affects the Hugging Face Transformers Trax model deserialization and allows remote attackers to execute arbitrary code. User interaction is required to exploit this vulnerability, in that the target must visit a malicious page or open a malicious file. However, this does not affect VSS, since the Trax model is not used in VSS.

GOVERNING TERMS

The software and materials are governed by the NVIDIA Software License Agreement and the Product-Specific Terms for NVIDIA AI Products, except for models which are governed by the NVIDIA Community Model License.

Additional information: Llama 3.1 Community License Agreement for Llama-3.1-70b-instruct; Llama 3.2 Community License Agreement for NVIDIA Retrieval QA Llama 3.2 1B Embedding v2 and NVIDIA Retrieval QA Llama 3.2 1B Reranking v2; Apache License, Version 2.0 for https://github.com/google-research/big_vision/blob/main/LICENSE and Apache License, Version 2.0 for https://github.com/01-ai/Yi/blob/main/LICENSE. Built with Llama.