NVIDIA

Active Speaker Detection

Container

NVIDIA Active Speaker Detection NIM supports the detection and identification of multiple speakers in a video stream.

What Is NVIDIA NIM?

NVIDIA NIM™, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inferencing across clouds, data centers, and workstations. Supporting a wide range of AI models, including open-source models, NVIDIA AI Foundation models, and custom models, it provides scalable AI inferencing on-premises or in the cloud through industry-standard APIs.

NVIDIA Active Speaker Detection (ASD) supports the detection and identification of multiple speakers in a video stream. It accepts video resolutions from 720p to UHD and audio sample rates of 16 kHz, 44.1 kHz, and 48 kHz. ASD identifies unique speakers and then labels each speaker's spoken audio to support diarization pipelines. It handles multiple speakers and performs across complex edits and conflicting dialogue or music.
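The stated input ranges lend themselves to a simple pre-flight check before media is sent to the service. The following is a minimal Python sketch, not part of the NIM itself: the function name check_input is hypothetical, and it assumes that "720p" means a frame height of 720 pixels and "UHD" means a frame height of 2160 pixels. The sample rates are the three listed above.

```python
# Sample rates listed in the product description (in Hz).
SUPPORTED_SAMPLE_RATES_HZ = {16_000, 44_100, 48_000}

# Assumption: "720p" is a 720-pixel frame height, "UHD" is 2160 pixels.
MIN_HEIGHT = 720
MAX_HEIGHT = 2160


def check_input(height: int, sample_rate_hz: int) -> list[str]:
    """Hypothetical helper: return a list of problems, empty if none found."""
    problems = []
    if not (MIN_HEIGHT <= height <= MAX_HEIGHT):
        problems.append(
            f"frame height {height} is outside {MIN_HEIGHT}-{MAX_HEIGHT}"
        )
    if sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        problems.append(f"sample rate {sample_rate_hz} Hz is not listed")
    return problems


if __name__ == "__main__":
    # A 1080p clip with 48 kHz audio falls inside the stated ranges.
    print(check_input(1080, 48_000))
    # A 480p clip with 22.05 kHz audio falls outside both.
    print(check_input(480, 22_050))
```

A check like this only reflects the ranges quoted on this page; the release documentation remains the authority on what the container actually accepts.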

NVIDIA NIM offers prebuilt containers for computer vision models. Each NIM consists of a container and a model and uses a CUDA-accelerated runtime for all NVIDIA GPUs, with special optimizations available for many configurations. Whether on-premises or in the cloud, NIM is the fastest way to achieve accelerated inference at scale.

Getting Started with NVIDIA NIM

Deploying and integrating NVIDIA NIM is straightforward thanks to our industry-standard APIs. Visit the NVIDIA Active Speaker Detection NIM page for release documentation, deployment guides, and more.

Get Help

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.

Governing Terms

Use of the NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Use of the models is governed by the NVIDIA Open Model License. Additional Information: MIT

You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.

Publisher: NVIDIA

License: NVIDIA proprietary (see Governing Terms)

Latest Tag: 1.1.0

Updated: July 14, 2026 UTC

Compressed Size: 13.7 GB

Multinode Support: No

Multi-Arch Support: No

System

signed images

Labels

NSPECT-YDNR-FR7O