NVIDIA Active Speaker Detection NIM supports the detection and identification of multiple speakers in a video stream


What Is NVIDIA NIM?
NVIDIA NIM™, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed for secure, reliable deployment of high-performance AI model inference across clouds, data centers, and workstations. Supporting a wide range of AI models, including open-source, NVIDIA AI Foundation, and custom models, it enables seamless, scalable AI inference on-premises or in the cloud using industry-standard APIs.
NVIDIA Active Speaker Detection (ASD) detects and identifies multiple speakers in a video stream. It supports video resolutions from 720p to UHD and audio sample rates of 16 kHz, 44.1 kHz, and 48 kHz. ASD identifies each unique speaker and then attributes the corresponding spoken audio to that speaker, which supports diarization pipelines. It remains robust across complex edits and competing dialogue or music.
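The stated input limits can be checked before a clip is sent to ASD. The following is a minimal, hypothetical pre-flight check, not part of the NIM API. The `MediaSpec` type and the `check_asd_input` function are illustrative names. Reading "720p to UHD" as a shorter frame side between 720 and 2160 pixels (1280x720 up to 3840x2160) is also an assumption.

```python
from dataclasses import dataclass

# Assumption: "720p to UHD" means the shorter frame side is between
# 720 px (1280x720) and 2160 px (3840x2160), inclusive.
MIN_SHORT_SIDE = 720
MAX_SHORT_SIDE = 2160

# Audio sample rates listed for ASD, in Hz.
SUPPORTED_SAMPLE_RATES_HZ = {16_000, 44_100, 48_000}


@dataclass
class MediaSpec:
    """Hypothetical description of an input clip (not a NIM API type)."""
    width: int
    height: int
    sample_rate_hz: int


def check_asd_input(spec: MediaSpec) -> list:
    """Return a list of problems; an empty list means the clip looks supported."""
    problems = []
    # Use the shorter side so portrait and landscape clips are treated alike.
    short_side = min(spec.width, spec.height)
    if not MIN_SHORT_SIDE <= short_side <= MAX_SHORT_SIDE:
        problems.append(
            f"resolution {spec.width}x{spec.height} is outside 720p to UHD"
        )
    if spec.sample_rate_hz not in SUPPORTED_SAMPLE_RATES_HZ:
        problems.append(
            f"sample rate {spec.sample_rate_hz} Hz is not 16, 44.1, or 48 kHz"
        )
    return problems


if __name__ == "__main__":
    print(check_asd_input(MediaSpec(1920, 1080, 48_000)))  # []
    print(check_asd_input(MediaSpec(640, 360, 22_050)))    # two problems
```

Reading the actual width, height, and sample rate from a file (for example with a media probing tool such as ffprobe) is left out to keep the sketch self-contained.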
NVIDIA NIM offers prebuilt containers for computer vision models. Each NIM consists of a container and a model and uses a CUDA-accelerated runtime for all NVIDIA GPUs, with special optimizations available for many configurations. Whether on-premises or in the cloud, NIM is the fastest way to achieve accelerated inference at scale.
Getting Started with NVIDIA NIM
Deploying and integrating NVIDIA NIM is straightforward thanks to our industry-standard APIs. Visit the NVIDIA Active Speaker Detection NIM page for release documentation, deployment guides, and more.
Get Help
Enterprise Support
Get access to knowledge base articles and support cases or submit a ticket.
Governing Terms
Use of the NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for NVIDIA AI Products. Use of the models is governed by the NVIDIA Open Model License. Additional Information: MIT.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.