RIVA Diarizer Neural VAD

Description

Neural VAD model used in Riva Speaker Diarization

Publisher

NVIDIA

Use Case

Speaker Diarization

Framework

NeMo

Latest Version

deployable_v1.0

Modified

December 1, 2022

Size

340.78 KB

Speaker Diarization: MarbleNet Model Card

Model Overview

This model performs Voice Activity Detection (VAD) and serves as the first step in Speaker Diarization (SD).

Model Architecture

The model is based on the MarbleNet architecture presented in the MarbleNet paper [1]. Unlike the paper, this model takes a log-mel spectrogram with n_mels=80 as its input feature, so it can be integrated easily and efficiently with speaker diarization.
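To give a feel for what n_mels=80 means, the sketch below computes the band edges of an 80-band mel filterbank. It is only an illustration: the HTK-style mel formula, the 0 to 8000 Hz range (the Nyquist limit for 16 kHz audio), and the helper names are assumptions, not the featurization settings actually used by this model.

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    """Convert frequency in Hz to mels (HTK-style formula, assumed here)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(n_mels: int = 80,
                   f_min: float = 0.0,
                   f_max: float = 8000.0) -> List[float]:
    """Return n_mels + 2 edge frequencies in Hz, equally spaced on the mel scale.

    Triangular filter k spans edges[k] .. edges[k + 2] and peaks at edges[k + 1].
    f_min and f_max are hypothetical defaults, not values from the model card.
    """
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]


edges = mel_band_edges()
print(len(edges) - 2, "mel bands between", round(edges[0]), "and", round(edges[-1]), "Hz")
```

The edges are spaced closely at low frequencies and widely at high frequencies, which is the property that makes mel features a compact representation of speech.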

Training

The model was trained on multiple publicly available datasets. The NeMo toolkit [2] was used to train the model for 50 epochs on multiple GPUs.

How to Use this Model

To try this model, start with the Riva Skills Quick Start guide, which is the starting point for trying out Riva models. The Riva documentation describes how to use the Riva Speech ASR service with this model.

Input

This model accepts 16 kHz (16,000 Hz) mono-channel audio (WAV files) as input.
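Before sending audio to the model, it can help to confirm that a file matches this input format. The following is a minimal sketch using only Python's standard `wave` module; the function names `describe_wav` and `is_supported_input` are hypothetical helpers and are not part of Riva or NeMo.

```python
import wave

EXPECTED_RATE_HZ = 16000   # 16 kHz, as stated in the model card
EXPECTED_CHANNELS = 1      # mono


def describe_wav(path: str) -> dict:
    """Read basic format information from a PCM WAV file header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def is_supported_input(path: str) -> bool:
    """Return True if the file is 16 kHz mono, the format this model expects."""
    info = describe_wav(path)
    return (info["sample_rate_hz"] == EXPECTED_RATE_HZ
            and info["channels"] == EXPECTED_CHANNELS)
```

A file that fails this check would need to be resampled or downmixed with an audio tool of your choice before use.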

Output

This model provides frame-level voice activity prediction.
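Frame-level predictions are usually turned into speech segments before diarization. The sketch below shows one simple way to do that, assuming the predictions are available as a list of per-frame speech probabilities. The 0.5 threshold, the 0.02 s frame shift, and the function name `frames_to_segments` are illustrative assumptions; the model card does not specify the output format, the frame shift, or any post-processing.

```python
from typing import List, Sequence, Tuple


def frames_to_segments(speech_probs: Sequence[float],
                       threshold: float = 0.5,
                       frame_shift_s: float = 0.02) -> List[Tuple[float, float]]:
    """Merge consecutive speech frames into (start_s, end_s) segments.

    threshold and frame_shift_s are hypothetical defaults, not model settings.
    """
    segments = []
    start = None  # index of the first frame of the current speech run, if any
    for i, prob in enumerate(speech_probs):
        is_speech = prob >= threshold
        if is_speech and start is None:
            start = i                      # a speech run begins
        elif not is_speech and start is not None:
            segments.append((start * frame_shift_s, i * frame_shift_s))
            start = None                   # the speech run ended at frame i
    if start is not None:                  # audio ended while still in speech
        segments.append((start * frame_shift_s,
                         len(speech_probs) * frame_shift_s))
    return segments


# Example with a 1-second frame shift to keep the numbers easy to read.
example = frames_to_segments([0.1, 0.9, 0.8, 0.2, 0.7], frame_shift_s=1.0)
print(example)
```

Production pipelines typically add smoothing, padding, and minimum-duration filtering on top of plain thresholding; those steps are omitted here for brevity.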

References

[1] Jia, Fei, Somshubra Majumdar, and Boris Ginsburg. MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection. ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2021.

[2] NVIDIA NeMo Toolkit

License

By downloading and using the models and resources packaged with Riva Conversational AI, you accept the terms of the Riva license.