Multilingual Silero VAD

Description: Multilingual Silero Voice Activity Detection model
Publisher: NVIDIA
Latest Version: v5
Modified: December 11, 2024
Size: 2.22 MB

Model Overview

Description:

This model performs Voice Activity Detection (VAD) and serves as the first step in an Automatic Speech Recognition (ASR) pipeline. Silero VAD supports 8 kHz and 16 kHz sample rates, with fixed windows of 256 and 512 samples respectively (32 ms of audio in both cases). It supports more than 6,000 languages.
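
The fixed window sizes mean that audio has to be cut into equal chunks before it reaches the model. The following is a minimal sketch of that chunking step, not the packaged preprocessing: the helper names (`window_duration_ms`, `split_into_windows`) are hypothetical, and zero-padding the final partial window is an assumption (dropping it is an equally plausible choice).

```python
# Window sizes stated in the description: 256 samples at 8 kHz, 512 at 16 kHz.
WINDOW_SAMPLES = {8000: 256, 16000: 512}


def window_duration_ms(sample_rate):
    """Duration of one fixed window in milliseconds."""
    return 1000.0 * WINDOW_SAMPLES[sample_rate] / sample_rate


def split_into_windows(samples, sample_rate, pad_value=0.0):
    """Split a mono sample sequence into fixed-size windows.

    Assumption: the last partial window is padded with pad_value.
    """
    if sample_rate not in WINDOW_SAMPLES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    size = WINDOW_SAMPLES[sample_rate]
    windows = []
    for start in range(0, len(samples), size):
        chunk = list(samples[start:start + size])
        # Pad only the final chunk, if it is shorter than one window.
        chunk.extend([pad_value] * (size - len(chunk)))
        windows.append(chunk)
    return windows
```

For example, `split_into_windows(samples, 16000)` on 1,000 samples yields two windows of 512 samples each, the second one padded at the end.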

This model is ready for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see Silero Voice Activity Detector | PyTorch.

License/Terms of Use:

This model is governed by the NVIDIA RIVA License Agreement.

Disclaimer: AI models generate responses and outputs based on complex algorithms and machine learning techniques, and those responses or outputs may be inaccurate or offensive. By downloading a model, you assume the risk of any harm caused by any response or output of the model.

By using this software or model, you are agreeing to the terms and conditions of the license, acceptable use policy and Silero VAD’s privacy policy. Silero VAD is released under the MIT license.

References:

  • Silero VAD website
  • Silero VAD citation:

@misc{Silero VAD,
  author = {Silero Team},
  title = {Silero VAD: pre-trained enterprise-grade Voice Activity Detector (VAD), Number Detector and Language Classifier},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/snakers4/silero-vad}},
  commit = {insert_some_commit_here},
  email = {hello@silero.ai}
}

Model Architecture:

Architecture Type: Unknown
Network Architecture: Silero VAD

Input:

Input Type(s): Audio
Input Format(s): Linear PCM 16-bit 1 channel (Audio)
Input Parameters: One-Dimensional (1D)
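
Audio in the stated format arrives as raw 16-bit integer samples. A common preprocessing step is to scale them to floating point, sketched below. This is an illustration under assumptions: the card does not say whether the packaged model expects normalized floats or which byte order the PCM data uses, so both the scaling by 32768 and the little-endian layout are assumptions, and `pcm16_to_float` is a hypothetical helper.

```python
import array
import sys


def pcm16_to_float(pcm_bytes):
    """Convert 16-bit mono PCM bytes to floats in [-1.0, 1.0).

    Assumption: samples are little-endian, as in typical WAV files.
    """
    if len(pcm_bytes) % 2 != 0:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    ints = array.array("h")  # signed 16-bit integers
    ints.frombytes(pcm_bytes)
    if sys.byteorder == "big":
        # frombytes uses native order, so swap on big-endian hosts.
        ints.byteswap()
    return [s / 32768.0 for s in ints]
```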

Output:

Output Type(s): Probabilities of speech
Output Format: Float
Output Parameters: 1D
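
Downstream code usually turns the per-window speech probabilities into time spans. The sketch below shows one simple way to do that, assuming one probability per fixed window. The 32 ms window duration follows from the stated window sizes (256 / 8000 = 512 / 16000 = 0.032 s); the 0.5 threshold and the `speech_segments` helper are assumptions for illustration, not values taken from this card.

```python
def speech_segments(probs, threshold=0.5, window_ms=32.0):
    """Group consecutive windows with probability >= threshold
    into (start_ms, end_ms) spans.

    Assumption: threshold=0.5 is illustrative and should be tuned per use case.
    """
    segments = []
    start = None  # index of the first window in the current speech run
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            segments.append((start * window_ms, i * window_ms))
            start = None
    if start is not None:
        # Speech continued to the end of the input.
        segments.append((start * window_ms, len(probs) * window_ms))
    return segments
```

A production pipeline would typically also merge spans separated by very short gaps and discard very short spans; that smoothing is omitted here to keep the sketch small.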

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere
  • NVIDIA Blackwell

Supported Operating System(s):

  • Linux

Model Version(s): v5

Training Dataset:

  • Bible.is
    Data Collection Method: Unknown
    Labeling Method: Unknown

  • globalrecordings.net
    Data Collection Method: Unknown
    Labeling Method: Unknown

  • VoxLingua107
    Data Collection Method: Unknown
    Labeling Method: Unknown

  • Common Voice
    Data Collection Method: Human
    Labeling Method: Human

  • MLS
    Data Collection Method: Human
    Labeling Method: Human

Inference:

Engine: ONNX Runtime, Triton

Test Hardware:

  • A100
  • H100

For more detail on model usage, evaluation, training datasets, and implications, please refer to the Silero VAD GitHub repository.

## Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).