This model can be used for Spoken Language Identification (LangID / LID) and serves as the first step for Automatic Speech Recognition (ASR).
The model is based on the AmberNet architecture; the accompanying AmberNet paper will be published soon.
The model was trained for 40 epochs on multiple GPUs using the NeMo toolkit, on the publicly available VoxLingua107 dataset [1].
It achieves a 5.22% error rate on the official evaluation set, which contains 1,609 verified utterances covering 33 languages.
The model is available for use in the NeMo toolkit and can be used as a pre-trained checkpoint for inference or for fine-tuning on another dataset.
import nemo.collections.asr as nemo_asr
# Load the pre-trained language-ID checkpoint from NGC
langid_model = nemo_asr.models.EncDecSpeakerLabelModel.from_pretrained(model_name="langid_ambernet")
This model accepts 16,000 Hz (16 kHz) mono-channel audio (WAV files) as input.
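Before running inference, it can help to verify that input files match this expected format. The helper below is a minimal sketch using Python's standard wave module; the function name is illustrative and not part of NeMo:

```python
import wave

def is_16khz_mono(path: str) -> bool:
    """Return True if the WAV file is 16,000 Hz and single-channel,
    matching the input format the model expects."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate() == 16000 and wf.getnchannels() == 1
```

Files that fail this check would need to be resampled and/or downmixed (for example with ffmpeg or librosa) before being passed to the model.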
This model provides spoken language identification of the given utterance.
Since this model was trained on publicly available datasets, its performance might degrade on custom data it has not seen. The model was trained on 107 languages; for unseen languages, it must be fine-tuned.
[1] Valk, Jörgen, and Tanel Alumäe. "VoxLingua107: a dataset for spoken language recognition." 2021 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2021.
License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.