ASR TalkNet Aligner

Description

Text/audio aligner based on QuartzNet 5x5 for duration extraction

Publisher

-

Latest Version

1.0.0rc1

Modified

April 4, 2023

Size

28.82 MB

Model Overview

An ASR-based text/audio aligner, trained with the CTC loss, that was used to train TalkNet.

Usage

Automatically load the model from NGC

from nemo.collections.asr.models import EncDecCTCModel
# Fetches the pretrained checkpoint by name and restores the model
model = EncDecCTCModel.from_pretrained("asr_talknet_aligner")

For an example of how to use this model to generate speech, refer to the TTS inference notebook.

Training Set

This model is trained on LibriTTS sampled at 22050 Hz, with input text converted to phonemes. It can be used to extract durations for an audio excerpt and its corresponding phoneme sequence.
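To illustrate how durations can be read off a CTC model's output, the sketch below runs a Viterbi forced alignment over frame-level log-probabilities and counts the frames assigned to each symbol. It is a generic, pure-Python illustration and not necessarily the procedure used for TalkNet: the function name `ctc_durations`, the blank index `0`, and the toy inputs are assumptions made for this example, and in practice the log-probabilities would come from the aligner's output for a given audio file.

```python
import math


def ctc_durations(log_probs, tokens, blank=0):
    """Viterbi forced alignment over a CTC lattice (illustrative sketch).

    log_probs: list of T frames, each a list of V log-probabilities.
    tokens:    non-empty list of target token ids (no blanks).
    Returns frame counts for the blank-interleaved sequence
    [blank, tok_1, blank, ..., tok_n, blank]; the counts sum to T.
    """
    # Interleave blanks around the target tokens.
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    T, S = len(log_probs), len(ext)
    neg_inf = -math.inf

    score = [[neg_inf] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]
    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][ext[0]]
    score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one, or skip a blank
            # between two different tokens.
            candidates = [s]
            if s >= 1:
                candidates.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(s - 2)
            best = max(candidates, key=lambda p: score[t - 1][p])
            if score[t - 1][best] == neg_inf:
                continue  # state s is unreachable at frame t
            score[t][s] = score[t - 1][best] + log_probs[t][ext[s]]
            back[t][s] = best

    # A valid path ends in the last token or in the trailing blank.
    s = S - 1 if score[T - 1][S - 1] >= score[T - 1][S - 2] else S - 2
    if score[T - 1][s] == neg_inf:
        raise ValueError("too few frames to align the token sequence")

    # Walk the best path backwards, counting frames per state.
    durations = [0] * S
    for t in range(T - 1, -1, -1):
        durations[s] += 1
        s = back[t][s]
    return durations


# Toy example: 6 frames, vocabulary {0: blank, 1, 2}, target tokens [1, 2].
hi, lo = math.log(0.9), math.log(0.05)
peaks = [0, 1, 1, 0, 2, 2]  # most likely symbol in each frame
frames = [[hi if v == p else lo for v in range(3)] for p in peaks]
print(ctc_durations(frames, [1, 2]))  # [1, 2, 1, 2, 0]
```

The odd positions of the result hold the token durations in frames (here 2 frames each for tokens 1 and 2), and the even positions hold the durations of the surrounding blanks. Converting frames to seconds depends on the model's frame stride, which is not stated here.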

References

[2] TalkNet 2 Paper