ASR TalkNet Aligner

Description
Text/audio aligner based on QuartzNet 5x5 for duration extraction
Publisher
-
Latest Version
1.0.0rc1
Modified
April 4, 2023
Size
28.82 MB

Model Overview

An ASR-based text/audio aligner, built on the CTC loss, that was used to train TalkNet.

Usage

Automatically load the model from NGC

from nemo.collections.asr.models import EncDecCTCModel

# Fetch the pretrained checkpoint from NGC by name and instantiate the model.
model = EncDecCTCModel.from_pretrained("asr_talknet_aligner")

For an example of how to use this model to generate speech, refer to the TTS inference notebook.

Training Set

This model is trained on LibriTTS sampled at 22050 Hz, with input text converted to phonemes. It can be used to extract durations for an audio excerpt and its corresponding phoneme sequence.
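
To illustrate what duration extraction from a CTC model can look like, here is a minimal sketch of Viterbi forced alignment over per-frame CTC log-probabilities. It is a generic textbook procedure, not necessarily the one used in NeMo or in TalkNet training. The function names ctc_forced_align and token_durations are hypothetical, the log_probs array stands in for the model's per-frame output, and the blank index of 0 is an assumption that should be set to match the actual vocabulary.

```python
import numpy as np


def ctc_forced_align(log_probs, tokens, blank=0):
    """Viterbi-align `tokens` to frames under CTC rules.

    log_probs: array of shape (T, V) with per-frame log-probabilities.
    tokens:    target token ids (e.g. phoneme ids), without blanks.
    Returns the blank-interleaved sequence and, for every frame, the
    position in that sequence the best path occupies.
    """
    T = log_probs.shape[0]
    # Interleave blanks: [blank, t1, blank, t2, ..., tN, blank]
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)

    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    score[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        score[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed moves: stay, advance by one, or skip a blank
            # between two different tokens.
            cands = [(score[t - 1, s], s)]
            if s >= 1:
                cands.append((score[t - 1, s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1, s - 2], s - 2))
            best, prev = max(cands)
            score[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = prev

    # A valid path ends on the final blank or the final token.
    s = S - 1
    if S > 1 and score[T - 1, S - 2] > score[T - 1, S - 1]:
        s = S - 2
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t, s]
        path.append(s)
    path.reverse()
    return ext, path


def token_durations(ext, path):
    """Count frames per position; odd positions are the real tokens."""
    counts = np.bincount(path, minlength=len(ext))
    return counts[1::2], counts[0::2]  # (token frames, blank frames)


# Toy input: 6 frames, vocabulary {0: blank, 1, 2}; frames favor 1,1,blank,2,2,2.
favored = [1, 1, 0, 2, 2, 2]
probs = np.full((6, 3), 0.05)
probs[np.arange(6), favored] = 0.9
ext, path = ctc_forced_align(np.log(probs), tokens=[1, 2])
tok_frames, blank_frames = token_durations(ext, path)
print(tok_frames.tolist())    # [2, 3]
print(blank_frames.tolist())  # [0, 1, 0]
```

Durations here are counted in output frames. Converting them to seconds requires the model's frame stride, which this page does not state.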

References

[2] TalkNet 2 Paper