TalkNet is a non-autoregressive model that generates mel spectrograms from text.
For more information about the model architecture, see the TalkNet paper [1,2].
This model is trained on LJSpeech sampled at 22050Hz, and has been tested on generating female English voices with an American accent.
No performance information available at this time.
This model can be automatically loaded from NGC.
    from nemo.collections.tts.models import TalkNetSpectModel, TalkNetPitchModel, TalkNetDursModel

    pretrained_model = "tts_en_talknet"
    model = TalkNetSpectModel.from_pretrained(pretrained_model)
    model.add_module('_pitch_model', TalkNetPitchModel.from_pretrained(pretrained_model))
    model.add_module('_durs_model', TalkNetDursModel.from_pretrained(pretrained_model))
This model accepts batches of text.
This model generates mel spectrograms.
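As a rough sketch of how text goes in and a spectrogram comes out, the function below parses a sentence, generates a mel spectrogram, and passes it to a vocoder. It assumes the NeMo 1.0.0rc1 TTS interface (`parse`, `generate_spectrogram`, `convert_spectrogram_to_audio`) and assumes the `tts_hifigan` checkpoint as the vocoder; neither the vocoder choice nor the helper name `synthesize` comes from this model card, so check both against your installed NeMo version.

```python
def synthesize(text: str, vocoder_name: str = "tts_hifigan"):
    """Hypothetical helper: text -> mel spectrogram -> audio tensor.

    The vocoder name is an assumption; any vocoder trained on 22050 Hz
    data should be a candidate.
    """
    # Imported locally so that defining the function does not require NeMo.
    from nemo.collections.tts.models import (
        HifiGanModel,
        TalkNetDursModel,
        TalkNetPitchModel,
        TalkNetSpectModel,
    )

    pretrained_model = "tts_en_talknet"

    # Load the spectrogram generator and attach its pitch and duration predictors.
    spec_generator = TalkNetSpectModel.from_pretrained(pretrained_model)
    spec_generator.add_module("_pitch_model", TalkNetPitchModel.from_pretrained(pretrained_model))
    spec_generator.add_module("_durs_model", TalkNetDursModel.from_pretrained(pretrained_model))
    spec_generator.eval()

    # Load the vocoder (assumed checkpoint name, see the note above).
    vocoder = HifiGanModel.from_pretrained(model_name=vocoder_name)
    vocoder.eval()

    # Text -> token tensor -> mel spectrogram -> waveform.
    tokens = spec_generator.parse(text)
    spectrogram = spec_generator.generate_spectrogram(tokens=tokens)
    audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)
    return audio
```

Calling `synthesize("Hello world.")` downloads both checkpoints from NGC on first use, so it needs network access and a working NeMo installation.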
This checkpoint only works well with vocoders that were trained on 22050Hz data. Otherwise, the generated audio may be scratchy or choppy-sounding.
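One way to catch a mismatched vocoder early is a small guard that compares sample rates before any audio is generated. The sketch below is plain Python and does not touch NeMo; the function name and the idea of passing the vocoder's training sample rate as an integer are illustrative assumptions, and how you obtain that rate depends on the vocoder you use.

```python
# Sample rate of the LJSpeech data this checkpoint was trained on (from this model card).
TALKNET_SAMPLE_RATE_HZ = 22050


def check_vocoder_sample_rate(vocoder_sample_rate_hz: int,
                              expected_hz: int = TALKNET_SAMPLE_RATE_HZ) -> None:
    """Hypothetical guard: raise if the vocoder was trained at a different rate."""
    if vocoder_sample_rate_hz != expected_hz:
        raise ValueError(
            f"Vocoder trained at {vocoder_sample_rate_hz} Hz, but this checkpoint "
            f"expects {expected_hz} Hz; the audio may sound scratchy or choppy."
        )


# A matching vocoder passes silently.
check_vocoder_sample_rate(22050)
```

Failing fast here is cheaper than discovering the mismatch by listening to degraded output.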
1.0.0rc1 (current): The original version released with NeMo 1.0.0rc1.
[1] TalkNet Paper
[2] TalkNet 2 Paper