TalkNet is a non-autoregressive model that generates mel spectrograms from text.
For more information about the model architecture, see the TalkNet paper [1,2].
This model was trained on LJSpeech sampled at 22050 Hz and has been tested on generating female English voices with an American accent.
No performance information is available at this time.
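This model can be loaded automatically from NGC:
# Import the three TalkNet components: spectrogram generator, pitch predictor, duration predictor.
from nemo.collections.tts.models import TalkNetSpectModel, TalkNetPitchModel, TalkNetDursModel
# Name of the pretrained checkpoint on NGC.
pretrained_model = "tts_en_talknet"
# Load the spectrogram generator.
model = TalkNetSpectModel.from_pretrained(pretrained_model)
# Attach the pretrained pitch and duration predictors as submodules of the generator.
model.add_module('_pitch_model', TalkNetPitchModel.from_pretrained(pretrained_model))
model.add_module('_durs_model', TalkNetDursModel.from_pretrained(pretrained_model))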
This model accepts batches of text.
This model generates mel spectrograms.
This checkpoint only works well with vocoders that were trained on 22050 Hz data. With a vocoder trained at a different sample rate, the generated audio may sound scratchy or choppy.
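The sample-rate constraint above can be enforced with a small guard before pairing this checkpoint with a vocoder. The following is a minimal sketch, not part of NeMo: `MODEL_SAMPLE_RATE_HZ` and `check_vocoder_sample_rate` are hypothetical names introduced here for illustration, and the caller is assumed to know the sample rate the vocoder was trained on.

```python
# Hypothetical helper, not a NeMo API: guards against pairing this checkpoint
# with a vocoder trained at a different sample rate.

# Sample rate of the LJSpeech data this checkpoint was trained on.
MODEL_SAMPLE_RATE_HZ = 22050


def check_vocoder_sample_rate(vocoder_sample_rate_hz: int,
                              expected_hz: int = MODEL_SAMPLE_RATE_HZ) -> None:
    """Raise ValueError if the vocoder's training sample rate differs from expected_hz."""
    if vocoder_sample_rate_hz != expected_hz:
        raise ValueError(
            f"Vocoder was trained on {vocoder_sample_rate_hz} Hz audio, "
            f"but this checkpoint expects {expected_hz} Hz."
        )


# A vocoder trained on 22050 Hz data passes the check without raising.
check_vocoder_sample_rate(22050)
```

Failing early with an explicit error is usually easier to diagnose than listening for scratchy or choppy output after synthesis.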
1.0.0rc1 (current): The original version released with NeMo 1.0.0rc1.
[1] TalkNet Paper
[2] TalkNet 2 Paper
License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.