TTS En TalkNet

Description

Speech Synthesis model trained on female English speech

Publisher

NVIDIA

Use Case

Other

Framework

PyTorch with NeMo

Latest Version

1.0.0rc1

Modified

November 11, 2021

Size

48.75 MB

Model Overview

TalkNet is a non-autoregressive model that generates mel spectrograms from text.

Model Architecture

For more information about the model architecture, see the TalkNet paper [1,2].

Training

This model was trained on the LJSpeech dataset, sampled at 22050 Hz, and has been tested on generating female English speech with an American accent.

Performance

No performance information available at this time.

How to Use this Model

This model can be automatically loaded from NGC.

from nemo.collections.tts.models import TalkNetSpectModel, TalkNetPitchModel, TalkNetDursModel

# Name of the pretrained checkpoint on NGC
pretrained_model = "tts_en_talknet"

# Load the spectrogram generator
model = TalkNetSpectModel.from_pretrained(pretrained_model)

# Attach the pretrained pitch and duration models as submodules
model.add_module('_pitch_model', TalkNetPitchModel.from_pretrained(pretrained_model))
model.add_module('_durs_model', TalkNetDursModel.from_pretrained(pretrained_model))

Input

This model accepts batches of text.

Output

This model generates mel spectrograms.
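As a rough illustration of the input and output described above, the sketch below turns a single sentence into a mel spectrogram. It assumes that `model` is the `TalkNetSpectModel` loaded as in the usage section, with the pitch and duration models attached, and that its `parse` and `generate_spectrogram` methods behave as in NeMo's TTS inference tutorials. The helper name `text_to_mel` is hypothetical and not part of NeMo.

```python
def text_to_mel(model, text):
    """Convert one sentence to a mel spectrogram.

    `model` is assumed to be a loaded TalkNetSpectModel with the
    '_pitch_model' and '_durs_model' submodules already attached.
    """
    # Turn the raw string into a tensor of token IDs
    tokens = model.parse(text)
    # Predict the mel spectrogram for those tokens
    return model.generate_spectrogram(tokens=tokens)
```

A call such as `text_to_mel(model, "Hello world.")` should return a tensor holding the mel spectrogram, which can then be passed to a vocoder.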

Limitations

This checkpoint only works well with vocoders that were trained on 22050 Hz data. Otherwise, the generated audio may be scratchy or choppy-sounding.
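One way to guard against the mismatch described above is to check the vocoder's sample rate before converting the spectrogram to audio. The sketch below is a minimal example under stated assumptions: `mel_to_audio` and `MODEL_SAMPLE_RATE` are hypothetical names, the caller is expected to know the vocoder's training sample rate, and the vocoder is assumed to expose `convert_spectrogram_to_audio(spec=...)` as NeMo vocoders do.

```python
# Sample rate of the data this checkpoint was trained on (from this model card)
MODEL_SAMPLE_RATE = 22050


def mel_to_audio(vocoder, mel, vocoder_sample_rate):
    """Convert a mel spectrogram to audio, refusing mismatched vocoders.

    `vocoder_sample_rate` is supplied by the caller; how to read it from
    a particular vocoder's configuration is not assumed here.
    """
    if vocoder_sample_rate != MODEL_SAMPLE_RATE:
        raise ValueError(
            f"Vocoder was trained on {vocoder_sample_rate} Hz data, "
            f"but this checkpoint expects {MODEL_SAMPLE_RATE} Hz."
        )
    # Assumed NeMo vocoder interface
    return vocoder.convert_spectrogram_to_audio(spec=mel)
```

Failing early with an explicit error is a design choice here; a warning would also work if degraded audio is acceptable.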

Versions

1.0.0rc1 (current): The original version released with NeMo 1.0.0rc1.

References

[1] TalkNet Paper
[2] TalkNet 2 Paper

License

License to use this model is covered by the NGC TERMS OF USE unless another License/Terms Of Use/EULA is clearly specified. By downloading the public and release version of the model, you accept the terms and conditions of the NGC TERMS OF USE.