
FastPitch checkpoint (PyTorch, AMP, LJSpeech-1.1, 22050Hz)


Description

FastPitch PyTorch checkpoint trained on LJSpeech-1.1

Publisher

NVIDIA Deep Learning Examples

Use Case

Text To Speech

Framework

PyTorch

Latest Version

21.12.1_amp

Modified

August 17, 2022

Size

176.57 MB

Model Overview

The FastPitch model generates mel-spectrograms from raw input text and allows the user to exert additional control over the synthesized utterances.
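Because FastPitch predicts an explicit pitch contour as an intermediate step, utterance-level control can be as simple as transforming that contour before it conditions the rest of the network. The sketch below is illustrative only, with made-up names and values; the model's actual control interface lives in the repository's inference scripts.

```python
import numpy as np

# Hypothetical per-symbol pitch contour predicted by the model, in Hz.
# A value of 0.0 marks an unvoiced symbol.
pitch_hz = np.array([110.0, 0.0, 140.0, 130.0])

def shift_pitch(contour, shift_hz):
    """Raise voiced entries by shift_hz, leaving unvoiced (zero) entries alone."""
    voiced = contour > 0
    return np.where(voiced, contour + shift_hz, contour)

# Voiced entries shifted up by 30 Hz; the unvoiced entry stays at 0.
shifted = shift_pitch(pitch_hz, 30.0)
```

The same idea supports flattening the contour (constant pitch) or scaling it, which is the kind of "additional control" the overview refers to.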

Model Architecture

FastPitch is a fully feedforward Transformer model that predicts mel-spectrograms from raw text (Figure 1). The entire process is parallel, which means that all input letters are processed simultaneously to produce a full mel-spectrogram in a single forward pass.

FastPitch model architecture

Figure 1. Architecture of FastPitch (source). The model is composed of a bidirectional Transformer backbone (also known as a Transformer encoder), a pitch predictor, and a duration predictor. After passing through the first *N* Transformer blocks (the encoder), the signal is augmented with pitch information and discretely upsampled. It then passes through another set of *N* Transformer blocks, which smooth the upsampled signal and construct the mel-spectrogram.
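The discrete upsampling step described in the caption above can be sketched in a few lines: each encoder output frame is repeated according to its predicted integer duration, so the sequence reaches mel-spectrogram time resolution before the second Transformer stack. The shapes and values below are illustrative, not taken from the repository.

```python
import numpy as np

# Toy encoder output: 4 input symbols, hidden size 3 (illustrative shapes).
encoded = np.arange(12, dtype=np.float32).reshape(4, 3)

# Predicted per-symbol durations, in mel frames (illustrative values).
durations = np.array([2, 1, 3, 2])

# Discrete upsampling: repeat each symbol's encoding by its duration,
# yielding one row per output mel frame (here 2 + 1 + 3 + 2 = 8 frames).
upsampled = np.repeat(encoded, durations, axis=0)
```

Because durations are predicted for all symbols at once, this step stays parallel, which is what lets the whole model produce a full mel-spectrogram in a single forward pass.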

Training

This model was trained using the script available in the Deep Learning Examples GitHub repository.

Dataset

The following datasets were used to train this model:

  • LJSpeech-1.1 - A dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.

Performance

Performance numbers for this model are available in the performance section of the GitHub readme.

License

This model was trained using open-source software available in the Deep Learning Examples repository. For terms of use, please refer to the license of the script and the licenses of the datasets from which the model was derived.