
TAO/Riva - Speech Synthesis


Description

This collection contains end-to-end neural models for Text to Speech (TTS) training and deployment with TAO Toolkit and Riva, respectively.

Curator

NVIDIA

Modified

April 7, 2022

Speech Synthesis Collection

Overview

This collection contains end-to-end neural models for Text to Speech (TTS) that can be trained using TAO Toolkit and deployed with Riva. The models in this collection can be used for synthesizing speech from text. The current TTS pipeline requires two models:

  1. Spectrogram generator: a model that generates a mel spectrogram from text input
  2. Vocoder: a model that generates audio from a mel spectrogram (see the inference sketch below)
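The following minimal sketch illustrates this two-stage pipeline. It assumes the NVIDIA NeMo toolkit and its publicly released English FastPitch and HiFiGAN checkpoints; the model names, class names, and sample rate are illustrative and are not the exact artifacts in this collection.

```python
# Two-stage TTS inference: spectrogram generator (FastPitch) followed by a
# vocoder (HiFiGAN). Assumes NVIDIA NeMo and its pretrained English
# checkpoints; names and sample rate are illustrative only.
import soundfile as sf
from nemo.collections.tts.models import FastPitchModel, HifiGanModel

# Stage 1: text -> mel spectrogram
spec_gen = FastPitchModel.from_pretrained("tts_en_fastpitch").eval()
# Stage 2: mel spectrogram -> audio waveform
vocoder = HifiGanModel.from_pretrained("tts_hifigan").eval()

tokens = spec_gen.parse("Speech synthesis with FastPitch and HiFiGAN.")
spectrogram = spec_gen.generate_spectrogram(tokens=tokens)
audio = vocoder.convert_spectrogram_to_audio(spec=spectrogram)

# The pretrained English models generate 22.05 kHz mono audio
sf.write("output.wav", audio.detach().cpu().numpy()[0], samplerate=22050)
```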

TAO Toolkit supports training the following model architectures from scratch:

  1. FastPitch (spectrogram generator)
  2. HiFiGAN (vocoder)

For more information on how to train end-to-end TTS models using TAO Toolkit and deploy them to Riva, refer to the TAO Toolkit Text-To-Speech documentation.
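As a rough sketch of what that workflow looks like, the snippet below drives training for both models from the TAO launcher. The task names (spectro_gen, vocoder), flags, and spec paths are assumptions based on the TAO 3.x launcher convention; consult the documentation referenced above for the authoritative commands and experiment specs.

```python
# Sketch of driving TAO Toolkit TTS training from Python via the "tao"
# launcher CLI. Task names (spectro_gen, vocoder), flags, and paths are
# assumptions; see the TAO Toolkit Text-To-Speech documentation for the
# exact invocations.
import subprocess

KEY = "<your_encryption_key>"   # key used to encrypt/decrypt TAO models (placeholder)
SPECS = "/specs/tts"            # experiment spec files mounted in the TAO environment (assumption)
RESULTS = "/results/tts"

def tao(task: str, action: str) -> None:
    """Run a TAO launcher command and fail loudly if it errors."""
    subprocess.run(
        ["tao", task, action,
         "-e", f"{SPECS}/{task}/{action}.yaml",
         "-g", "1", "-k", KEY,
         "-r", f"{RESULTS}/{task}/{action}"],
        check=True,
    )

# Train the spectrogram generator (FastPitch), then the vocoder (HiFiGAN)
tao("spectro_gen", "train")
tao("vocoder", "train")
```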

Available Models

Deployable Riva models for FastPitch and HiFiGAN are available with this collection.
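Once these models are deployed on a Riva server, speech can be requested from a client application. A minimal sketch, assuming the nvidia-riva-client Python package, a server at localhost:50051, and a placeholder voice name (all assumptions), looks like this:

```python
# Minimal client-side synthesis sketch against a running Riva server.
# Assumes the nvidia-riva-client package; the server URI and voice name are
# placeholders and depend on your deployment.
import wave
import riva.client

auth = riva.client.Auth(uri="localhost:50051")
tts = riva.client.SpeechSynthesisService(auth)

sample_rate_hz = 44100
resp = tts.synthesize(
    "Hello from FastPitch and HiFiGAN on Riva.",
    voice_name="English-US.Female-1",    # placeholder voice name
    language_code="en-US",
    sample_rate_hz=sample_rate_hz,
)

# resp.audio contains 16-bit mono PCM; wrap it in a WAV container
with wave.open("riva_tts_output.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)  # 2 bytes per sample for 16-bit PCM
    out.setframerate(sample_rate_hz)
    out.writeframes(resp.audio)
```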

License

By downloading and using the models and resources packaged with TAO Conversational AI, you accept the terms of the Riva license.

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.