Text to Speech Notebook

End-to-end workflow for text-to-speech training with TAO Toolkit and deployment using Riva.
April 4, 2023
168.4 KB

Text to Speech

Text-to-speech (TTS), also called speech synthesis, is the problem of getting a program to generate human voice output from text. TAO Toolkit supports a two-stage pipeline for TTS:

  1. A spectrogram model to generate a Mel spectrogram from text (FastPitch)
  2. A vocoder model to generate audio from a Mel spectrogram (HiFiGAN)

Our goal here is to produce a FastPitch model and a HiFiGAN model that, when cascaded, generate good-quality human speech from text.
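The cascade described above can be pictured as plain function composition. The following is a minimal structural sketch only: `synthesize`, `toy_spectrogram_model`, and `toy_vocoder` are hypothetical names invented for illustration, and the toy stand-ins do not reflect how FastPitch or HiFiGAN work internally or how TAO Toolkit exposes them.

```python
from typing import Callable, List

# Illustrative type aliases; real models operate on tensors, not Python lists.
MelSpectrogram = List[List[float]]  # frames x mel bins
Waveform = List[float]              # audio samples


def synthesize(
    text: str,
    spectrogram_model: Callable[[str], MelSpectrogram],
    vocoder: Callable[[MelSpectrogram], Waveform],
) -> Waveform:
    """Run the two stages in sequence: text -> Mel spectrogram -> audio."""
    mel = spectrogram_model(text)  # stage 1 (the role FastPitch plays)
    return vocoder(mel)            # stage 2 (the role HiFiGAN plays)


def toy_spectrogram_model(text: str) -> MelSpectrogram:
    # Hypothetical stand-in: one "frame" per character, 4 "mel bins" per frame.
    return [[float(ord(ch) % 7)] * 4 for ch in text]


def toy_vocoder(mel: MelSpectrogram) -> Waveform:
    # Hypothetical stand-in: emit 2 samples per frame from the frame mean.
    samples: Waveform = []
    for frame in mel:
        mean = sum(frame) / len(frame)
        samples.extend([mean, -mean])
    return samples


audio = synthesize("hi", toy_spectrogram_model, toy_vocoder)
print(len(audio))  # 2 characters -> 2 frames -> 4 samples
```

The point of the sketch is that the vocoder only ever sees the spectrogram, which is why the two models can be trained, fine-tuned, and swapped independently.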

The best place to get started with TAO Toolkit TTS is the set of sample Jupyter notebooks enclosed in this resource. Three notebooks are included:

  1. Training: Sample workflow for training a FastPitch spectrogram generator model and a HiFiGAN vocoder model and exporting them to .riva files.
  2. Finetuning: Sample workflow for fine-tuning a FastPitch spectrogram generator model and a HiFiGAN vocoder model from pretrained FastPitch and HiFiGAN models.
  3. Deployment: Sample workflow for consuming the .riva files and deploying them to Riva.

If you are a seasoned Conversational AI developer, we recommend installing TAO and referring to the TAO documentation for detailed information.


Please make sure to install the following before proceeding further:

  • python 3.6.9
  • python-dev
  • docker-ce > 19.03.5
  • docker-API 1.40
  • nvidia-container-toolkit > 1.3.0-1
  • nvidia-container-runtime > 3.4.0-1
  • nvidia-docker2 > 2.5.0-1
  • nvidia-driver >= 455.23

Note: A compatible NVIDIA GPU is required.
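Comparing installed versions against the minimums listed above is easy to get wrong by eye, because strings such as "19.03.5" do not compare correctly as text. The sketch below shows one way to compare them numerically. It is an illustration under stated assumptions: the helper names (`parse_version`, `meets_requirement`, `unmet`) are hypothetical, the found versions are made-up sample values that you would collect yourself, and the parser only handles plain dotted or dashed numeric versions (no distribution epochs or suffixes).

```python
import re
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '19.03.5' or '1.3.0-1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def meets_requirement(found: str, required: str, strict: bool) -> bool:
    """strict=True models '>' in the list above; strict=False models '>='."""
    found_v, required_v = parse_version(found), parse_version(required)
    return found_v > required_v if strict else found_v >= required_v


# (required version, strict?) copied from the prerequisite list above.
REQUIREMENTS: Dict[str, Tuple[str, bool]] = {
    "docker-ce": ("19.03.5", True),
    "nvidia-container-toolkit": ("1.3.0-1", True),
    "nvidia-container-runtime": ("3.4.0-1", True),
    "nvidia-docker2": ("2.5.0-1", True),
    "nvidia-driver": ("455.23", False),
}


def unmet(found_versions: Dict[str, str]) -> List[str]:
    """Return the names of requirements that are missing or too old."""
    problems = []
    for name, (required, strict) in REQUIREMENTS.items():
        found = found_versions.get(name)
        if found is None or not meets_requirement(found, required, strict):
            problems.append(name)
    return problems


# Hypothetical sample values, not output from a real machine.
example = {
    "docker-ce": "20.10.7",
    "nvidia-container-toolkit": "1.3.0-1",
    "nvidia-driver": "455.23",
}
print(unmet(example))
```

In this sample, `nvidia-container-toolkit` is reported because the list asks for a version strictly greater than 1.3.0-1, while `nvidia-driver` passes because its bound is inclusive.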


We recommend installing TAO Toolkit inside a virtual environment. The steps are as follows:

virtualenv -p python3 <name of venv>
source <name of venv>/bin/activate
pip install jupyter notebook # If you need to run the notebooks

TAO Toolkit is a Python package hosted on the NVIDIA Python package index. You can install it using Python’s package manager, pip:

pip install nvidia-pyindex
pip install nvidia-tao

To download the Jupyter notebooks:

  1. Download the samples using the NGC CLI with the following command:

    ngc registry resource download-version "nvidia/tao/texttospeech_notebook:v1.1"
  2. Start the Jupyter notebook server:

    jupyter notebook --ip 0.0.0.0 --allow-root --port 8888


By downloading and using the models and resources packaged with TAO Toolkit Conversational AI, you accept the terms of the Riva license.