Text-to-Speech (TTS), also called speech synthesis, is the problem of getting a program to generate human voice output from text. TAO Toolkit supports a two-stage pipeline for TTS: a spectrogram generator (FastPitch) followed by a vocoder (HiFiGAN).
Our goal here is to produce FastPitch and HiFiGAN models that, when cascaded, generate good-quality human voice from text.
The best place to get started with TTS in TAO Toolkit is the set of TAO TTS Jupyter notebook samples enclosed in this resource. This resource includes three notebooks.
The notebooks include working with .riva model files and deploying them to Riva.
If you are a seasoned Conversational AI developer, we recommend installing TAO and referring to the TAO documentation for detailed information.
Please make sure to install the following before proceeding further:
Note: A compatible NVIDIA GPU is required.
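Before continuing, it can help to confirm that the command-line tools used later in this guide are available. The following is a minimal sketch, not an official prerequisite check: the helper name missing_tools is hypothetical, the tool list is taken from the commands that appear in this guide, and nvidia-smi is used only as a rough proxy for an installed NVIDIA driver.

```shell
# Print every command from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper name used for illustration.)
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# Assumption: these are the tools this guide relies on. nvidia-smi is only
# a rough indicator that an NVIDIA driver (and therefore a GPU) is present.
missing="$(missing_tools python3 pip ngc nvidia-smi)"

if [ -n "$missing" ]; then
    echo "The following tools were not found on PATH:"
    echo "$missing"
else
    echo "All listed tools were found on PATH."
fi
```

The script only reports what is missing and does not exit with an error, so it can be pasted into an interactive shell safely.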
We recommend that you install TAO Toolkit inside a virtual environment. The steps are as follows:
virtualenv -p python3 <name of venv>
source <name of venv>/bin/activate
pip install jupyter notebook # If you need to run the notebooks
TAO Toolkit is a Python package hosted on the NVIDIA Python package index. You can install it with Python’s package manager, pip.
pip install nvidia-pyindex
pip install nvidia-tao
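After the install finishes, a quick check can confirm that the package landed in the active virtual environment. This is a sketch under assumptions: pip_package_installed is a hypothetical helper built on pip show, and the launcher command name tao is assumed to be what the nvidia-tao package places on PATH.

```shell
# Return success if the named package is installed in the active environment.
# (pip_package_installed is a hypothetical helper name used for illustration.)
pip_package_installed() {
    python3 -m pip show "$1" >/dev/null 2>&1
}

if pip_package_installed nvidia-tao; then
    echo "nvidia-tao is installed"
else
    echo "nvidia-tao was not found; re-run the pip install commands above"
fi

# Assumption: the nvidia-tao package provides a launcher named `tao` on PATH.
if command -v tao >/dev/null 2>&1; then
    echo "tao launcher found at: $(command -v tao)"
else
    echo "tao launcher not found on PATH"
fi
```

If the package is reported as missing, check that the virtual environment is still activated in the current shell before reinstalling.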
To download the Jupyter notebooks:
Download the samples using the NGC CLI with the following command:
ngc registry resource download-version "nvidia/tao/texttospeech_notebook:v1.1"
Start the Jupyter notebook server:
jupyter notebook --ip 0.0.0.0 --allow-root --port 8888
By downloading and using the models and resources packaged with TAO Toolkit Conversational AI, you accept the terms of the Riva license.