GPU-optimized AI, Machine Learning, & HPC Software

NVIDIA

TTS Vocoder Hifigan

HiFiGAN Speech Synthesis model

Model

NVIDIA

Text to Speech Notebook

End-to-end workflow for text-to-speech training with TAO Toolkit and deployment with Riva.

Resource

NVIDIA

TTS En E2E FastPitch Hifigan

FastPitch+HiFiGAN End-to-End Speech Synthesis model trained on female English speech

Model

NVIDIA

Speech Synthesis HiFi-GAN

GAN-based waveform generator from mel-spectrograms.

Model

NVIDIA

TTS Zh Fastpitch HifiGan SFSpeech

This model card includes two Mandarin Chinese models: 1) a FastPitch mel-spectrogram generator trained on the SF Chinese/English Bilingual Speech dataset; 2) a HiFiGAN vocoder trained on mel-spectrograms predicted by that FastPitch model.

Model

NVIDIA

TTS DE Multi-Speaker FastPitch HiFiGAN

This collection includes two German models: a FastPitch model trained on the HUI-Audio-Corpus-German clean dataset, using a balanced selection of the five speakers with the most data; and a HiFiGAN vocoder trained on mel-spectrograms predicted by the multi-speaker FastPitch model.

Model

NVIDIA

TTS En E2E Fastspeech2 Hifigan

FastSpeech2+HiFiGAN End-to-End Speech Synthesis model trained on female English speech

Model

NVIDIA

Speech Synthesis HiFi-GAN

GAN-based waveform generator from mel-spectrograms.

Model

NVIDIA

RIVA EnglishUS Hifigan

HiFiGAN is a neural vocoder model for text-to-speech applications. It is intended as the second stage of a two-stage speech synthesis pipeline, with a mel-spectrogram generator such as FastPitch as the first stage.

Model