GPU-optimized AI, Machine Learning, & HPC Software

NVIDIA

Riva Speech Skills

Riva Speech Skills is a scalable Conversational AI service platform.

Container

NVIDIA AI Enterprise

NVIDIA

Riva TTS NIM

Riva TTS NIM provides easy access to state-of-the-art text-to-speech models capable of synthesizing English speech from text.

Container

NVIDIA

Riva Skills Quick Start

Scripts and utilities for getting started with Riva Speech Skills

Resource

NVIDIA Developer Program

NVIDIA

TTS FastPitch HifiGAN Riva

Riva TTS NIM provides easy access to state-of-the-art text-to-speech models capable of synthesizing English speech from text.

Container

NVIDIA

Riva TTS English US Auxiliary Files

Contains files used in RMIR creation.

Model

NVIDIA Developer Program

NVIDIA

Chatterbox TTS Multilingual

Chatterbox TTS Multilingual NIM Container

Container

NVIDIA

WaveGlow LJS 256 Channels

WaveGlow model weights pre-trained on the LJ Speech dataset to be used with https://github.com/NVIDIA/waveglow.

Model

NVIDIA Deep Learning Examples

HiFi-GAN PyT checkpoint (22kHz, AMP)

HiFi-GAN v1 PyTorch checkpoint trained on 8 GPUs with AMP on LJSpeech-1.1 (22 kHz).

Model

NVIDIA

WaveGlow

WaveGlow is a flow-based network capable of generating high quality speech from mel-spectrograms.

Model

NVIDIA

Text to Speech Notebook

End to End workflow for text to speech training with TAO Toolkit and deployment using Riva.

Resource

NVIDIA Deep Learning Examples

HiFi-GAN PyT checkpoint (FastPitch ftune, 22kHz, AMP)

HiFi-GAN v1 PyTorch checkpoint trained on 8 GPUs with AMP on LJSpeech-1.1 (22 kHz), fine-tuned on FastPitch outputs.

Model

NVIDIA

Tacotron2 LJSpeech

Model checkpoints for the Tacotron 2 model trained with NeMo.

Model

NVIDIA

Speech Synthesis English FastPitch

Mel-spectrogram prediction conditioned on input text, using the LJSpeech voice.

Model

NVIDIA

Speech Synthesis HiFi-GAN

GAN-based waveform generator from mel-spectrograms.

Model

NVIDIA

Audio Codec 16kHz Small

This model card contains a Small Audio Codec model trained on the Libri-Light audiobook recordings dataset, comprising approximately 60,000 hours of English language speech with a 16kHz sampling rate.

Model

NVIDIA Deep Learning Examples

HiFi-GAN for PyTorch

The HiFi-GAN model implements spectrogram inversion, synthesizing speech waveforms from mel-spectrograms.

Resource

NVIDIA

WaveGlow LJSpeech

Model checkpoints for the WaveGlow model trained with NeMo.

Model

NVIDIA

RIVA Magpie-TTS Multilingual

Riva NeMo-MagpieTTS Multilingual: an IPA-based, multispeaker model with emotion support.

Model

NVIDIA Deep Learning Examples

Tacotron2 PyTorch checkpoint (AMP)

Tacotron2 PyTorch checkpoint trained with AMP.

Model

NVIDIA

Flowtron

Flowtron is an autoregressive flow-based network for text-to-mel-spectrogram synthesis.

Model

NVIDIA Deep Learning Examples

Tacotron2 and Waveglow 2.0 for PyTorch

The Tacotron 2 and WaveGlow models form a text-to-speech system that enables users to synthesize natural-sounding speech from raw transcripts.

Resource

NVIDIA Deep Learning Examples

FastPitch checkpoint (PyTorch, AMP, LJSpeech-1.1, 22050Hz)

FastPitch PyTorch checkpoint trained on LJSpeech-1.1.

Model

NVIDIA

Speech Synthesis Waveglow

Universal waveform generator from mel-spectrograms.

Model

—

Mel Codec 22kHz fullband medium

22.05 kHz full-band Mel Codec model trained on multilingual speech.

Model