GPU-optimized AI, Machine Learning, & HPC Software

NVIDIA

Diarization MSDD Telephonic

Multi-scale Diarization Decoder (MSDD) model for speaker diarization of telephone conversations

Model

NVIDIA

TTS Vocoder Hifigan

HiFiGAN Speech Synthesis model

Model

NVIDIA

STT En Conformer-CTC Large

Conformer-CTC-Large model for English automatic speech recognition, trained on NeMo ASRSET

Model

NVIDIA

TTS En FastPitch

FastPitch Speech Synthesis model trained on female English speech.

Model

NVIDIA

TTS Es Multispeaker FastPitch HiFiGAN

This collection contains two models: (1) a multi-speaker 44,100 Hz FastPitch model trained on approximately 20 hours of Latin American Spanish speech from 174 speakers, and (2) a HiFiGAN model trained on mel spectrograms produced by the multi-speaker FastPitch model in (1).

Model

NVIDIA

VAD Marblenet

MarbleNet voice activity detection (VAD) model

Model

NVIDIA

NMT En Es Transformer12x2

Neural Machine Translation (NMT) model to translate from English to Spanish

Model

NVIDIA

Bertlargeuncased

BERT Large Uncased trained on English Wikipedia and BookCorpus

Model

NVIDIA

NMT En Zh Transformer24x6

Neural Machine Translation (NMT) model to translate from English to Simplified Chinese

Model

NVIDIA

STT En Citrinet 1024

Citrinet 1024 model trained on the ASR Set dataset

Model

NVIDIA

TTS En Tacotron2

Tacotron2 Speech Synthesis model trained on female English speech

Model

NVIDIA

STT Zh Quartznet15x5

QuartzNet is a Jasper-like network that uses separable convolutions and larger filter sizes. It achieves accuracy comparable to Jasper with far fewer parameters. This particular model has 15 blocks, each repeated 5 times.

Model

NVIDIA

SpeakerVerification Speakernet

SpeakerNet-M model trained with NeMo for speaker verification and speaker embeddings

Model

NVIDIA

STT Es Quartznet15x5

Speech To Text (STT) model based on QuartzNet for recognizing Spanish speech.

Model

NVIDIA

TTS En E2E FastPitch Hifigan

FastPitch+HiFiGAN End-to-End Speech Synthesis model trained on female English speech