Models
The NGC catalog offers hundreds of pre-trained models for computer vision, speech, recommendation, and more. Bring AI to market faster by using these models as-is, or quickly build proprietary models using only a fraction of the custom data otherwise required.
RIVA Citrinet ASR English
English Citrinet ASR model trained on ASR set 3.0, no-weight-decay
ESS DNN Stereo Disparity
ESS is a DNN that estimates disparity for a stereo image pair and returns a continuous disparity map for the given left image.
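A disparity map such as the one described above is commonly turned into metric depth with the standard pinhole stereo relation Z = f * B / d. The following is a minimal sketch of that post-processing step only, not of ESS itself; the function name `disparity_to_depth`, the parameter names, and the sample values are assumptions made for illustration.

```python
def disparity_to_depth(disparity, focal_length_px, baseline_m):
    """Convert a 2D disparity map (pixels) to depth (meters) via Z = f * B / d.

    Hypothetical helper: pixels with zero or negative disparity are mapped
    to infinity, since they carry no usable depth information.
    """
    depth = []
    for row in disparity:
        depth.append([
            focal_length_px * baseline_m / d if d > 0 else float("inf")
            for d in row
        ])
    return depth


# Illustrative values only (assumed camera: 800 px focal length, 0.5 m baseline).
disparity_map = [[8.0, 16.0], [0.0, 32.0]]
depth_map = disparity_to_depth(disparity_map, focal_length_px=800.0, baseline_m=0.5)
print(depth_map)
```

Larger disparities correspond to closer objects, which is why depth falls as disparity grows.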
Bi3D Proximity Segmentation
Bi3D is a binary depth classification network that is used to classify the depth of objects at a given distance.
RIVA Conformer-XL ASR English - ASR set 4.0
English Conformer-XL ASR model trained on ASR set 4.0
End-to-end parallel speech synthesis model
wav2vec 2.0 Base fine-tuned
wav2vec 2.0 Base PyTorch checkpoint pre-trained on LibriSpeech 960 h, fine-tuned on LibriSpeech 960 h
wav2vec 2.0 Base pre-trained
wav2vec 2.0 Base PyTorch checkpoint pre-trained on LibriSpeech 960 h
MoFlow PyTorch checkpoint
MoFlow PyTorch checkpoint
QuartzNet15x5: WSJ, LibriSpeech & MCV
QuartzNet15x5 model trained on WSJ, LibriSpeech and Mozilla's Common Voice En with NeMo
This collection contains two models: 1) a multi-speaker 44100 Hz FastPitch model trained on approximately 20 hours of Latin American Spanish speech from 174 speakers, and 2) a HiFiGAN model trained on mel spectrograms produced by the multi-speaker FastPitch model in (1).
Re-Identification
Re-Identification network to generate embeddings for identifying persons in different scenes.
RIVA Conformer ASR Spanish
Spanish Conformer ASR model trained on ASR set 2.0.
Punctuation and Capitalization Bert
For each word in the input text, the model 1) predicts the punctuation mark, if any, that should follow the word (commas, periods, and question marks are supported) and 2) predicts whether the word should be capitalized.
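The two per-word predictions described above can be combined back into readable text. The sketch below shows only that recombination step under assumed inputs; the label names (`O`, `COMMA`, `PERIOD`, `QUESTION`), the boolean capitalization flags, and the function `apply_predictions` are hypothetical and are not taken from the model's actual output format.

```python
# Hypothetical label set: "O" means no punctuation follows the word.
PUNCT = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}


def apply_predictions(words, punct_labels, cap_flags):
    """Rebuild text from one punctuation label and one capitalization flag per word."""
    if not (len(words) == len(punct_labels) == len(cap_flags)):
        raise ValueError("expected one punctuation label and one flag per word")
    tokens = []
    for word, punct, cap in zip(words, punct_labels, cap_flags):
        # Upper-case only the first character so the rest of the word is untouched.
        token = word[:1].upper() + word[1:] if cap else word
        tokens.append(token + PUNCT[punct])
    return " ".join(tokens)


# Assumed predictions for an unpunctuated, lower-cased transcript.
words = ["how", "are", "you", "i", "am", "fine"]
punct_labels = ["O", "O", "QUESTION", "O", "O", "PERIOD"]
cap_flags = [True, False, False, True, False, False]

restored = apply_predictions(words, punct_labels, cap_flags)
print(restored)  # How are you? I am fine.
```

Treating punctuation and capitalization as two independent per-word labels keeps the output aligned one-to-one with the input words, which is convenient when post-processing ASR transcripts.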
Speech to Text English Jasper
Speech to Text Jasper model for English.
RIVA Quartznet ASR English
English Quartznet ASR model trained on ASR set 1.2
RIVA Jasper ASR English
English ASR model trained on ASR Set 1.2, Noise Robust
Riva ASR Mandarin Inverse Normalization Grammar
RIVA EnglishUS Hifigan
HifiGAN is a neural vocoder model for text-to-speech applications. It is intended as the second part of a two-stage speech synthesis pipeline, with a mel-spectrogram generator such as FastPitch as the first stage.
RIVA EnglishUS Hifigan
Riva multispeaker with IPA for G2P
Speech Synthesis English FastPitch
Mel-Spectrogram prediction conditioned on input text with LJSpeech voice.
RIVA EnglishUS Fastpitch
FastPitch is a mel-spectrogram generator designed to be used as the first part of a neural text-to-speech system, in conjunction with a neural vocoder.
RIVA EnglishUS Fastpitch
Riva multispeaker with IPA for G2P
Riva TTS English US Auxiliary Files
Contains files used in rmir creation
Speech Synthesis HiFi-GAN
GAN-based waveform generator from mel-spectrograms.
MONAI Spleen CT Segmentation
A pre-trained model for volumetric (3D) segmentation of the spleen from CT images.