Many AI applications have common needs: classification, object detection, language translation, text-to-speech, recommender engines, sentiment analysis, and more. When developing applications with these capabilities, it is much faster to start with a pre-trained model and then tune it for a specific use case. The NGC catalog offers pre-trained models for a variety of common AI tasks. The models are optimized for NVIDIA Tensor Core GPUs and can be easily re-trained by updating just a few layers, saving valuable time.
TTS DE Multi-Speaker FastPitch HiFiGAN
Model
This collection includes two German models: FastPitch, trained on the HUI-Audio-Corpus-German clean dataset using the five speakers with the most data, selected and balanced; and HiFiGAN, trained on mel-spectrograms predicted by the multi-speaker FastPitch.
English T5-based Inverse Text Normalization
Model
English inverse text normalization model based on an albert-base-v2 tagger and a t5-small decoder.
RIVA Conformer ASR English (en-US)
Model
English Conformer ASR model for en-US
RIVA Conformer ASR Mandarin
Model
Mandarin (zh-CN) Conformer ASR model trained on ASR set 2.0
RIVA Conformer ASR Russian - ASR set 1.0
Model
Russian Conformer ASR model trained on ASR set 1.0
TTS En Multispeaker FastPitch HiFiGAN
Model
This collection contains two models:
1) Multi-speaker FastPitch (around 50M parameters) trained on HiFiTTS with over 291.6 hours of English speech and 10 speakers.
2) HiFiGAN trained on mel spectrograms produced by the Multi-speaker FastPitch in (1).
TTS En RAD-TTS Aligner
Model
RAD-TTS Aligner model trained on female English speech.
STT Hr Conformer-CTC Large
Model
Conformer-CTC-Large model for Croatian automatic speech recognition, trained on ParlaSpeech-HR v1.0 dataset.
STT Hr Conformer-Transducer Large
Model
Conformer-Transducer-Large model for Croatian automatic speech recognition, trained on ParlaSpeech-HR v1.0 dataset.
RIVA Conformer ASR French
Model
French (fr-FR) Conformer ASR model trained on ASR set 2.0
RIVA Conformer ASR English
Model
English (en-GB) Conformer ASR model trained on ASR set 1.0
STT En Es Multilingual Code-Switched Conformer Transducer Large
Model
English + Spanish Multilingual and Code-Switched Speech Recognition Conformer Transducer Large Model
STT Ca Conformer-Transducer Large
Model
Conformer-Transducer-Large model for Catalan automatic speech recognition, trained on MCV-9.0 dataset.
STT Ca Conformer-CTC Large
Model
Conformer-CTC-Large model for Catalan automatic speech recognition, trained on MCV 9.0 dataset.
STT Rw Conformer-CTC Large
Model
Conformer-CTC-Large model for Kinyarwanda Automatic Speech Recognition, trained on Mozilla Common Voice 9.0 Kinyarwanda dataset.
STT Rw Conformer-Transducer Large
Model
Conformer-Transducer-Large model for Kinyarwanda Automatic Speech Recognition, trained on Mozilla Common Voice 9.0 Kinyarwanda dataset.
RIVA Punctuation and Capitalization for French
Model
For each word in the input text, the model 1) predicts the punctuation mark that should follow the word, if any (the model supports commas, periods, hyphens, and question marks), and 2) predicts whether the word should be capitalized.
RIVA Punctuation and Capitalization for Hindi
Model
For each word in the input text, the model predicts the punctuation mark that should follow the word, if any. The model supports commas, poornvirams, exclamation marks, and question marks.
Russian Tagger-based Inverse Text Normalization
Model
Russian single-pass tagger-based model for inverse text normalization, built on a BERT encoder and trained on 2 million sentences from the Google Text Normalization Dataset. It achieves 3.55% WER on the Google default test set.
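For readers unfamiliar with the WER figure quoted for the Russian inverse text normalization model, the following is a minimal sketch of how a word error rate is conventionally computed: the word-level edit distance between a reference and a model output, divided by the number of reference words. It assumes the standard definition; the catalog entry does not say how its 3.55% figure was measured, and the function name `word_error_rate` and the sample strings are hypothetical, chosen only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: standard WER definition; not taken from the model card.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings: one substitution and one deletion over four words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because the score is normalized by the reference length, an output with many inserted words can have a WER above 1.0.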