Many AI applications have common needs: classification, object detection, language translation, text-to-speech, recommender engines, sentiment analysis, and more. When developing applications with these capabilities, it is much faster to start with a pre-trained model and then tune it for a specific use case. The NGC catalog offers pre-trained models for a variety of common AI tasks. These models are optimized for NVIDIA Tensor Core GPUs and can be re-trained by updating just a few layers, saving valuable time.
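To make the idea of re-training only a few layers concrete, here is a minimal, framework-free sketch. It is a toy illustration and not NGC, TAO Toolkit, or NeMo code: the `BACKBONE` weights, the `frozen_features` and `fine_tune_head` helpers, and the synthetic data are all hypothetical stand-ins. A fixed "pre-trained" layer produces features, and only a small classification head on top of it is trained.

```python
import math
import random


def frozen_features(x, backbone_weights):
    """Stand-in for a pre-trained backbone: a fixed linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in backbone_weights]


def predict(features, head_weights, head_bias):
    """Logistic-regression head: probability that the sample belongs to class 1."""
    z = sum(w * f for w, f in zip(head_weights, features)) + head_bias
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(samples, labels, backbone_weights, epochs=200, lr=0.5):
    """Train only the head with SGD; the backbone weights are read but never updated."""
    feats = [frozen_features(x, backbone_weights) for x in samples]
    head_weights = [0.0] * len(backbone_weights)
    head_bias = 0.0
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            # Gradient of the log loss with respect to the logit.
            err = predict(f, head_weights, head_bias) - y
            head_weights = [w - lr * err * fi for w, fi in zip(head_weights, f)]
            head_bias -= lr * err
    return head_weights, head_bias


# Hypothetical "pre-trained" weights (assumption: 2 inputs, 3 features).
BACKBONE = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]

# Synthetic task for the new use case: is the first coordinate larger than the second?
rng = random.Random(0)
samples = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
labels = [1.0 if x[0] > x[1] else 0.0 for x in samples]

backbone_before = [row[:] for row in BACKBONE]
head_w, head_b = fine_tune_head(samples, labels, BACKBONE)

accuracy = sum(
    (predict(frozen_features(x, BACKBONE), head_w, head_b) > 0.5) == (y == 1.0)
    for x, y in zip(samples, labels)
) / len(samples)
print(f"training accuracy with a frozen backbone: {accuracy:.2f}")
```

Because the backbone is never modified, only the handful of head parameters need to be learned, which is why this kind of tuning tends to need less data and time than training a full model from scratch.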
Bi3D Proximity Segmentation
Model
Bi3D is a binary depth classification network that is used to classify the depth of objects at a given distance.
ESS DNN Stereo Disparity
Model
ESS is a DNN that estimates disparity for a stereo image pair and returns a continuous disparity map for the given left image.
TTS En Multispeaker FastPitch HiFiGAN
Model
This collection contains two models:
1) Multi-speaker FastPitch (around 50M parameters) trained on HiFiTTS, with over 291.6 hours of English speech from 10 speakers.
2) HiFiGAN trained on mel spectrograms produced by the Multi-speaker FastPitch in (1).
STT En Conformer-CTC Large
Model
Conformer-CTC-Large model for English Automatic Speech Recognition, trained on NeMo ASRSET
STT En Conformer-Transducer Large
Model
Conformer-Transducer-Large model for English Automatic Speech Recognition, trained on NeMo ASRSET
NER En Bert
Model
Named Entity Recognition model with BERT
PeopleSemSegnet
Model
Semantic segmentation of persons in an image.
Riva ASR English Inverse Normalization Grammar
Model
Base English grammar
DashCamNet
Model
4-class object detection network to detect cars in an image.
RIVA Conformer ASR Hindi
Model
Hindi Conformer ASR model trained on ASR set 1.0
Riva ASR Russian LM
Model
Base Russian 4-gram LM
English Tagger-based Inverse Text Normalization
Model
English single-pass tagger-based model for inverse text normalization, based on bert-base-uncased and trained on 2 million sentences from the Google Text Normalization Dataset. It achieves 3.75% WER on the Google default test set.
Efficient Geometry-aware 3D Generative Adversarial Networks
Model
Pretrained EG3D models for FFHQ, AFHQ, and ShapeNet Cars
TAO Pretrained Object Detection
Model
Pretrained weights to facilitate transfer learning using TAO Toolkit.
SSL En Conformer Large
Model
Self-Supervised Learning (SSL) checkpoints for the Conformer Large model. These are similar to the w2v-Conformer model and can be fine-tuned for Automatic Speech Recognition (ASR).
STT En Conformer-CTC XLarge
Model
Conformer-CTC-XLarge model for English Automatic Speech Recognition, trained on NeMo ASRSET
STT En Conformer-Transducer XLarge
Model
Conformer-Transducer-XLarge model for English Automatic Speech Recognition, trained on NeMo ASRSET
SSL En Conformer XLarge
Model
Self-Supervised Learning (SSL) checkpoints for the Conformer XLarge model. These are similar to the w2v-Conformer model and can be fine-tuned for Automatic Speech Recognition (ASR).
PeopleNet
Model
3-class object detection network to detect people in an image.
Riva ASR Mandarin LM
Model
Base Mandarin 4-gram LM
TrafficCamNet
Model
4-class object detection network to detect cars in an image.
PeopleSegNet
Model
1-class instance segmentation network to detect and segment instances of people in an image.
LPDNet
Model
Object detection network to detect license plates in an image of a car.