GPU-optimized AI, Machine Learning, & HPC Software

NVIDIA

The Domain Specific - NeMo Automatic Speech Recognition (ASR) Application facilitates training, evaluation and performance comparison of ASR models. This NeMo application enables you to train or fine-tune pre-trained ASR models with your own data.

Container

MeetKai Inc.

MK-SQuIT

SQuIT (Synthesizing Questions using Iterative Template-Filling) is a generated dataset produced with little human intervention. This container provides several tutorial applications: an interactive dataset explorer, a walkthrough of the generation pipeline, and a demonstration that uses NeMo to fine-tune and evaluate a model on the dataset.

Container

NVIDIA

Audio Codec 16kHz Small

This model card contains a Small Audio Codec model trained on the Libri-Light audiobook recordings dataset, comprising approximately 60,000 hours of English language speech with a 16kHz sampling rate.

Model

NVIDIA

TitaNet-S

TitaNet Small model for Speaker Verification and Diarization tasks

Model

NVIDIA

LangID PearlNet

PearlNet Lang ID model for Spoken Language Identification

Model

NVIDIA

STT En Fast Conformer-CTC Large

Fast Conformer-CTC-Large model for English Automatic Speech Recognition, trained on NeMo ASRSET

Model

NVIDIA

Parakeet-TDT_CTC-110M

Large version of the hybrid Fast Conformer TDT-CTC model (114M parameters), trained on a larger dataset of 36,000 hours with punctuation and capitalization. This model was jointly developed by the NVIDIA NeMo and Suno.ai teams.

Model

NVIDIA

STT En FastConformer Hybrid Transducer-CTC Large P&C

This collection contains the large version (114M) of the English speech recognition model with a FastConformer encoder and a Hybrid decoder (joint RNNT-CTC loss). The model has a vocabulary size of 1024 and emits text with punctuation and capitalization.

Model

—

STT En Zh Multilingual Code-Switched FastConformer Transducer L

English + Mandarin Multilingual and Code-Switched Speech Recognition FastConformer Transducer Large Model

Model

NVIDIA

STT En Fast Conformer-Transducer Large

Fast Conformer-Transducer-Large model for English Automatic Speech Recognition, trained on NeMo ASRSET

Model

NVIDIA

STT Fa FastConformer Hybrid Transducer-CTC Large

This collection contains the large version (114M) of the Persian speech recognition model with a FastConformer encoder and a Hybrid decoder (joint RNNT-CTC loss). The model has a vocabulary size of 1024.

Model

NVIDIA

TTS En FastPitch SpectrogramEnhancer For-ASR-Finetuning

This collection contains FastPitch and Spectrogram Enhancer models. The main use case is English ASR domain fine-tuning. Direct TTS use is not advised.

Model

NVIDIA

STT En Fast Conformer-Transducer XXLarge

Fast Conformer-Transducer-XXLarge model for English Automatic Speech Recognition, trained on NeMo ASRSET 3.0

Model

NVIDIA

STT En Fast Conformer-CTC XLarge

Fast Conformer-CTC-XLarge model for English Automatic Speech Recognition, trained on NeMo ASRSET

Model

NVIDIA

STT En Fast Conformer-Transducer XLarge

Fast Conformer-Transducer-XLarge model for English Automatic Speech Recognition, trained on NeMo ASRSET 3.0

Model

NVIDIA

STT En Fast Conformer-Transducer Large LibriSpeech

Fast Conformer-Transducer-Large model for English Automatic Speech Recognition, trained with NeMo on the LibriSpeech dataset

Model

NVIDIA

STT En Fast Conformer-CTC XXLarge

Fast Conformer-CTC-XXLarge model for English Automatic Speech Recognition, pre-trained on LibriLight and fine-tuned on NeMo ASRSET 3.0

Model

NVIDIA

ESM-2nv 8M

An 8-million-parameter BERT model fully pre-trained with BioNeMo

Model