NVIDIA
STT En Conformer-Transducer XLarge

Conformer-Transducer-XLarge model for English Automatic Speech Recognition, trained on NeMo ASRSET

2 Versions
Version 1.10.0 (05/26/2022 9:07 PM UTC, 2.4 GB)
Accuracy (word error rate, lower is better)
NSC Part 1: 5.70 %
Librispeech test-other: 3.01 %
Librispeech dev-other: 2.95 %
NSC Part 6 long: 6.47 %
WSJ Eval 92: 1.17 %
WSJ Dev 93: 2.05 %
Peoples Speech: 21.32 %
Librispeech dev-clean: 1.48 %
Multilingual Librispeech dev (EN): 4.59 %
Librispeech test-clean: 1.62 %
Mozilla Common Voice 8.0 test: 6.46 %
Multilingual Librispeech test (EN): 5.32 %
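
For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed, assuming the accuracy figures above are word error rates obtained by word-level edit distance. The function name `word_error_rate` is hypothetical and is not part of NeMo; the card does not state which scoring tool or text normalization produced the published numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words.

    Illustrative helper only (hypothetical name); whitespace tokenization is an
    assumption and no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {wer * 100:.2f} %")
```

A value reported as a percentage in the table corresponds to this ratio multiplied by 100, summed over a whole test set rather than a single utterance.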
Model
Encoder Dimension: 1024
Number of Encoder Layers: 24
Dataset: NeMo ASRSET 3.0
Architecture: Conformer-Transducer
Number of Predictor Layers: 2
Inputs: 16000 Hz (16 kHz) mono-channel audio (WAV files)
Number of Weights: 0.6B
Outputs: Transcribed speech
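
Since the model expects mono WAV input, it can help to check a file before transcription. The sketch below uses only Python's standard `wave` module and reads the listed rate as 16,000 samples per second (16 kHz). The helper name `check_wav_format` is hypothetical and is not a NeMo function; resampling or channel mixing for files that fail the check is outside this sketch.

```python
import wave


def check_wav_format(path: str, expected_rate: int = 16000, expected_channels: int = 1) -> bool:
    """Return True if the WAV file at `path` has the expected sample rate and channel count.

    Hypothetical helper for illustration; defaults assume 16 kHz mono input.
    """
    with wave.open(path, "rb") as wav_file:
        rate_ok = wav_file.getframerate() == expected_rate
        channels_ok = wav_file.getnchannels() == expected_channels
        return rate_ok and channels_ok
```

Usage would be along the lines of `check_wav_format("sample.wav")`, where `sample.wav` is a placeholder path. Note that the `wave` module only reads uncompressed PCM WAV files and raises `wave.Error` on other encodings.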
Earlier version (04/14/2022 11:50 PM UTC, 2.22 GB)
Accuracy (word error rate, lower is better)
NSC Part 1: 6.30 %
Librispeech test-other: 3.18 %
Librispeech dev-other: 3.06 %
WSJ Eval 92: 1.40 %
WSJ Dev 93: 2.20 %
Librispeech dev-clean: 1.48 %
Multilingual Librispeech dev (EN): 5.26 %
Librispeech test-clean: 1.70 %
Multilingual Librispeech test (EN): 6.02 %
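
To compare the two versions, the absolute figures can be turned into relative error reductions. The snippet below is a small arithmetic sketch using a few values copied from the two accuracy tables (the 04/14/2022 version and the 05/26/2022 version 1.10.0). The dictionary names and the `relative_reduction` helper are illustrative, not part of any NVIDIA tooling.

```python
# Error rates (%) copied from the two accuracy tables; only datasets that
# appear for both versions are included.
older_version = {
    "Librispeech test-other": 3.18,
    "Librispeech test-clean": 1.70,
    "WSJ Eval 92": 1.40,
    "NSC Part 1": 6.30,
}
newer_version = {
    "Librispeech test-other": 3.01,
    "Librispeech test-clean": 1.62,
    "WSJ Eval 92": 1.17,
    "NSC Part 1": 5.70,
}


def relative_reduction(old: float, new: float) -> float:
    """Relative reduction in percent: positive when the newer value is lower."""
    return (old - new) / old * 100.0


for dataset, old_value in older_version.items():
    new_value = newer_version[dataset]
    change = relative_reduction(old_value, new_value)
    print(f"{dataset}: {old_value:.2f} % -> {new_value:.2f} % ({change:.1f} % relative reduction)")
```

Librispeech dev-clean is unchanged at 1.48 % in both tables, so its relative reduction is zero.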
Model
Encoder Dimension: 1024
Number of Encoder Layers: 24
Dataset: NeMo ASRSET 2.0
Architecture: Conformer-Transducer
Number of Predictor Layers: 2
Inputs: 16000 Hz (16 kHz) mono-channel audio (WAV files)
Number of Weights: 650M
Outputs: Transcribed speech