Model
Conformer-Transducer-XLarge model for English Automatic Speech Recognition, trained on NeMo ASRSET
Use the NGC CLI to download:
2 Versions
05/26/2022 9:07 PM UTC, 2.4 GB
Accuracy
| Evaluation set | WER |
|---|---|
| NSC Part 1 | 5.70 % |
| Librispeech test-other | 3.01 % |
| Librispeech dev-other | 2.95 % |
| NSC Part 6 long | 6.47 % |
| WSJ Eval 92 | 1.17 % |
| WSJ Dev 93 | 2.05 % |
| Peoples Speech | 21.32 % |
| Librispeech dev-clean | 1.48 % |
| Multilingual Librispeech dev (EN) | 4.59 % |
| Librispeech test-clean | 1.62 % |
| Mozilla Common Voice 8.0 test | 6.46 % |
| Multilingual Librispeech test (EN) | 5.32 % |
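The percentages above are error rates (lower is better), assumed here to be word error rate (WER), the usual metric for ASR models. The sketch below shows one common way to compute WER: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative and not taken from the model card; the text normalization applied before scoring for the published figures is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance from i reference words to an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.2f} %")  # prints "33.33 %"
```

Because insertions count as errors, WER computed this way can exceed 100 % when the hypothesis is much longer than the reference.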
Model
| Key | Value |
|---|---|
| Encoder Dimension | 1024 |
| Number of Encoder Layers | 24 |
| Dataset | NeMo ASRSET 3.0 |
| Architecture | Conformer-Transducer |
| Number of Predictor Layers | 2 |
| Inputs | 16 kHz mono-channel audio (WAV files) |
| Number of Weights | 0.6B |
| Outputs | Transcribed speech |
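The model expects mono WAV audio sampled at 16 kHz (16,000 Hz). As a minimal sketch, a file could be checked against that requirement before transcription using only Python's standard `wave` module. The helper name `is_16k_mono` and the file name in the usage comment are hypothetical, and resampling or down-mixing files that fail the check is left out.

```python
import wave


def is_16k_mono(path: str) -> bool:
    """Return True if the WAV file at `path` is 16,000 Hz, single channel."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() == 16000 and wav.getnchannels() == 1


# Usage (hypothetical file name):
# if not is_16k_mono("sample.wav"):
#     print("Resample to 16 kHz mono before transcribing.")
```

The `wave` module reads uncompressed PCM WAV files only, so other encodings would need a different reader.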
1.8.0 (selected), 04/14/2022 11:50 PM UTC, 2.22 GB
Accuracy
| Evaluation set | WER |
|---|---|
| NSC Part 1 | 6.30 % |
| Librispeech test-other | 3.18 % |
| Librispeech dev-other | 3.06 % |
| WSJ Eval 92 | 1.40 % |
| WSJ Dev 93 | 2.20 % |
| Librispeech dev-clean | 1.48 % |
| Multilingual Librispeech dev (EN) | 5.26 % |
| Librispeech test-clean | 1.70 % |
| Multilingual Librispeech test (EN) | 6.02 % |
Model
| Key | Value |
|---|---|
| Encoder Dimension | 1024 |
| Number of Encoder Layers | 24 |
| Dataset | NeMo ASRSET 2.0 |
| Architecture | Conformer-Transducer |
| Number of Predictor Layers | 2 |
| Inputs | 16 kHz mono-channel audio (WAV files) |
| Number of Weights | 650M |
| Outputs | Transcribed speech |