This is a checkpoint for the QuartzNet 15x5 model trained in NeMo on six datasets: LibriSpeech, Mozilla Common Voice (validated clips from en_1488h_2019-12-10), WSJ, Fisher, Switchboard, and NSC Singapore English. It was trained with Apex/Amp optimization level O1 for 600 epochs.
The model achieves a WER of 3.79% on LibriSpeech dev-clean, and a WER of 10.05% on dev-other.
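WER is the word-level edit distance (substitutions, insertions, and deletions) divided by the number of reference words. A minimal sketch of the metric, for illustration only (this is not NeMo's own implementation):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, one substitution against a three-word reference gives a WER of 1/3.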
Please download the latest version of this checkpoint to ensure compatibility with the latest NeMo release.
Source code and a developer guide are available at https://github.com/NVIDIA/NeMo. Refer to the documentation at https://docs.nvidia.com/deeplearning/nemo/neural-modules-release-notes/index.html
To evaluate the model on a dataset, run:
python examples/asr/speech2text_infer.py --asr_model=QuartzNet15x5-En --dataset=test.json
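The --dataset argument points to a NeMo-style manifest file: one JSON object per line with audio_filepath, duration, and text fields. A minimal sketch of building such a manifest (the audio paths and transcripts below are hypothetical placeholders):

```python
import json

# Hypothetical example utterances; replace with your own audio files,
# durations in seconds, and reference transcripts.
utterances = [
    {"audio_filepath": "/data/audio/utt1.wav", "duration": 3.2, "text": "hello world"},
    {"audio_filepath": "/data/audio/utt2.wav", "duration": 1.9, "text": "good morning"},
]

# NeMo manifests are JSON Lines: one object per line, not a JSON array.
with open("test.json", "w") as f:
    for utt in utterances:
        f.write(json.dumps(utt) + "\n")
```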
You can also load this model directly from your Python code by including this line:
asr_model = nemo_asr.models.ASRConvCTCModel.from_pretrained(model_info='QuartzNet15x5-En')