
QuartzNet15x5: WSJ, LibriSpeech & MCV

Description: QuartzNet15x5 model trained on WSJ, LibriSpeech and Mozilla's Common Voice En with NeMo
Publisher: NVIDIA
Latest Version: 2
Modified: April 4, 2023
Size: 72.6 MB

Overview

QuartzNet is an end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable convolutional layers, batch normalization, and ReLU layers, and the network is trained with CTC loss. QuartzNet is a Jasper-like network that uses separable convolutions and larger filter sizes. It has comparable accuracy to Jasper while having far fewer parameters.
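To make the module structure concrete, below is a minimal sketch in plain PyTorch (not the NeMo implementation) of one QuartzNet-style module: a 1D time-channel separable convolution (a depthwise convolution over time followed by a pointwise 1x1 convolution), batch normalization, and ReLU. The class and parameter names are illustrative.

```python
import torch
import torch.nn as nn

class TCSConvBlock(nn.Module):
    """One time-channel separable conv module: depthwise conv + pointwise conv + BN + ReLU."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        # Depthwise: each channel is convolved over time independently (groups=in_channels).
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels)
        # Pointwise: 1x1 conv mixes information across channels.
        self.pointwise = nn.Conv1d(in_channels, out_channels, kernel_size=1)
        self.bn = nn.BatchNorm1d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, time), e.g. mel-spectrogram features.
        return self.relu(self.bn(self.pointwise(self.depthwise(x))))
```

In the full model, several such modules are stacked inside a block, and a residual connection is added around the block before the ReLU.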

We provide a QuartzNet model pre-trained on WSJ, LibriSpeech and Mozilla's Common Voice En. Specifically, we fine-tune the pre-trained QuartzNet model available in NGC with Wall Street Journal data (CSR-I (WSJ0) Complete and CSR-II (WSJ1) Complete).
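A minimal usage sketch for loading a pre-trained QuartzNet checkpoint with the NeMo toolkit and transcribing an audio file is shown below; the checkpoint name passed to `model_name` and the file `sample.wav` are illustrative placeholders, so substitute the checkpoint corresponding to this model card.

```python
# Assumes the NeMo ASR collection is installed, e.g. `pip install nemo_toolkit[asr]`.
import nemo.collections.asr as nemo_asr

# Download a pre-trained QuartzNet15x5 checkpoint from NGC (name is an example).
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Transcribe a 16 kHz mono WAV file using greedy CTC decoding.
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```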

Datasets

  • Pre-trained on: LibriSpeech with ±10% speed perturbation and Mozilla's Common Voice En (Validated).
  • Fine-tuned on: Wall Street Journal with ±10% speed perturbation (CSR-I (WSJ0) Complete and CSR-II (WSJ1) Complete).

Word Error Rate

  • WSJ eval-92: (Greedy) 4.45%, (+ 6-Gram WSJ Language Model) 2.39%
  • WSJ dev-93: (Greedy) 6.59%, (+ 6-Gram WSJ Language Model) 3.76%