QuartzNet is an end-to-end neural acoustic model for automatic speech recognition. The model is composed of multiple blocks with residual connections between them. Each block consists of one or more modules with 1D time-channel separable convolutional layers, batch normalization, and ReLU layers, and the model is trained with CTC loss. QuartzNet is a Jasper-like network that uses separable convolutions and larger filter sizes; it achieves comparable accuracy to Jasper with far fewer parameters.
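The sketch below illustrates the basic building block described above: a 1D time-channel separable convolution (a depthwise convolution over time followed by a pointwise 1x1 convolution), batch normalization, and ReLU. It is a minimal, illustrative PyTorch example, not the official implementation; the class and parameter names are ours.

```python
import torch
import torch.nn as nn

class TimeChannelSeparableConv1d(nn.Module):
    """Depthwise conv over time per channel, then a pointwise (1x1) conv
    that mixes channels -- far fewer weights than a full 1D convolution."""
    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(
            in_channels, in_channels, kernel_size,
            padding=kernel_size // 2, groups=in_channels, bias=False)
        self.pointwise = nn.Conv1d(in_channels, out_channels, 1, bias=False)
        self.bn = nn.BatchNorm1d(out_channels)
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        x = self.pointwise(self.depthwise(x))
        return self.relu(self.bn(x))

# Example: 64 mel-feature channels, 256 output channels, kernel size 33
block = TimeChannelSeparableConv1d(64, 256, 33)
features = torch.randn(8, 64, 400)   # (batch, features, frames)
print(block(features).shape)          # torch.Size([8, 256, 400])
```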
We provide a QuartzNet model pre-trained on WSJ, LibriSpeech, and Mozilla's Common Voice En. Specifically, we fine-tune the pre-trained QuartzNet model available in NGC on Wall Street Journal data (CSR-I (WSJ0) Complete and CSR-II (WSJ1) Complete).
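As a hedged sketch of the starting point for this fine-tuning, the snippet below pulls a pre-trained QuartzNet checkpoint from NGC via the NVIDIA NeMo toolkit and runs a quick inference check. The model name `QuartzNet15x5Base-En` and the exact `transcribe()` signature are assumptions that may vary across NeMo releases.

```python
import nemo.collections.asr as nemo_asr

# Download the pre-trained CTC model from NGC (cached locally afterwards).
# "QuartzNet15x5Base-En" is an assumed checkpoint name; check the NGC catalog.
asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(
    model_name="QuartzNet15x5Base-En")

# Sanity-check transcription on a local WAV file before fine-tuning on WSJ.
transcripts = asr_model.transcribe(["sample.wav"])
print(transcripts[0])
```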