QuartzNet checkpoint (PyTorch, AMP, LibriSpeech)

Description

QuartzNet PyTorch checkpoint trained on LibriSpeech (10.41% WER on test-other)

Publisher

NVIDIA Deep Learning Examples

Latest Version

21.03.0_amp

Modified

April 4, 2023

Size

72.72 MB

Model Overview

An end-to-end neural acoustic model for automatic speech recognition that delivers high accuracy with a small memory footprint.

Model Architecture

QuartzNet is an end-to-end neural acoustic model based on efficient, time-channel separable convolutions (Figure 1). In the audio processing stage, each audio frame is transformed into mel-scale spectrogram features; the acoustic model takes these features as input and outputs a probability distribution over the vocabulary for each frame.

Figure 1. Architecture of QuartzNet
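
To make the two stages above concrete, here is a minimal sketch in PyTorch: a waveform is converted to mel-scale features with torchaudio, then passed through one time-channel separable convolution block (a depthwise convolution over time followed by a pointwise 1x1 convolution across channels). The kernel size, channel count, and front-end settings are illustrative placeholders, not this checkpoint's actual hyperparameters.

```python
import torch
import torch.nn as nn
import torchaudio

# Stage 1: waveform -> mel-scale spectrogram features.
# 64 mel bins and the window/hop sizes are illustrative, not the
# exact front-end configuration of this checkpoint.
mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=512, hop_length=160, n_mels=64
)

class TimeChannelSeparableConv(nn.Module):
    """One time-channel separable convolution: a depthwise conv over
    time followed by a pointwise (1x1) conv that mixes channels."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        self.depthwise = nn.Conv1d(
            channels, channels, kernel_size,
            padding=kernel_size // 2, groups=channels
        )
        self.pointwise = nn.Conv1d(channels, channels, kernel_size=1)
        self.norm = nn.BatchNorm1d(channels)
        self.act = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, time)
        return self.act(self.norm(self.pointwise(self.depthwise(x))))

waveform = torch.randn(1, 16000)        # 1 second of fake 16 kHz audio
features = mel(waveform)                # (1, 64, frames)
block = TimeChannelSeparableConv(64, kernel_size=33)
out = block(features)                   # (1, 64, frames)
```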

Training

This model was trained using the training scripts available on NGC and in the NVIDIA Deep Learning Examples GitHub repository.
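
Once downloaded from NGC, the checkpoint itself is an ordinary PyTorch file. A minimal sketch of inspecting it is below; the filename is a placeholder for whatever the NGC download produces, and the layout (a dict holding a state_dict plus training metadata) is an assumption about the training scripts' saving convention, not documented behavior.

```python
import torch

# Load the checkpoint on CPU. The filename is a placeholder for the
# file downloaded from NGC.
ckpt = torch.load("quartznet_checkpoint.pt", map_location="cpu")

# Training scripts commonly save a dict containing the model weights
# ("state_dict") alongside optimizer state and epoch counters; print
# the top-level keys to see what this checkpoint actually holds.
if isinstance(ckpt, dict):
    print(list(ckpt.keys()))
```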

Dataset

The following datasets were used to train this model:

  • LibriSpeech - a corpus of approximately 1,000 hours of 16 kHz read English speech derived from audiobooks from the LibriVox project, carefully segmented and aligned (see the loading sketch below).
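
For quick experimentation, LibriSpeech can also be fetched with torchaudio's built-in dataset class. This is a convenience sketch, not the preprocessing pipeline used by the training scripts.

```python
import torchaudio

# Download the smallest training split; other splits include
# "train-clean-360", "train-other-500", "dev-clean", and "test-other".
dataset = torchaudio.datasets.LIBRISPEECH(
    root="./data", url="train-clean-100", download=True
)

# Each item is (waveform, sample_rate, transcript, speaker_id,
# chapter_id, utterance_id).
waveform, sample_rate, transcript, *_ = dataset[0]
print(sample_rate, transcript)
```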

Performance

Detailed performance numbers for this model are available on NGC.
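
The quoted accuracy is word error rate (WER): the word-level edit distance between the hypothesis and the reference transcript, divided by the number of reference words. A minimal reference implementation:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + insertions + deletions)
    divided by the number of words in the reference."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sit"))  # 0.333...
```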

License

This model was trained using open-source software available in the NVIDIA Deep Learning Examples repository. For terms of use, please refer to the licenses of the training scripts and of the datasets from which the model was derived.