WSJ 6-Gram Language Model

Description
Trained language model for automatic speech recognition: Baidu's CTC decoder with a 6-gram language model trained on the WSJ corpus.
Publisher
NVIDIA
Latest Version
1
Modified
April 4, 2023
Size
590.59 MB

Overview

Transcripts in Automatic Speech Recognition (ASR) systems are commonly generated using only an acoustic model, typically an "end-to-end" CTC-based network that maps audio to text without additional alignment information. However, because the CTC-based network has little prior linguistic knowledge, the transcription can be ambiguous, for example when collapsing repeated characters and removing blanks. This is where a language model comes in: it can improve performance by helping resolve those decoding ambiguities.
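To make the ambiguity concrete, here is a minimal Python sketch of the standard CTC collapsing rule: merge consecutive repeated characters, then drop blanks. The blank symbol and the example strings are illustrative, not taken from this model's vocabulary.

    BLANK = "_"  # hypothetical blank symbol, for illustration only

    def ctc_collapse(frame_labels):
        """Collapse repeated characters across frames, then remove blanks."""
        out = []
        prev = None
        for ch in frame_labels:
            if ch != prev and ch != BLANK:  # merge repeats, drop blanks
                out.append(ch)
            prev = ch
        return "".join(out)

    # Whether a blank separates the repeated 'o' decides between two words:
    print(ctc_collapse("t_oo__"))  # -> "to"
    print(ctc_collapse("t_o_o_"))  # -> "too"

Without linguistic context, the acoustic model alone cannot reliably tell such near-identical frame sequences apart; a language model breaks the tie toward the more probable word sequence.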

We provide a language model trained on Wall Street Journal data. The language model is a KenLM n-gram model used in prefix beam search, which imposes a language-model constraint on each newly predicted character given the most probable previous prefixes. Specifically, we trained Baidu's CTC decoder with its N-gram LM implementation, using N = 6, on WSJ data.
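As a hedged sketch of how the LM score enters decoding, the snippet below uses the KenLM Python bindings to rescore a candidate prefix the way Deep Speech-style prefix beam search does: acoustic log-probability plus a weighted LM log-probability plus a word-insertion bonus. The file name wsj_6gram.binary and the weights ALPHA and BETA are assumptions for illustration, not values shipped with this model.

    import math
    import kenlm

    lm = kenlm.Model("wsj_6gram.binary")  # assumed path to the 6-gram LM

    ALPHA = 0.8  # LM weight (assumed; tuned on a dev set in practice)
    BETA = 1.0   # word-insertion bonus (assumed)

    def combined_score(acoustic_logprob, prefix):
        """Deep Speech-style objective: acoustic + alpha * LM + beta * word count."""
        lm_log10 = lm.score(prefix, bos=True, eos=False)  # KenLM returns log10
        lm_logprob = lm_log10 * math.log(10)              # convert to natural log
        return acoustic_logprob + ALPHA * lm_logprob + BETA * len(prefix.split())

At each beam-search step, candidate prefix expansions would be ranked by this combined score, so linguistically implausible character sequences are pruned before they dominate the beam.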

Datasets

  • Wall Street Journal sentences from CSR-I (WSJ0) Complete and CSR-II (WSJ1) Complete.

Word Error Rate

  • WSJ eval-92 (with the 6-gram WSJ language model): 2.39%
  • WSJ dev-93 (with the 6-gram WSJ language model): 3.76%