RIVA Punctuation

Description: For each word in the input text, the model predicts a punctuation mark that should follow the word (if any).
Publisher: NVIDIA
Latest Version: trainable_v2.1
Modified: October 6, 2023
Size: 639.14 MB

Punctuation and Capitalization Model Card

Model Overview

Automatic Speech Recognition (ASR) systems typically generate text without punctuation or capitalization. Besides being hard to read, such ASR output may then serve as input to named entity recognition, machine translation, or text-to-speech models. If the input text is correctly punctuated and capitalized, this can potentially boost the performance of those models.

Intended Use

For each word in the input text, the model:

  1. predicts a punctuation mark that should follow the word (if any). The model supports commas, periods, and question marks.
  2. predicts whether the word should be capitalized.
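
As a quick illustration of this behavior, the snippet below runs the equivalent publicly available model from the open-source NeMo toolkit. The checkpoint name punctuation_en_bert is a public NeMo model used here only for illustration; the trainable checkpoint on this page is consumed through TAO instead.

    from nemo.collections.nlp.models import PunctuationCapitalizationModel

    # Public NeMo checkpoint, used only to illustrate the task; the TAO
    # checkpoint on this page is driven through TAO commands instead.
    model = PunctuationCapitalizationModel.from_pretrained("punctuation_en_bert")
    queries = ["how are you", "great how about you"]
    print(model.add_punctuation_capitalization(queries))
    # Expected output along the lines of:
    # ['How are you?', 'Great, how about you?']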

Model Architecture

The Punctuation and Capitalization model consists of a pre-trained Bidirectional Encoder Representations from Transformers (BERT) encoder followed by two token-classification heads. One classification head is responsible for the punctuation task; the other handles the capitalization task. Both heads take the BERT-encoded representations of the input tokens and predict a label for every token. This architecture allows the model to solve the two tasks at once with only a single pass through BERT. All parameters, including those of the encoder, are fine-tuned on the joint task.
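
As an illustrative sketch only (not the exact TAO implementation), the joint architecture can be written in PyTorch with the Hugging Face transformers library. The class name and label counts below are assumptions: four punctuation labels (none, comma, period, question mark) and two capitalization labels.

    from torch import nn
    from transformers import AutoModel

    class PunctCapBert(nn.Module):
        # Sketch of the joint architecture: one shared BERT encoder and two
        # token-classification heads.
        def __init__(self, num_punct=4, num_cap=2):
            super().__init__()
            # Same checkpoint family as named under Training below.
            self.bert = AutoModel.from_pretrained("bert-base-multilingual-cased")
            hidden = self.bert.config.hidden_size
            self.punct_head = nn.Linear(hidden, num_punct)  # none , . ?
            self.cap_head = nn.Linear(hidden, num_cap)      # lower / Upper

        def forward(self, input_ids, attention_mask):
            # A single encoder pass feeds both heads; each head emits one
            # label distribution per token.
            enc = self.bert(input_ids=input_ids,
                            attention_mask=attention_mask).last_hidden_state
            return self.punct_head(enc), self.cap_head(enc)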

Limitations

The punctuation model currently supports only commas, periods, and question marks.

Training

Dataset

The model was trained from the BERT-base multilingual cased checkpoint on a subset of data from the following sources:

  1. Tatoeba sentences
  2. MCV Corpus
  3. Proprietary datasets.

Performance

Evaluation

Each word in the input sequence can be split into one or more tokens; as a result, there are two possible ways to evaluate the model: (1) mark the whole word with a single label, or (2) perform the evaluation at the sub-token level.

During training, the first approach was applied: the prediction for the first sub-token of each word was used to label the whole word. Each task is evaluated separately. Due to the high class imbalance, the suggested metric for this model is the F1 score (with macro averaging).

This model was evaluated on an internal dataset, where it reached an F1 score of 88%.
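
To make the metric concrete, here is a toy macro-averaged F1 computation with scikit-learn. The labels and values are hypothetical and do not come from the internal evaluation set; "O" stands for "no punctuation".

    from sklearn.metrics import f1_score

    # Hypothetical word-level punctuation labels; the prediction for a word's
    # first sub-token labels the whole word.
    y_true = ["O", "O", ",", "O", ".", "?", "O", "O"]
    y_pred = ["O", "O", "O", "O", ".", "?", "O", ","]
    # Macro averaging weights every class equally, which matters because the
    # "no punctuation" class dominates.
    print(f1_score(y_true, y_pred, average="macro"))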

How to Use this Model

This pre-trained model must be used with NVIDIA hardware and software. It can be used with the Train Adapt Optimize (TAO) Toolkit with the model load key tlt_encode (make sure to use this as the key for all TAO commands that require a model load key).

For more details on how to prepare your data for this model and how to fine-tune the model with TAO, see the TAO User Guide for the Punctuation and Capitalization model.
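
For orientation, the NeMo and TAO documentation describe a word/label text format in which each lowercased word is paired with a two-character label: the punctuation mark that follows the word ("O" for none) plus "U" or "O" for capitalization. The helper below is a hypothetical sketch of converting punctuated text into that format; consult the TAO User Guide for the authoritative specification.

    import re

    def make_example(sentence):
        # Hypothetical converter from punctuated text to the word/label format
        # described in the NeMo/TAO docs: label = <punctuation><capitalization>,
        # with 'O' meaning "none" and 'U' meaning "capitalize".
        words, labels = [], []
        for token in sentence.split():
            match = re.match(r"^(\w+)([,.?]?)$", token)
            if not match:  # skip tokens this sketch does not handle
                continue
            word, punct = match.group(1), match.group(2) or "O"
            cap = "U" if word[0].isupper() else "O"
            words.append(word.lower())
            labels.append(punct + cap)
        return " ".join(words), " ".join(labels)

    text, labels = make_example("How are you? Great, thanks.")
    print(text)    # how are you great thanks
    print(labels)  # OU OO ?O ,U .O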

References

Citations

Devlin, Jacob, et al. "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805 (2018).

Suggested Reading

More information about the TAO Toolkit can be found at the NVIDIA Developer Zone: https://developer.nvidia.com/tao-toolkit. Read the TAO Getting Started guide and release notes. More information about the experiment spec files can be found in the TAO User Guide. For information about deploying the model with Riva Services, see the Riva documentation. If you have any questions or feedback, please refer to the discussions on the TAO Toolkit Developer Forums.

License

By downloading and using the models and resources packaged with TAO Conversational AI, you accept the terms of the Riva license.

Ethical AI

NVIDIA’s platforms and application frameworks enable developers to build a wide array of AI applications. Consider potential algorithmic bias when choosing or creating the models being deployed. Work with the model’s developer to ensure that it meets the requirements for the relevant industry and use case; that the necessary instruction and documentation are provided to understand error rates, confidence intervals, and results; and that the model is being used under the conditions and in the manner intended.