NGram Language Model Notebook

Description

End-to-end sample workflow for an N-gram language model, from training in TAO Toolkit to deployment with Riva.

Publisher

NVIDIA

Use Case

Other

Framework

TransferLearningToolkit

Latest Version

v1.0

Modified

April 8, 2022

Compressed Size

26.81 KB

N-Gram Language Model Notebook

Language models (LMs) estimate the probability distribution of sequences of words. In general this is a hard problem, because sequences can be arbitrarily long, so it is often assumed that the probability of a word depends only on the N-1 words preceding it. This is known as an N-gram language model. An N-gram model of order N stores the counts of all word sequences observed in the training data, from length one (unigrams) up to length N. During inference, if the queried N-gram was not seen during training, the model backs off to the shorter sequence formed by its last N-1 words and weights that probability by a calculated backoff weight.
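
The counting and backoff described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by TAO Toolkit: the function names `count_ngrams` and `backoff_score` are hypothetical, and the backoff uses a fixed weight `alpha` per step (often called "stupid backoff") in place of the calculated backoff weights a real toolkit estimates.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count every n-gram of length 1..order in a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(counts, context, word, alpha=0.4):
    """Score `word` given `context`, backing off to shorter contexts.

    Assumption: a fixed weight `alpha` is applied at each backoff step.
    This stands in for the calculated backoff weights of a real model.
    """
    total_unigrams = sum(c for gram, c in counts.items() if len(gram) == 1)
    context = tuple(context)
    weight = 1.0
    while True:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Relative frequency of the n-gram among continuations of its context.
            denom = counts[context] if context else total_unigrams
            return weight * counts[ngram] / denom
        if not context:
            return 0.0  # the word itself was never observed
        context = context[1:]  # drop the oldest word and retry
        weight *= alpha


corpus = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
counts = count_ngrams(corpus, order=3)
seen = backoff_score(counts, ["the", "cat"], "sat")    # trigram was observed
unseen = backoff_score(counts, ["a", "cat"], "sat")    # backs off to ("cat", "sat")
```

Here `seen` is scored directly from the trigram count, while `unseen` falls back to the bigram formed by the last two words and is scaled by `alpha`.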

The best place to get started with TAO Toolkit - LM is the TAO N-Gram LM Jupyter notebook sample included in this resource.

This resource includes one notebook:

  1. Training: Sample workflow for training an N-gram language model and exporting it to a .riva file

If you are a seasoned Conversational AI developer, we recommend installing TAO and referring to the TAO documentation for detailed information.

Pre-Requisites

Please make sure to install the following before proceeding further:

  • python 3.6.9
  • docker-ce > 19.03.5
  • docker-API 1.40
  • nvidia-container-toolkit > 1.3.0-1
  • nvidia-container-runtime > 3.4.0-1
  • nvidia-docker2 > 2.5.0-1
  • nvidia-driver >= 455.23

Note: A compatible NVIDIA GPU is required.
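
To compare installed versions against the minimums listed above, a small helper along the following lines may be useful. It is only a sketch: the names `MINIMUMS`, `parse_version`, and `meets_minimum` are hypothetical, the installed version strings are passed in by hand (how to query them on a given system is left out), and versions are compared purely by their numeric parts.

```python
import re

# Minimum versions copied from the list above.
# The boolean marks a strict ">" bound; False means ">=".
MINIMUMS = {
    "docker-ce": ("19.03.5", True),
    "nvidia-container-toolkit": ("1.3.0-1", True),
    "nvidia-container-runtime": ("3.4.0-1", True),
    "nvidia-docker2": ("2.5.0-1", True),
    "nvidia-driver": ("455.23", False),
}


def parse_version(text):
    """Turn '19.03.5' or '1.3.0-1' into a tuple of ints for comparison."""
    return tuple(int(part) for part in re.findall(r"\d+", text))


def meets_minimum(name, installed):
    """Return True if the `installed` version string satisfies the bound for `name`."""
    minimum, strict = MINIMUMS[name]
    have, need = parse_version(installed), parse_version(minimum)
    return have > need if strict else have >= need
```

For example, `meets_minimum("nvidia-driver", "455.23")` passes because the driver bound is inclusive, while `meets_minimum("docker-ce", "19.03.5")` does not because the Docker bound is strict.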

Installation

We recommend that you install TAO Toolkit inside a virtual environment. The steps are as follows:

virtualenv -p python3 
source /bin/activate
pip install jupyter notebook # If you need to run the notebooks

TAO Toolkit is a Python package hosted on the NVIDIA Python Package Index. You can install it with Python's package manager, pip:

pip install nvidia-pyindex
pip install nvidia-tao

To download and open the Jupyter notebook:

  1. Download the samples using the NGC CLI with the following command:
ngc registry resource download-version "nvidia/tao/ngram_lm_notebook:v1.0"
  2. Start the Jupyter notebook server:
jupyter notebook --ip 0.0.0.0 --allow-root --port 8888
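
Both commands above assume that the `ngc` and `jupyter` executables are on your PATH. A quick way to confirm this before running them is sketched below; the helper name `missing_tools` is hypothetical, and the check only looks for the executables without verifying their versions or NGC credentials.

```python
import shutil


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# An empty list means both command-line tools were found.
print(missing_tools(["ngc", "jupyter"]))
```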

License

By downloading and using the models and resources packaged with TAO Toolkit Conversational AI, you accept the terms of the Riva license.