End-to-end sample workflow for an N-Gram language model, from training in TAO Toolkit to deployment using Riva.
N-Gram Language Model Notebook
Language models (LMs) estimate the probability distribution of sequences of words. In general this is a large task, since sequences can be arbitrarily long, so it is often assumed that the probability of a word depends only on the N-1 words preceding it. This is known as an N-Gram language model. An N-Gram model of order N stores the counts of all word sequences observed in the training data, from length one (known as unigrams) up to length N. During inference, if the queried N-gram was not seen during training, the model backs off to the probability of the shorter sequence formed by the last N-1 words, weighted by a calculated backoff probability.
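The counting and backoff steps described above can be sketched in a few lines of Python. This is only an illustration, not the TAO Toolkit implementation: the function names `count_ngrams` and `backoff_score` are hypothetical, and a fixed backoff weight `alpha` (a simplification often called "stupid backoff") stands in for the calculated backoff probabilities, so the returned values are scores rather than normalized probabilities.

```python
from collections import Counter


def count_ngrams(sentences, order):
    """Count every n-gram of length 1..order in a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def backoff_score(counts, context, word, alpha=0.4):
    """Score `word` given `context`, backing off to shorter contexts.

    `alpha` is a fixed, illustrative backoff weight (an assumption),
    not a backoff probability estimated from data.
    """
    context = tuple(context)
    weight = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Relative frequency of the n-gram among occurrences of its context.
            return weight * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest word and try a shorter context
        weight *= alpha
    # No context left: fall back to the unigram relative frequency.
    total = sum(c for gram, c in counts.items() if len(gram) == 1)
    return weight * counts[(word,)] / total


sentences = [["the", "cat", "sat"], ["the", "cat", "ran"], ["a", "dog", "sat"]]
counts = count_ngrams(sentences, order=3)
seen = backoff_score(counts, ("the", "cat"), "sat")    # trigram was observed
unseen = backoff_score(counts, ("a", "dog"), "ran")    # backs off to the unigram
```

In this toy corpus the trigram "the cat sat" is scored directly from its counts, while "a dog ran" was never observed, so the score falls back through the bigram to the unigram "ran", picking up one factor of `alpha` per backoff step.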
The best place to get started with TAO Toolkit - LM is the TAO N-Gram LM Jupyter notebook sample included in this resource.
This resource includes one notebook:
- Training: Sample workflow for training an N-Gram language model and exporting the model to a .riva file
If you are a seasoned Conversational AI developer, we recommend installing TAO and referring to the TAO documentation for detailed information.
Pre-Requisites
Please make sure to install the following before proceeding further:
- python 3.6.9
- docker-ce > 19.03.5
- docker-API 1.40
- nvidia-container-toolkit > 1.3.0-1
- nvidia-container-runtime > 3.4.0-1
- nvidia-docker2 > 2.5.0-1
- nvidia-driver >= 455.23
Note: A compatible NVIDIA GPU is required.
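As a quick sanity check before installing, a short script can report whether some of these prerequisites appear to be present. The sketch below is a hypothetical helper, not part of TAO Toolkit: it assumes that 3.6.9 is a minimum Python version, that `docker` should be on the `PATH`, and that the presence of `nvidia-smi` is a reasonable proxy for an installed NVIDIA driver. It does not verify the docker, container toolkit, or driver versions listed above.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 6, 9), tools=("docker", "nvidia-smi")):
    """Return a dict mapping each check to True (looks OK) or False.

    Only checks the running Python version and whether each tool is on
    the PATH; version numbers of the tools are not inspected.
    """
    report = {"python": tuple(sys.version_info[:3]) >= tuple(min_python)}
    for tool in tools:
        report[tool] = shutil.which(tool) is not None
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'MISSING'}")
```

Any item reported as missing should be installed and its version checked by hand against the list above.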
Installation
We recommend that you install TAO Toolkit inside a virtual environment.
TAO Toolkit is a Python package hosted on the NVIDIA Python package index. You can install it using Python's package manager, pip.
To download and run the Jupyter notebook:
- Download the samples using the NGC CLI
- Start the Jupyter notebook server
License
By downloading and using the models and resources packaged with TAO Toolkit Conversational AI, you accept the terms of the Riva license.