NGC Catalog

Description

NeMo is a toolkit for creating conversational AI applications. It lets researchers easily compose complex neural network architectures for conversational AI from reusable components called Neural Modules.

Curator

NVIDIA

Modified

March 14, 2025

What is NVIDIA NeMo for Conversational AI?

NVIDIA NeMo is an open source toolkit for conversational AI. It is built for data scientists and researchers who want to build new state-of-the-art ASR (Automatic Speech Recognition), NLP (Natural Language Processing) and TTS (Text-to-Speech synthesis) networks easily from API-compatible building blocks that can be connected together.

"Neural Modules" are conceptual blocks that take typed inputs and produce typed outputs. NeMo makes it easy to combine and re-use these building blocks while providing a level of semantic correctness checking via its neural type system.

Conversational AI architectures are typically very large and require a lot of data and compute for training. Built for speed, NeMo can utilize NVIDIA's Tensor Cores and scale out training to multiple GPUs and multiple nodes. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility.
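The idea of typed inputs and outputs can be illustrated with a small pure-Python sketch. This is not NeMo's neural type API: `PortType`, `ToyModule` and `check_connection` are hypothetical names invented here only to show what a semantic check between connected blocks might look like.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PortType:
    """Hypothetical port description: axis layout plus the meaning of the values."""
    axes: Tuple[str, ...]
    element: str


@dataclass
class ToyModule:
    """Hypothetical building block with named, typed input and output ports."""
    name: str
    inputs: Dict[str, PortType]
    outputs: Dict[str, PortType]


def check_connection(producer: ToyModule, out_port: str,
                     consumer: ToyModule, in_port: str) -> None:
    """Raise TypeError if the producer's output type differs from the consumer's input type."""
    produced = producer.outputs[out_port]
    expected = consumer.inputs[in_port]
    if produced != expected:
        raise TypeError(
            f"{producer.name}.{out_port} produces {produced}, "
            f"but {consumer.name}.{in_port} expects {expected}"
        )


# Two types with the same axis layout but a different meaning.
AUDIO = PortType(("batch", "time"), "audio_signal")
SPECTROGRAM = PortType(("batch", "feature", "time"), "mel_spectrogram")
ENCODED = PortType(("batch", "feature", "time"), "encoded_representation")
LOG_PROBS = PortType(("batch", "time", "vocab"), "log_probabilities")

preprocessor = ToyModule("preprocessor", {"audio": AUDIO}, {"features": SPECTROGRAM})
encoder = ToyModule("encoder", {"features": SPECTROGRAM}, {"encoded": ENCODED})
decoder = ToyModule("decoder", {"encoded": ENCODED}, {"log_probs": LOG_PROBS})

# A valid chain passes silently.
check_connection(preprocessor, "features", encoder, "features")
check_connection(encoder, "encoded", decoder, "encoded")
```

In this sketch, wiring `preprocessor` directly into `decoder` would raise a `TypeError` even though both tensors share the same axis layout, which is the kind of mistake a shape check alone would miss.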

What is in this Collection?

This NGC collection includes ready-to-use models for Automatic Speech Recognition (ASR), Natural Language Processing (NLP) and Speech Synthesis (TTS). Any of these pretrained models can be used with the NeMo toolkit to build applications that work with domain-specific speech and NLP data. NeMo itself is organized into collections of modules, with the ASR, NLP and TTS modules available separately inside the toolkit.

Pull the latest NeMo container to get started. The resources under "Getting Started" point you to the software toolkit and the pretrained models. The in-depth tutorials for each of the above domains explain the workflow of the task at hand in detail, usually by training a small model on a small amount of data, and include some explanation of the task itself. The tutorials are a good demonstration of how simple NeMo is to use, but for the best performance when training on real datasets we advise using the example scripts instead of the tutorial notebooks.

Several pretrained models are provided as PyTorch checkpoints packaged as .nemo files. The models are trained with NeMo on multiple datasets and reach high accuracy. The overview pages in each collection show details of all datasets used and the accuracy achieved.
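A packaged model file can be inspected without loading it. The sketch below assumes that a .nemo file is a tar archive bundling a configuration file and a weights checkpoint; the member names `model_config.yaml` and `model_weights.ckpt` are assumptions used for illustration, and the example builds a dummy archive rather than reading a real model.

```python
import io
import os
import tarfile
import tempfile
from typing import List


def make_dummy_nemo(path: str) -> None:
    """Write a stand-in archive with placeholder contents (not a real model)."""
    members = {
        "model_config.yaml": b"name: dummy\n",      # assumed member name
        "model_weights.ckpt": b"\x00placeholder",   # assumed member name
    }
    with tarfile.open(path, "w:gz") as archive:
        for name, data in members.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))


def list_archive_members(path: str) -> List[str]:
    """Return the sorted member names of a tar archive, whatever its compression."""
    with tarfile.open(path, "r:*") as archive:
        return sorted(archive.getnames())


def demo() -> List[str]:
    """Build a dummy archive in a temporary directory and list its members."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "dummy_model.nemo")
        make_dummy_nemo(path)
        return list_archive_members(path)
```

Calling `list_archive_members` on a downloaded file only reads the archive index, so it is a cheap way to confirm what a download contains before handing it to the toolkit.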

The following Pretrained Models are provided in this collection:

Speech Recognition (ASR, speech-to-text) models. See the individual model overview pages for a description of each model. Popular models include Jasper, QuartzNet and MatchboxNet.
Natural Language Processing (NLP) models. Popular models here are BERT base and BERT large, fine-tuned for several tasks such as Question Answering, Named Entity Recognition and many more. BioMegatron is a state-of-the-art model for medical data.
Text-to-speech synthesis (TTS) models such as Tacotron2, WaveGlow and GlowTTS.
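Loading one of the pretrained models listed above might look like the following sketch. It assumes NeMo is installed with its ASR collection and that the `EncDecCTCModel.from_pretrained` class method is available in your NeMo version; the model name `QuartzNet15x5Base-En` is an example and should be checked against the model overview pages. The helper names `nemo_available` and `load_pretrained_asr` are hypothetical.

```python
import importlib.util


def nemo_available() -> bool:
    """Return True if the `nemo` package can be imported in this environment."""
    return importlib.util.find_spec("nemo") is not None


def load_pretrained_asr(model_name: str = "QuartzNet15x5Base-En"):
    """Download (or reuse a cached copy of) a pretrained ASR model by name.

    The default model name is an assumption; consult the collection for
    the names that are actually published.
    """
    if not nemo_available():
        raise RuntimeError(
            "NeMo is not installed; run this inside the NeMo container "
            "or install the toolkit first."
        )
    # Imported lazily so this file can be read and checked without NeMo.
    import nemo.collections.asr as nemo_asr

    return nemo_asr.models.EncDecCTCModel.from_pretrained(model_name=model_name)
```

Inside the NeMo container, `model = load_pretrained_asr()` would return a model object ready for inference or fine-tuning. The import is deferred so that the failure mode outside the container is a clear error message rather than an import traceback.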

Getting Started

To quickly get started building and training conversational AI models, NeMo provides several Jupyter Notebook examples, listed in the table below. Each notebook can be run individually on a machine with access to GPUs.

Domain	Title	GitHub URL
NeMo	Simple Application with NeMo	Voice swap app
NeMo	Exploring NeMo Fundamentals	NeMo primer
NeMo Models	Exploring NeMo Model Construction	NeMo models
ASR	ASR with NeMo	ASR with NeMo
ASR	Speech Commands	Speech commands
ASR	Speaker Recognition and Verification	Speaker Recognition and Verification
ASR	Online Noise Augmentation	Online noise augmentation
NLP	Using Pretrained Language Models for Downstream Tasks	Pretrained language models for downstream tasks (https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/01_Pretrained_Language_Models_for_Downstream_Tasks.ipynb)
NLP	Exploring NeMo NLP Tokenizers	NLP tokenizers
NLP	Text Classification (Sentiment Analysis) with BERT	Text Classification (Sentiment Analysis)
NLP	Question Answering with SQuAD	Question answering with SQuAD
NLP	Token Classification (Named Entity Recognition)	Token classification: named entity recognition
NLP	GLUE Benchmark	GLUE benchmark
NLP	Punctuation and Capitalization	Punctuation and capitalization
NLP	Named Entity Recognition - BioMegatron	Named Entity Recognition - BioMegatron
NLP	Relation Extraction - BioMegatron	Relation Extraction - BioMegatron
TTS	Speech Synthesis	TTS inference

What's new in Release 1.0beta?

This release updates the core training API to use PyTorch Lightning. Every NeMo model is a LightningModule that comes equipped with all supporting infrastructure for training and reproducibility. Every NeMo model has an example configuration file and a corresponding script that contains all configurations needed for training.

Together, NeMo, PyTorch Lightning and Hydra give all NeMo models the same look and feel, so that it is easy to do conversational AI research across multiple domains.
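The pattern of a configuration file plus command-line overrides can be sketched in plain Python. This is only an illustration of the idea behind dotted overrides such as `trainer.gpus=2`; it is not Hydra's implementation, and the configuration keys and the `apply_overrides` helper are hypothetical.

```python
import copy
from typing import Any, Dict, List


def _parse_value(raw: str) -> Any:
    """Convert an override string to int, float or bool where possible."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw


def apply_overrides(config: Dict[str, Any], overrides: List[str]) -> Dict[str, Any]:
    """Return a copy of `config` with each `a.b.c=value` override applied."""
    result = copy.deepcopy(config)
    for item in overrides:
        dotted_key, _, raw = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for key in parents:
            # Create intermediate sections that do not exist yet.
            node = node.setdefault(key, {})
        node[leaf] = _parse_value(raw)
    return result


# Hypothetical defaults, standing in for an example configuration file.
default_config = {
    "model": {"optim": {"name": "adam", "lr": 0.01}},
    "trainer": {"gpus": 1, "max_epochs": 5},
}

run_config = apply_overrides(default_config, ["trainer.gpus=2", "model.optim.lr=0.001"])
```

Because every model exposes its settings through the same kind of nested configuration, the same override style applies whether the script trains an ASR, NLP or TTS model.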

New models such as Speaker Identification and Megatron BERT add variety. Together with this collection and the Docker container, we believe NeMo is on track to become a premier toolkit for conversational AI model building and training.

Documentation

The NeMo developer guide is available at https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/index.html

Hardware Requirements

GPUs in the Pascal, Volta, Turing and A100 families
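As a rough aid for checking a machine against this list, the sketch below maps a CUDA compute capability to the families named above. The mapping (6.x for Pascal, 7.0 and 7.2 for Volta, 7.5 for Turing, 8.0 for A100) reflects NVIDIA's published compute capabilities; the function name `gpu_family` is hypothetical, and treating every other capability as unsupported is an assumption made here, not a statement from the requirements.

```python
from typing import Optional


def gpu_family(major: int, minor: int) -> Optional[str]:
    """Map a CUDA compute capability to one of the listed GPU families.

    Returns None for capabilities outside the families named in the
    hardware requirements.
    """
    if major == 6:
        return "Pascal"
    if major == 7 and minor in (0, 2):
        return "Volta"
    if (major, minor) == (7, 5):
        return "Turing"
    if (major, minor) == (8, 0):
        return "A100"
    return None
```

With PyTorch installed, the pair returned by `torch.cuda.get_device_capability()` can be passed straight in, for example `gpu_family(*torch.cuda.get_device_capability())`.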

Driver Requirements

NeMo development is based on NVIDIA's PyTorch container version 20.08-py3

License

NeMo is licensed under the Apache License 2.0. By pulling and using the container and models, you accept the terms and conditions of these licenses.

Technical Support

Use the GitHub Issues forum for questions regarding this software.