NGC | Catalog
NeMo is a toolkit for creating conversational AI applications. It makes it possible for researchers to easily compose complex neural network architectures for conversational AI from reusable components called Neural Modules.
April 4, 2023

What is NVIDIA NeMo for Conversational AI?

NVIDIA NeMo is an open-source toolkit for conversational AI. It is built for data scientists and researchers who want to easily build new state-of-the-art networks for automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech synthesis (TTS) from API-compatible building blocks that can be connected together.

"Neural Modules" are conceptual blocks that take typed inputs and produce typed outputs. NeMo makes it easy to combine and reuse these building blocks while providing a level of semantic correctness checking via its neural type system. Conversational AI architectures are typically very large and require a lot of data and compute to train. Built for speed, NeMo can utilize NVIDIA Tensor Cores and scale training out to multiple GPUs and multiple nodes. NeMo uses PyTorch Lightning for easy and performant multi-GPU/multi-node mixed-precision training. Every NeMo model is a LightningModule that comes equipped with all the supporting infrastructure for training and reproducibility.
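As a loose illustration of the typed-module idea, the sketch below uses plain, hypothetical Python classes (this is NOT the actual NeMo API) to show how typed input/output ports allow a composition to be checked for semantic correctness before any data flows:

```python
# Hypothetical, stdlib-only sketch of typed module composition.
# These classes are illustrative only -- they are not the NeMo neural type system.

class NeuralType:
    """A minimal stand-in for a neural type: just a named tensor kind."""
    def __init__(self, kind):
        self.kind = kind

    def compatible_with(self, other):
        return self.kind == other.kind


class Module:
    """A block with typed input and output ports, as the text describes."""
    def __init__(self, name, in_type, out_type):
        self.name, self.in_type, self.out_type = name, in_type, out_type

    def connect(self, downstream):
        # Semantic correctness check: this module's output type must
        # match the downstream module's input type.
        if not self.out_type.compatible_with(downstream.in_type):
            raise TypeError(
                f"{self.name} produces {self.out_type.kind}, "
                f"but {downstream.name} expects {downstream.in_type.kind}"
            )
        return downstream


audio = NeuralType("audio_signal")
spec = NeuralType("spectrogram")
logits = NeuralType("log_probs")

preprocessor = Module("preprocessor", audio, spec)
encoder = Module("encoder", spec, logits)

preprocessor.connect(encoder)      # OK: spectrogram feeds spectrogram
try:
    encoder.connect(preprocessor)  # mismatch: log_probs cannot feed audio_signal
except TypeError as err:
    print("caught:", err)
```

The real NeMo type system is richer (axes, element types, broadcasting rules), but the design choice is the same: mismatched connections fail loudly at graph-construction time rather than silently at runtime.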

What is in this Collection?

This NGC collection includes ready-to-use models for automatic speech recognition (ASR), natural language processing (NLP), and speech synthesis (TTS). Any of these pretrained models can be used with the NeMo toolkit to build applications that work with domain-specific speech and NLP data. NeMo itself is organized into collections of modules, with ASR, NLP, and TTS modules separately available inside the toolkit.

  1. Pull the latest NeMo container to get started. The resources under "Getting Started" will lead you through using the software toolkit and pretrained models. The in-depth tutorials for each of the above domains explain the workflow of the task at hand in detail, usually training a small model on a small amount of data, along with some explanation of the task itself. While the tutorials are a great example of the simplicity of NeMo, please note that for the best performance when training on real datasets, we advise using the example scripts instead of the tutorial notebooks.

Several pretrained models are provided in the form of PyTorch checkpoints packaged as .nemo files. Models trained with NeMo achieve high accuracy and are trained on multiple datasets. The overview page for each model lists the datasets used and the accuracy achieved.

The following Pretrained Models are provided in this collection:

  1. Speech recognition (ASR, speech-to-text) models: see each model's overview page for a description. Popular models include Jasper, QuartzNet, and MatchboxNet.
  2. Natural language processing (NLP) models: popular models here are BERT Base and BERT Large fine-tuned for tasks such as question answering, named entity recognition, and many more. BioMegatron is a state-of-the-art model for medical data.
  3. Text-to-speech synthesis (TTS) models such as Tacotron 2, WaveGlow, and GlowTTS.
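As a hedged sketch of using one of these pretrained checkpoints: the snippet below loads the QuartzNet ASR model by name and transcribes an audio file. It assumes `nemo_toolkit` is installed (e.g. inside the NeMo container), and `"sample.wav"` is a placeholder path, not a file shipped with the collection:

```python
# Sketch only: requires nemo_toolkit (and its PyTorch dependencies) to be installed.
MODEL_NAME = "QuartzNet15x5Base-En"  # a pretrained ASR checkpoint from this collection

try:
    import nemo.collections.asr as nemo_asr

    # Download the checkpoint from NGC and restore the model.
    asr_model = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name=MODEL_NAME)

    # Transcribe a local audio file ("sample.wav" is a placeholder path).
    transcripts = asr_model.transcribe(paths2audio_files=["sample.wav"])
    print(transcripts)
except ImportError:
    # NeMo is not installed in this environment; see the container instructions above.
    asr_model = None
```

A .nemo checkpoint that has already been downloaded can likewise be loaded with `EncDecCTCModel.restore_from("path/to/model.nemo")`; the NLP and TTS collections expose analogous model classes.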

Getting Started

To quickly get started building and training conversational AI models, NeMo provides several Jupyter notebook examples:

  1. See table below
  2. Run any of these notebooks individually on a machine with accessible GPUs
| Domain | Title | GitHub URL |
| --- | --- | --- |
| NeMo | Simple Application with NeMo | Voice swap app |
| NeMo | Exploring NeMo Fundamentals | NeMo primer |
| NeMo Models | Exploring NeMo Model Construction | NeMo models |
| ASR | ASR with NeMo | ASR with NeMo |
| ASR | Speech Commands | Speech commands |
| ASR | Speaker Recognition and Verification | Speaker Recognition and Verification |
| ASR | Online Noise Augmentation | Online noise augmentation |
| NLP | Using Pretrained Language Models for Downstream Tasks | [Pretrained language models for downstream tasks](https://github.com/NVIDIA/NeMo/blob/main/tutorials/nlp/01_Pretrained_Language_Models_for_Downstream_Tasks.ipynb) |
| NLP | Exploring NeMo NLP Tokenizers | NLP tokenizers |
| NLP | Text Classification (Sentiment Analysis) with BERT | Text Classification (Sentiment Analysis) |
| NLP | Question Answering with SQuAD | Question answering with SQuAD |
| NLP | Token Classification (Named Entity Recognition) | Token classification: named entity recognition |
| NLP | GLUE Benchmark | GLUE benchmark |
| NLP | Punctuation and Capitalization | Punctuation and capitalization |
| NLP | Named Entity Recognition - BioMegatron | Named Entity Recognition - BioMegatron |
| NLP | Relation Extraction - BioMegatron | Relation Extraction - BioMegatron |
| TTS | Speech Synthesis | TTS inference |

What's new in Release 1.0beta?

This release updates the core training API to use PyTorch Lightning. Every NeMo model is a LightningModule that comes equipped with all the supporting infrastructure for training and reproducibility. Every NeMo model also has an example configuration file and a corresponding training script that together contain all the settings needed for training.
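As a rough sketch of what such an example configuration looks like, the YAML fragment below shows the general shape; the field names are illustrative assumptions, not the exact schema of any particular model:

```yaml
# Illustrative shape of a NeMo example config (field names are assumptions,
# not the authoritative schema -- see each model's shipped config for the real one).
name: QuartzNet15x5
model:
  train_ds:
    manifest_filepath: /data/train_manifest.json
    batch_size: 32
  optim:
    name: novograd
    lr: 0.01
trainer:
  gpus: 1
  max_epochs: 5
```

Because Hydra drives these configs, any field can be overridden on the command line of the corresponding example script (e.g. `trainer.gpus=4`) without editing the file.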

NeMo, PyTorch Lightning, and Hydra give all NeMo models the same look and feel, so it is easy to do conversational AI research across multiple domains.

New models such as speaker identification and Megatron BERT provide variety. Together with this collection and the Docker container, we believe NeMo is on track to become a premier toolkit for building and training conversational AI models.


The NeMo developer guide is available at

Hardware Requirements

GPUs in the Pascal, Volta, Turing, and Ampere (e.g., A100) families

Driver Requirements

NeMo development is based on NVIDIA's PyTorch container version 20.08-py3


NeMo is licensed under the Apache License 2.0. By pulling and using the container and models, you accept the terms and conditions of this license.

Technical Support

Use the GitHub Issues forum for questions regarding this software.