

NVIDIA Jarvis is a framework for production-grade conversational AI inference. The Jarvis Collection on NGC includes all the resources required for getting started with Jarvis.




November 19, 2021

What Is Jarvis?

NVIDIA Jarvis is an application framework for multimodal conversational AI services that delivers real-time performance on GPUs. The Jarvis framework includes pre-trained conversational AI models, tools in the NVIDIA AI Toolkit, and optimized end-to-end services for speech and natural language understanding (NLU) tasks.

To get started with Jarvis in a few easy steps, go to Jarvis Quick Start.

What's Included In The Jarvis Collection?


Jarvis Speech Server: Jarvis Speech Skills is a Docker image containing a toolkit for production-grade conversational AI inference. The Jarvis Speech API server exposes a simple API for performing speech recognition, speech synthesis, and a variety of natural language processing inferences. Pull and run the container using the commands below.

Jarvis Speech Client: Jarvis Speech Clients is a Docker image containing sample command-line clients for the Jarvis services. Pull and run the container using the commands below. The clients expect that a Jarvis server is running with models deployed, and all command-line drivers accept an optional argument specifying the location of the server. No GPU is required to run the sample clients.


Jarvis Quick Start Scripts: NVIDIA Jarvis includes Quick Start scripts to help you get started with Jarvis AI Services. These scripts are meant for deploying the services locally for testing and running our example applications.

Helm Charts

Jarvis Speech Skills Helm chart: Can be used to deploy ASR, NLP, and TTS services automatically. Specifically, it is designed to automate the steps for push-button deployment to a Kubernetes cluster.

Getting Started With Jarvis


  1. To use Jarvis AI Services, you must be logged in to NVIDIA GPU Cloud (NGC).

  2. You must have access to a Volta, Turing, or NVIDIA Ampere architecture-based GPU.


  1. Download Jarvis quick start scripts via the command-line with the NGC CLI tool by running the following CLI command or selecting the File Browser tab to download the scripts.

    ngc registry resource download-version "nvidia/jarvis/jarvis_quickstart:1.2.1-beta"
  2. Initialize and start Jarvis. The initialization step downloads and prepares Docker images and models. The start script launches the server.

Note: This process may take quite some time depending on the speed of your Internet connection and the number of models deployed. Each model is individually optimized for the target GPU after download.


    cd jarvis_quickstart_v1.2.1-beta
    bash jarvis_init.sh
    bash jarvis_start.sh
  3. Start a container with sample clients for each service.

    bash jarvis_start_client.sh
  4. From inside the client container, try the different services using the provided Jupyter notebooks.

    jupyter notebook --ip=0.0.0.0 --allow-root --notebook-dir=/work/notebooks

Within the quickstart directory, you can modify the config.sh file with your preferred configuration. Options include which models to retrieve from NGC, where to store them, and which GPU to use if more than one is installed in your system (see Local (Docker) for more details).
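For orientation, the configuration file exposes settings along these lines. This is an illustrative excerpt only; the exact variable names can differ between Jarvis releases, so check the file shipped with your version:

```shell
# Hypothetical excerpt of the quickstart config.sh -- variable names
# may differ between releases.

# Enable or disable individual services before running the init script.
service_enabled_asr=true
service_enabled_nlp=true
service_enabled_tts=true

# Location where models downloaded from NGC are stored.
jarvis_model_loc="jarvis-model-repo"

# GPU to use when more than one is installed (Docker --gpus syntax).
gpus_to_use="device=0"
```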

Jarvis AI Services

The Jarvis AI Services are a set of gRPC endpoints that implement specific functional building blocks for conversational AI. Jarvis includes:

Jarvis ASR: Takes as input an audio stream or audio buffer and returns the English language text transcript, along with additional optional metadata. This works in both streaming and non-streaming (batch) modes.
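The streaming/batch distinction can be pictured with a small sketch. The stub recognizer below is purely hypothetical and stands in for the real gRPC calls; it only illustrates the two request shapes:

```python
# Illustrative sketch only: a stub "recognizer" standing in for Jarvis ASR.
# In batch mode the whole buffer is transcribed in one call; in streaming
# mode partial transcripts are produced as audio chunks arrive.

def transcribe_batch(audio_buffer: bytes) -> str:
    """Pretend to transcribe a complete audio buffer in one call."""
    return f"<transcript of {len(audio_buffer)} bytes>"

def transcribe_streaming(audio_chunks):
    """Yield a partial transcript after each chunk, as a streaming API would."""
    received = 0
    for chunk in audio_chunks:
        received += len(chunk)
        yield f"<partial transcript after {received} bytes>"

batch_result = transcribe_batch(b"\x00" * 3200)
stream_results = list(transcribe_streaming([b"\x00" * 1600, b"\x00" * 1600]))
print(batch_result)          # one final transcript
print(stream_results[-1])    # last partial result of the stream
```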

Jarvis NLP: Takes text as input and runs a number of analysis models. These include a named entity recognition model and a model that transforms unpunctuated text into punctuated text by adding periods, commas, question marks, and capitalization. Jarvis NLP also has an Analyze-Intent method that sequences calls to multiple models: a domain model selects which intent model to use, and the domain-specific intent model returns an intent label and performs slot filling, identifying regions of the query that may be required for fulfillment. In all cases, the models provided can be retrained with NVIDIA TLT to tailor performance and labels to your domain.
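The Analyze-Intent sequencing can be sketched as follows. The domain and intent "models" here are trivial stand-ins, not the real Jarvis models, and the function names are invented for illustration:

```python
# Hypothetical sketch of Analyze-Intent sequencing: a domain classifier
# first picks which intent model to run, then the domain-specific model
# returns an intent label plus slot fills.

def classify_domain(query: str) -> str:
    # Stub domain model: route weather-like queries to the weather domain.
    return "weather" if "weather" in query.lower() else "default"

def weather_intent_model(query: str):
    # Stub intent model: label the intent and fill a location slot.
    slots = {}
    if " in " in query:
        slots["location"] = query.rsplit(" in ", 1)[1].rstrip("?")
    return {"intent": "weather.query", "slots": slots}

def analyze_intent(query: str):
    domain = classify_domain(query)
    if domain == "weather":
        return weather_intent_model(query)
    return {"intent": "unknown", "slots": {}}

print(analyze_intent("What is the weather in Paris?"))
# {'intent': 'weather.query', 'slots': {'location': 'Paris'}}
```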

Jarvis Core NLP: This service is intended to support custom models trained on your own data via TLT/NeMo. The Jarvis Core NLP service supports three basic operations:

  • per-sentence classification (examples include sentiment analysis, intent recognition, abusive language detection)

  • per-word classification (examples include slot filling, named entity recognition, personally-identifying information detection)

  • text transformation (examples include machine translation, text punctuation, and spelling correction)

These three basic operations support a wide variety of practical use cases. The Jarvis NLP services are built as special cases of Jarvis Core NLP services using pre-trained models.
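The three operation shapes can be sketched with toy functions. The classifiers below are trivial stand-ins (not real models) meant only to show each operation's input/output contract:

```python
# Sketch of the three Core NLP operation shapes with stand-in "models".

def classify_sentence(text: str) -> str:
    """Per-sentence classification: one label for the whole input."""
    return "positive" if "great" in text.lower() else "neutral"

def classify_tokens(text: str) -> list:
    """Per-word classification: one label per token (NER-style)."""
    return [(tok, "ENTITY" if tok.istitle() else "O") for tok in text.split()]

def transform_text(text: str) -> str:
    """Text transformation: input text mapped to output text."""
    return text.capitalize().rstrip(".") + "."

print(classify_sentence("NeMo is great"))        # positive
print(classify_tokens("Jarvis runs on Triton"))  # per-token labels
print(transform_text("hello world"))             # Hello world.
```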

Jarvis TTS: Takes as input an English-language text string and a voice (one speaker or a set of speakers), and returns an audio waveform of the given voice speaking that string. Currently only a batch mode of operation is supported, where the system returns the entire audio file once synthesis is complete. In future versions, Jarvis will support a streaming response mode that sends the audio waveform incrementally as it is computed.
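Batch-mode synthesis means the caller blocks until the full waveform exists. A toy sketch, with an invented synthesize stub in place of the real service:

```python
# Toy batch-mode TTS sketch: the whole waveform is returned only once
# "synthesis" is complete -- nothing is sent incrementally.
import math

def synthesize(text: str, voice: str = "default", sample_rate: int = 22050):
    """Return a complete fake waveform whose length scales with the text."""
    n_samples = len(text) * 100  # stand-in for real synthesis duration
    return [math.sin(2 * math.pi * 440 * i / sample_rate) for i in range(n_samples)]

wave = synthesize("Hello from Jarvis")
print(len(wave))  # entire waveform available only after the call returns
```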

The Jarvis AI Services can be configured at launch time by adding new models via a simple settings file. A Helm chart is provided for easy deployment and scaling with Kubernetes, along with guidance for launching the services locally for testing. Helm is a deployment tool that uses one or more charts to describe a set of related resources.

In its standard configuration, Jarvis runs as two separate processes: the API Server which the client should directly connect to, and a Triton Inference Server instance that performs the inferencing requests.
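The two-process layout can be pictured schematically: the client talks only to the API server, which forwards inference work to Triton. Both "servers" in this sketch are plain functions standing in for the real processes, and the model names are illustrative:

```python
# Schematic of the standard two-process layout described above.

def triton_infer(model: str, inputs):
    """Stub Triton Inference Server: run the named model on the inputs."""
    return {"model": model, "output": f"result for {inputs!r}"}

def api_server_handle(request):
    """Stub Jarvis API server: translate an API call into a Triton request."""
    model = {"asr": "jasper", "tts": "tacotron2"}.get(request["service"])
    return triton_infer(model, request["payload"])

response = api_server_handle({"service": "asr", "payload": "audio-bytes"})
print(response["model"])  # the client never talks to Triton directly
```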

  • TLT Conversational AI - Transfer Learning Toolkit (TLT) is a Python-based AI toolkit for taking purpose-built, pre-trained conversational AI models and customizing them with your own data. The fine-tuned models can be easily exported to Jarvis for inference.

  • Speech Recognition

    • Supported models: Jasper ASR English trained on ASR, QuartzNet ASR English trained on ASR, and ASR English Language Model
    • Included resources: Speech Recognition Training Notebook and Speech Recognition Deployment Notebook
  • Intent Detection and Slot Tagging

    • Supported models: Intent Slot
    • Included resources: Intent Slot Training Notebook and Intent Slot Deployment Notebook
  • Named Entity Recognition

    • Supported models: Named Entity Recognition (NER)
    • Included resources: NER Training Notebook and NER Deployment Notebook
  • Punctuation and Capitalization

    • Supported models: Punctuation
    • Included resources: Punctuation And Capitalization Training Notebook and Punctuation And Capitalization Deployment Notebook
  • Question Answering

    • Supported models: Question Answering (BERT) and Question Answering (Megatron)
    • Included resources: Question Answering Training Notebook and Question Answering Deployment Notebook
  • Text Classification

    • Supported models: Text Classification (Domain Classification-Weather)
    • Included resources: Text Classification Training Notebook and Text Classification Deployment Notebook
  • Speech Synthesis

    • Supported models: Tacotron 2 and WaveGlow

By pulling and using Jarvis software, you accept the terms and conditions of this license.
