
Voice Demo for Jetson/L4T

Description: ASR + BERT QA interactive chatbot demo for Jetson
Publisher: NVIDIA
Latest Tag: r32.4.2
Modified: April 1, 2024
Compressed Size: 8.07 GB
Multinode Support: No
Multi-Arch Support: No

Voice Demo Container for Jetson

The jetson-voice container includes an interactive question/answering demo using Automatic Speech Recognition (ASR) and BERT QA, running locally on Jetson.

[Figure: ASR/BERT Chatbot]

The container has the following models:

  • QuartzNet: speech recognition, using Triton Inference Server for streaming
  • BERT-Base: sequence length 128, SQuAD 1.1 (default)
  • BERT-Large: sequence length 384, SQuAD 1.1 (optional)

All models run onboard Jetson with TensorRT. The container requires Jetson Xavier NX or Jetson AGX Xavier, along with JetPack 4.4 Developer Preview (L4T R32.4.2). You should also have a USB microphone attached if you wish to speak into the demo.

Running the Voice Demo

Prerequisites

Ensure these prerequisites are available on your system:

  1. Jetson device running L4T r32.4.2

  2. JetPack 4.4 Developer Preview (DP)
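To confirm the first prerequisite from a script, the L4T release can be read from /etc/nv_tegra_release on the Jetson host. The sketch below is a minimal Python check. It assumes the first line of that file has the form "# R32 (release), REVISION: 4.2, ...", and the helper names parse_l4t_release and check_l4t are hypothetical, not part of the container.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed format of the first line of /etc/nv_tegra_release, e.g.
# "# R32 (release), REVISION: 4.2, GCID: ..., BOARD: ..., EABI: aarch64, ..."
_RELEASE_RE = re.compile(r"#\s*R(\d+)\s*\(release\),\s*REVISION:\s*(\d+)\.(\d+)")

# L4T r32.4.2, the release this container tag targets
REQUIRED = (32, 4, 2)


def parse_l4t_release(text: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) parsed from nv_tegra_release text, or None."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def check_l4t(path: str = "/etc/nv_tegra_release") -> bool:
    """True only if the file exists and reports exactly the required L4T release."""
    release_file = Path(path)
    if not release_file.exists():
        return False
    return parse_l4t_release(release_file.read_text()) == REQUIRED


if __name__ == "__main__":
    print("L4T r32.4.2 detected" if check_l4t() else "L4T r32.4.2 not detected")
```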

Pulling the container

Pull the container image. It is about 8 GB compressed and needs more space once extracted, so check that you have enough free disk space first:

$ sudo docker pull nvcr.io/nvidia/jetson-voice:r32.4.2
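The free-space check can be scripted in a few lines of Python. This is a sketch under stated assumptions: Docker keeps images under /var/lib/docker (its default data root), and the 16 GB threshold is an arbitrary margin for the extracted layers, not a published requirement.

```python
import os
import shutil

# Assumption: 16 GB leaves room for the ~8 GB compressed image once extracted.
REQUIRED_FREE_BYTES = 16 * 1024 ** 3


def free_bytes(path: str) -> int:
    """Free bytes on the filesystem holding `path` (or its nearest existing parent)."""
    path = os.path.abspath(path)
    # Walk up so a not-yet-created directory still resolves to a real filesystem
    while not os.path.exists(path):
        parent = os.path.dirname(path)
        if parent == path:
            break
        path = parent
    return shutil.disk_usage(path).free


def has_enough_space(path: str = "/var/lib/docker",
                     required: int = REQUIRED_FREE_BYTES) -> bool:
    """True if at least `required` bytes are free where Docker stores images."""
    return free_bytes(path) >= required


if __name__ == "__main__":
    gib = free_bytes("/var/lib/docker") / 1024 ** 3
    print("free: %.1f GiB, enough: %s" % (gib, has_enough_space()))
```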

Running the container

The voice demo uses two container instances, one that runs the Triton Inference Server for ASR and the other that runs BERT and the client application.

Launching Triton Container

First, launch the Triton Inference Server with this command:

$ sudo docker run --runtime nvidia -it --rm --network host \
    nvcr.io/nvidia/jetson-voice:r32.4.2 \
    trtserver --model-control-mode=none --model-repository=models/repository/jasper-asr-streaming-vad/

The server will load the QuartzNet ASR models, and when it's ready you should see log messages in the terminal like this:

I0508 15:55:31.529195 1 grpc_server.cc:1973] Started GRPCService at 0.0.0.0:8001
I0508 15:55:31.529294 1 http_server.cc:1443] Starting HTTPService at 0.0.0.0:8000
I0508 15:55:31.575083 1 http_server.cc:1458] Starting Metrics Service at 0.0.0.0:8002

Leave the Triton server container instance running so that the client can connect to it. In this setup, both the server and the client run locally on the same Jetson. If you prefer to launch the Triton container instance in detached mode, add the -d flag to the sudo docker run command. In that case you won't see the log messages above in the terminal, although they remain available through sudo docker logs.
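When the server runs detached, a client script can wait for it by polling the ports shown in the log output above (gRPC 8001, HTTP 8000, metrics 8002). The sketch below is a minimal Python version with hypothetical helper names. It only checks that something accepts TCP connections on the port; it assumes, based on the log order above, that the services start after the models are loaded, and it does not query any Triton health API.

```python
import socket
import time

# Ports taken from the Triton log messages above.
TRITON_PORTS = {"http": 8000, "grpc": 8001, "metrics": 8002}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_triton(host: str = "127.0.0.1", port: int = TRITON_PORTS["grpc"],
                    deadline_s: float = 120.0, poll_s: float = 2.0) -> bool:
    """Poll until the port accepts connections or the deadline expires."""
    end = time.monotonic() + deadline_s
    while time.monotonic() < end:
        if port_open(host, port):
            return True
        time.sleep(poll_s)
    return False


if __name__ == "__main__":
    print("Triton gRPC reachable:", wait_for_triton(deadline_s=5.0))
```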

Launching Client Container

When the Triton server is running, open another terminal window and run the following commands to launch a second container instance. This one mounts additional host resources for display (the X11 socket) and audio input (the USB and sound devices):

$ sudo xhost +si:localuser:root
$ sudo docker run --runtime nvidia -it --rm --network host -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    --device /dev/bus/usb --device /dev/snd \
    nvcr.io/nvidia/jetson-voice:r32.4.2

This opens an interactive terminal within the container, from which you can launch the following Python applications:

  • ASR + BERT chatbot (src/chatbot.py)
  • ASR test program (src/test_asr.py)
  • BERT test program (src/test_bert.py)

Running Chatbot Demo

Before running the demo, you should verify the device ID of your USB microphone (which should be plugged in before you launch the container above). To list the audio input devices on your system, run the following command from within the container:

$ ./scripts/list_microphones.sh
...
Input Device ID 4 - jetson-xaviernx-ape: - (hw:1,0) (inputs=16) (sample_rate=44100)
Input Device ID 5 - jetson-xaviernx-ape: - (hw:1,1) (inputs=16) (sample_rate=44100)
Input Device ID 6 - jetson-xaviernx-ape: - (hw:1,2) (inputs=16) (sample_rate=44100)
Input Device ID 7 - jetson-xaviernx-ape: - (hw:1,3) (inputs=16) (sample_rate=44100)
Input Device ID 8 - jetson-xaviernx-ape: - (hw:1,4) (inputs=16) (sample_rate=44100)
Input Device ID 9 - jetson-xaviernx-ape: - (hw:1,5) (inputs=16) (sample_rate=44100)
Input Device ID 10 - jetson-xaviernx-ape: - (hw:1,6) (inputs=16) (sample_rate=44100)
Input Device ID 11 - jetson-xaviernx-ape: - (hw:1,7) (inputs=16) (sample_rate=44100)
Input Device ID 12 - jetson-xaviernx-ape: - (hw:1,8) (inputs=16) (sample_rate=44100)
Input Device ID 13 - jetson-xaviernx-ape: - (hw:1,9) (inputs=16) (sample_rate=44100)
Input Device ID 14 - jetson-xaviernx-ape: - (hw:1,10) (inputs=16) (sample_rate=44100)
Input Device ID 15 - jetson-xaviernx-ape: - (hw:1,11) (inputs=16) (sample_rate=44100)
Input Device ID 16 - jetson-xaviernx-ape: - (hw:1,12) (inputs=16) (sample_rate=44100)
Input Device ID 17 - jetson-xaviernx-ape: - (hw:1,13) (inputs=16) (sample_rate=44100)
Input Device ID 18 - jetson-xaviernx-ape: - (hw:1,14) (inputs=16) (sample_rate=44100)
Input Device ID 19 - jetson-xaviernx-ape: - (hw:1,15) (inputs=16) (sample_rate=44100)
Input Device ID 20 - jetson-xaviernx-ape: - (hw:1,16) (inputs=16) (sample_rate=44100)
Input Device ID 21 - jetson-xaviernx-ape: - (hw:1,17) (inputs=16) (sample_rate=44100)
Input Device ID 22 - jetson-xaviernx-ape: - (hw:1,18) (inputs=16) (sample_rate=44100)
Input Device ID 23 - jetson-xaviernx-ape: - (hw:1,19) (inputs=16) (sample_rate=44100)
Input Device ID 24 - Logitech H570e Mono: USB Audio (hw:2,0) (inputs=2) (sample_rate=44100)

In this case, the microphone's device ID is 24. Remember that ID number for when you launch the client below:

$ python3 src/chatbot.py --mic 24 --push-to-talk Space
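Picking the device ID by eye is fine for a one-off run; for scripted launches, the list_microphones.sh output can be parsed instead. This Python sketch assumes every device line follows the format in the sample output above, and its "USB Audio" name match is a heuristic that happens to fit the Logitech headset shown, not a rule from the demo.

```python
import re
from typing import List, NamedTuple, Optional

# Matches lines such as:
# "Input Device ID 24 - Logitech H570e Mono: USB Audio (hw:2,0) (inputs=2) (sample_rate=44100)"
_LINE_RE = re.compile(
    r"Input Device ID (\d+) - (.*?) \((hw:\d+,\d+)\) "
    r"\(inputs=(\d+)\) \(sample_rate=(\d+)\)")


class InputDevice(NamedTuple):
    device_id: int
    name: str
    hw: str
    inputs: int
    sample_rate: int


def parse_microphones(output: str) -> List[InputDevice]:
    """Parse every recognizable device line; other lines are ignored."""
    devices = []
    for line in output.splitlines():
        m = _LINE_RE.search(line)
        if m:
            devices.append(InputDevice(int(m.group(1)), m.group(2), m.group(3),
                                       int(m.group(4)), int(m.group(5))))
    return devices


def find_usb_microphone(output: str) -> Optional[int]:
    """Heuristic: return the ID of the first device whose name mentions USB Audio."""
    for dev in parse_microphones(output):
        if "USB Audio" in dev.name:
            return dev.device_id
    return None
```

Inside the container, the output could be captured with subprocess.run(["./scripts/list_microphones.sh"], stdout=subprocess.PIPE, universal_newlines=True).stdout and the resulting ID passed to --mic.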

Launch Arguments

  • --mic sets the microphone device ID. Run ./scripts/list_microphones.sh to view the active audio input device IDs on your system.
  • --push-to-talk sets the key that triggers the microphone to be live (hold down this key to speak). If not specified, by default the mic will always be live.
  • --para changes the BERT paragraph topic text file(s). It can be set to a directory (all the .txt files under that directory will be loaded) or to an individual text file. By default, the included test topics from test/passages/ are used. You can also create or edit topic text (see Making Your Own Paragraph Topics below), enabling BERT to answer questions about topics that you provide.
  • --nlp-model sets the BERT QA model to use (BERT Base vs BERT Large). Accepted values are base and large (the default is base). If you are running the demo standalone (i.e. outside of the multi-container cloud native demo), BERT-Large will provide better question/answering accuracy.
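The --para rule above (a directory contributes its .txt files, a file is used as-is) can be sketched in Python to preview which topics will load. This is a hypothetical helper, not chatbot.py's code; it assumes only the top level of a directory is scanned, which the description leaves open.

```python
from pathlib import Path
from typing import List


def resolve_para(path: str) -> List[Path]:
    """Expand a --para value into the topic files it refers to.

    Assumption: for a directory, only its top-level *.txt files are used.
    """
    p = Path(path)
    if p.is_dir():
        # Sorted so the topic order is predictable
        return sorted(f for f in p.glob("*.txt") if f.is_file())
    if p.is_file():
        return [p]
    raise FileNotFoundError("--para path not found: " + path)


if __name__ == "__main__":
    for topic in resolve_para("test/passages"):
        print(topic)
```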

Client Controls

When the chatbot GUI has loaded, you can use the following keyboard controls to interact with the demo:

  • Holding down the key bound to --push-to-talk (the spacebar in the run command above) enables the hot mic and starts processing voice input from the user. BERT answers at the end of a sentence. When the push-to-talk key is released, the mic is muted.
  • The left/right arrow keys cycle through the topics.
  • The number keys activate the associated topic.
  • The escape key exits the demo.
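The topic keys can be modeled as a small state machine. The Python class below is a hypothetical model for illustration only, not the demo's GUI code. Two behaviors are assumptions, since the description does not specify them: the arrow keys wrap around at either end, and number key N selects the N-th topic (1-based), with keys that have no matching topic ignored.

```python
from typing import List


class TopicSelector:
    """Hypothetical model of the left/right and number-key topic controls."""

    def __init__(self, topics: List[str]) -> None:
        if not topics:
            raise ValueError("at least one topic is required")
        self.topics = topics
        self.current = 0

    def right(self) -> str:
        # Assumed wrap-around from the last topic to the first
        self.current = (self.current + 1) % len(self.topics)
        return self.topics[self.current]

    def left(self) -> str:
        # Assumed wrap-around from the first topic to the last
        self.current = (self.current - 1) % len(self.topics)
        return self.topics[self.current]

    def number_key(self, n: int) -> str:
        # Assumed 1-based mapping; out-of-range keys leave the selection unchanged
        if 1 <= n <= len(self.topics):
            self.current = n - 1
        return self.topics[self.current]
```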

Making Your Own Paragraph Topics

There are also UI controls for interactively editing the topic paragraphs, if you wish to create or modify the subject matter that BERT can answer questions about.

  • The Edit button will edit the current topic.
  • The New button will create a new blank topic that you can fill out.
  • The Load button will open a file selection dialog from which you can load a text file.
  • The Remove button will remove the current topic (but not delete it from disk)
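Topics can also be prepared outside the GUI. Assuming a topic is simply a plain .txt file containing the paragraph (the file type --para loads), a hypothetical helper such as write_topic below can write one into a directory that you later pass to --para.

```python
from pathlib import Path


def write_topic(directory: str, name: str, paragraph: str) -> Path:
    """Write `paragraph` to <directory>/<name>.txt and return the file path.

    Assumption: each .txt file is one topic whose whole contents are the
    paragraph that BERT answers questions about.
    """
    if not name or "/" in name or name in (".", ".."):
        raise ValueError("invalid topic name: %r" % name)
    folder = Path(directory)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / (name + ".txt")
    # Normalize surrounding whitespace and end with a single newline
    path.write_text(paragraph.strip() + "\n", encoding="utf-8")
    return path
```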

Mounting Directories from the Host Device

The topic paragraph text files are accessed from within the container, so if you edit/save them inside the container, your changes will be lost after you shut down the container instance. Instead, if you wish to save your changes outside of the container, mount a directory on your host with Docker's -v flag argument when starting the container instance:

$ sudo xhost +si:localuser:root
$ sudo docker run --runtime nvidia -it --rm --network host -e DISPLAY=$DISPLAY \
    -v /tmp/.X11-unix/:/tmp/.X11-unix \
    --device /dev/bus/usb --device /dev/snd \
    -v /home/user/my/path:/location/in/container \
    nvcr.io/nvidia/jetson-voice:r32.4.2

You will then be able to navigate to that mounted directory in the file selection dialogs, or pass it to the --para argument when running chatbot.py.
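If you relaunch the client often with different host mounts, the docker run command above can be assembled programmatically. This Python sketch reproduces the flags shown in the command above and nothing more; the helper name is hypothetical, the mount paths are the same placeholders, and the sudo xhost +si:localuser:root step still has to be run separately.

```python
import os
import shlex
from typing import List, Optional, Tuple

IMAGE = "nvcr.io/nvidia/jetson-voice:r32.4.2"


def client_run_args(display: str,
                    mounts: Optional[List[Tuple[str, str]]] = None) -> List[str]:
    """Build the client `docker run` argument list; `mounts` are (host, container) pairs."""
    args = [
        "sudo", "docker", "run", "--runtime", "nvidia", "-it", "--rm",
        "--network", "host", "-e", "DISPLAY=" + display,
        "-v", "/tmp/.X11-unix/:/tmp/.X11-unix",
        "--device", "/dev/bus/usb", "--device", "/dev/snd",
    ]
    for host_path, container_path in mounts or []:
        args += ["-v", host_path + ":" + container_path]
    args.append(IMAGE)
    return args


if __name__ == "__main__":
    cmd = client_run_args(os.environ.get("DISPLAY", ":0"),
                          [("/home/user/my/path", "/location/in/container")])
    # Print a shell-quoted copy; subprocess.run(cmd) would execute it from a terminal (-it needs a TTY)
    print(" ".join(shlex.quote(arg) for arg in cmd))
```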

Test Examples

The container also includes scripts that run ASR and BERT individually. You can run them to test each component independently, or use them as examples for your own applications. The source of all the Python scripts is available inside the container.

ASR

Streaming transcript of live microphone input or pre-recorded wav file:

$ python3 src/test_asr.py --mic 24

  • --mic sets the microphone device ID. Run ./scripts/list_microphones.sh to view the active audio input device IDs on your system.
  • --wav transcribes an audio WAV file instead of the microphone input (e.g. --wav test/dusty.wav). The WAV file should have a 16 kHz sample rate and a single (mono) channel.
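Because --wav expects 16 kHz mono audio, it can help to check a file before passing it in. This sketch uses only Python's standard wave module; the helper name is hypothetical, and it checks just the two properties stated above (sample rate and channel count), not sample width.

```python
import wave
from typing import List


def check_wav(path: str, expected_rate: int = 16000,
              expected_channels: int = 1) -> List[str]:
    """Return a list of problems with the WAV file; an empty list means it matches."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
    if rate != expected_rate:
        problems.append("sample rate is %d Hz, expected %d Hz" % (rate, expected_rate))
    if channels != expected_channels:
        problems.append("%d channels, expected %d" % (channels, expected_channels))
    return problems


if __name__ == "__main__":
    print(check_wav("test/dusty.wav") or "OK for --wav")
```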

BERT

Command-line question answering:

$ python3 src/test_bert.py --para test/gtc.txt

  • --para changes the BERT paragraph topic text file. It should be set to an individual text file. By default, the test paragraph from test/gtc.txt is used. More text files are located at test/passages/*.txt.
  • --nlp-model sets the BERT QA model to use (BERT Base vs BERT Large). Accepted values are base and large (the default is base). If you are running the demo standalone (i.e. outside of the multi-container cloud native demo), BERT-Large will provide better question/answering accuracy.

Running the container as part of cloud native demo on Jetson

The cloud native demo on Jetson showcases how Jetson brings cloud native methodologies such as containerization to the edge. The demo is built around the example use case of AI applications for service robots, and showcases people detection, pose detection, gaze detection, and natural language processing all running simultaneously as containers on Jetson.

Please follow the instructions in the jetson-cloudnative-demo repository on GitHub to run the voice container as part of the cloud native demo.

License

The jetson-voice container includes various software packages with their respective licenses included within the container.

Getting Help & Support

If you have any questions or need help, please visit the Jetson Developer Forums.