The jetson-voice container includes an interactive question/answering demo using Automatic Speech Recognition (ASR) and BERT QA, running locally on Jetson.
The container includes the following models: QuartzNet for streaming ASR, and BERT Base and BERT Large for question answering.
All models run onboard Jetson with TensorRT. The container requires Jetson Xavier NX or Jetson AGX Xavier, along with JetPack 4.4 Developer Preview (L4T R32.4.2). You should also have a USB microphone attached if you wish to speak into the demo.
Ensure these prerequisites are available on your system:
Jetson device running L4T r32.4.2
JetPack 4.4 Developer Preview (DP)
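To confirm the L4T release on your Jetson, you can check /etc/nv_tegra_release (present on standard JetPack installs); it should report R32 with revision 4.2:
$ cat /etc/nv_tegra_release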
First, pull the container image. It's approximately 8GB, so make sure you have enough free disk space:
$ sudo docker pull nvcr.io/nvidia/jetson-voice:r32.4.2
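Once the pull completes, you can optionally confirm the image is present by filtering docker images by repository name:
$ sudo docker images nvcr.io/nvidia/jetson-voice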
The voice demo uses two container instances, one that runs the Triton Inference Server for ASR and the other that runs BERT and the client application.
First, launch the Triton Inference Server with this command:
$ sudo docker run --runtime nvidia -it --rm --network host \
nvcr.io/nvidia/jetson-voice:r32.4.2 \
trtserver --model-control-mode=none --model-repository=models/repository/jasper-asr-streaming-vad/
The server will load the QuartzNet ASR models, and when it's ready you should see log messages in the terminal like this:
I0508 15:55:31.529195 1 grpc_server.cc:1973] Started GRPCService at 0.0.0.0:8001
I0508 15:55:31.529294 1 http_server.cc:1443] Starting HTTPService at 0.0.0.0:8000
I0508 15:55:31.575083 1 http_server.cc:1458] Starting Metrics Service at 0.0.0.0:8002
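To verify the server is ready from another terminal, you can optionally query the HTTP health endpoint exposed by Triton's v1 API (served by trtserver on port 8000 above); a 200 response indicates the models are loaded:
$ curl -i http://localhost:8000/api/health/ready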
Leave the Triton server container instance running so that the client is able to connect. Note that in this setup, both the server and client run locally on the same machine. If you prefer to launch the Triton container instance in detached mode, use the -d flag with the sudo docker run command (see the example below); you won't see the logging messages above in that case.
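For example, the same server launch in detached mode would look like this (the -d flag replaces the interactive -it flags):
$ sudo docker run --runtime nvidia -d --rm --network host \
      nvcr.io/nvidia/jetson-voice:r32.4.2 \
      trtserver --model-control-mode=none --model-repository=models/repository/jasper-asr-streaming-vad/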
When the Triton server is running, open up another terminal window and run the following command to launch another container instance - this one has additional host devices mounted for display and audio input:
$ sudo xhost +si:localuser:root
$ sudo docker run --runtime nvidia -it --rm --network host -e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix/:/tmp/.X11-unix \
--device /dev/bus/usb --device /dev/snd \
nvcr.io/nvidia/jetson-voice:r32.4.2
This will bring you to an interactive terminal within the container, from which you can launch the following Python applications:
src/chatbot.py
src/test_asr.py
src/test_bert.py
Before running the demo, verify the device ID of your USB microphone (which should be plugged in before you launch the container above). To list the audio input devices on your system, run the following command from within the container:
$ ./scripts/list_microphones.sh
...
Input Device ID 4 - jetson-xaviernx-ape: - (hw:1,0) (inputs=16) (sample_rate=44100)
Input Device ID 5 - jetson-xaviernx-ape: - (hw:1,1) (inputs=16) (sample_rate=44100)
Input Device ID 6 - jetson-xaviernx-ape: - (hw:1,2) (inputs=16) (sample_rate=44100)
Input Device ID 7 - jetson-xaviernx-ape: - (hw:1,3) (inputs=16) (sample_rate=44100)
Input Device ID 8 - jetson-xaviernx-ape: - (hw:1,4) (inputs=16) (sample_rate=44100)
Input Device ID 9 - jetson-xaviernx-ape: - (hw:1,5) (inputs=16) (sample_rate=44100)
Input Device ID 10 - jetson-xaviernx-ape: - (hw:1,6) (inputs=16) (sample_rate=44100)
Input Device ID 11 - jetson-xaviernx-ape: - (hw:1,7) (inputs=16) (sample_rate=44100)
Input Device ID 12 - jetson-xaviernx-ape: - (hw:1,8) (inputs=16) (sample_rate=44100)
Input Device ID 13 - jetson-xaviernx-ape: - (hw:1,9) (inputs=16) (sample_rate=44100)
Input Device ID 14 - jetson-xaviernx-ape: - (hw:1,10) (inputs=16) (sample_rate=44100)
Input Device ID 15 - jetson-xaviernx-ape: - (hw:1,11) (inputs=16) (sample_rate=44100)
Input Device ID 16 - jetson-xaviernx-ape: - (hw:1,12) (inputs=16) (sample_rate=44100)
Input Device ID 17 - jetson-xaviernx-ape: - (hw:1,13) (inputs=16) (sample_rate=44100)
Input Device ID 18 - jetson-xaviernx-ape: - (hw:1,14) (inputs=16) (sample_rate=44100)
Input Device ID 19 - jetson-xaviernx-ape: - (hw:1,15) (inputs=16) (sample_rate=44100)
Input Device ID 20 - jetson-xaviernx-ape: - (hw:1,16) (inputs=16) (sample_rate=44100)
Input Device ID 21 - jetson-xaviernx-ape: - (hw:1,17) (inputs=16) (sample_rate=44100)
Input Device ID 22 - jetson-xaviernx-ape: - (hw:1,18) (inputs=16) (sample_rate=44100)
Input Device ID 23 - jetson-xaviernx-ape: - (hw:1,19) (inputs=16) (sample_rate=44100)
Input Device ID 24 - Logitech H570e Mono: USB Audio (hw:2,0) (inputs=2) (sample_rate=44100)
In this case, the microphone's device ID is 24. Remember that ID number for when you launch the client below:
$ python3 src/chatbot.py --mic 24 --push-to-talk Space
--mic sets the microphone device ID. Run ./scripts/list_microphones.sh to view the active audio input device IDs on your system.
--push-to-talk sets the key that triggers the microphone to be live (hold down this key to speak). If not specified, the mic will always be live by default.
--para changes the BERT paragraph topic text file(s). It can be set to a directory (in which case all the .txt files under that directory will be loaded) or to an individual text file. By default, the included test topics from test/passages/ are used. If desired, you can create or edit the text (see below for more info about that), enabling BERT to answer questions about different topics that you provide.
--nlp-model sets the BERT QA model to use (BERT Base vs. BERT Large). Accepted values are base and large (the default is base). If you are running the demo standalone (i.e. outside of the multi-container cloud native demo), BERT-Large will provide better question/answering accuracy.
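For example, to launch the chatbot with push-to-talk on the spacebar, all of the included topics under test/passages/, and the BERT-Large model (using only the flags described above):
$ python3 src/chatbot.py --mic 24 --push-to-talk Space --para test/passages/ --nlp-model large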
When the chatbot GUI has loaded, you can use the following keyboard controls to interact with the demo:
Holding the --push-to-talk key (i.e. the spacebar, as used in the run command above) will enable the hot mic and start processing voice input from the user. BERT will answer at the end of a sentence. When the push-to-talk key is released, the mic will be muted.
There are also UI controls for interactively editing the topic paragraphs, if you wish to create or modify the subject matter that BERT can answer questions about:
The Edit button edits the current topic.
The New button creates a new blank topic that you can fill out.
The Load button opens a file selection dialog from which you can load a text file.
The Remove button removes the current topic (but does not delete it from disk).
The topic paragraph text files are accessed from within the container, so if you edit and save them inside the container, your changes will be lost after you shut down the container instance. If you wish to persist your changes outside of the container instead, mount a directory from your host with Docker's -v flag when starting the container instance:
$ sudo xhost +si:localuser:root
$ sudo docker run --runtime nvidia -it --rm --network host -e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix/:/tmp/.X11-unix \
--device /dev/bus/usb --device /dev/snd \
-v /home/user/my/path:/location/in/container \
nvcr.io/nvidia/jetson-voice:r32.4.2
You will then be able to navigate to that mounted directory in the file selection dialogs, or pass it to the --para argument when running chatbot.py.
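For example, using the illustrative mount path from the command above:
$ python3 src/chatbot.py --mic 24 --para /location/in/container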
There are a couple of scripts included in the container that run ASR and BERT individually, which you can use to test them independently or as examples for your own applications. You can view the source of any of the Python scripts inside the container.
Streaming transcription of live microphone input or a pre-recorded WAV file:
$ python3 src/test_asr.py --mic 24
--mic sets the microphone device ID. Run ./scripts/list_microphones.sh to view the active audio input device IDs on your system.
--wav transcribes an input audio WAV file instead of the microphone (e.g. --wav test/dusty.wav). The WAV file should have a 16kHz sample rate and a mono channel.
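For example, to transcribe the sample WAV file included in the container:
$ python3 src/test_asr.py --wav test/dusty.wav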
Command-line question answering:
$ python3 src/test_bert.py --para test/gtc.txt
--para changes the BERT paragraph topic text file. It should be set to an individual text file. By default, the test paragraph from test/gtc.txt is used. There are also more text files located at test/passages/*.txt.
--nlp-model sets the BERT QA model to use (BERT Base vs. BERT Large). Accepted values are base and large (the default is base). If you are running the demo standalone (i.e. outside of the multi-container cloud native demo), BERT-Large will provide better question/answering accuracy.
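For example, to answer questions about the default GTC passage with the BERT-Large model:
$ python3 src/test_bert.py --para test/gtc.txt --nlp-model large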
The cloud native demo on Jetson showcases how Jetson brings cloud native methodologies like containerization to the edge. The demo is built around the example use case of AI applications for service robots, and showcases people detection, pose detection, gaze detection, and natural language processing all running simultaneously as containers on Jetson.
Please follow the instructions in the jetson-cloudnative-demo repo on GitHub to run the voice container as part of the cloud native demo.
The jetson-voice container includes various software packages with their respective licenses included within the container.
If you have any questions or need help, please visit the Jetson Developer Forums.