Description

A guide to quickly get started with Audio Effects Microservice

Publisher

NVIDIA

Latest Version

1.3.0

Modified

April 24, 2024

Compressed Size

33.54 MB

Quick Start Guide

In a typical Audio Effects Microservice deployment, a service provider configures and launches the Audio Effects Microservice on a GPU-based server. A client application (remote or local) connects to the microservice, negotiates the desired audio effects and connection parameters via gRPC, and starts streaming audio to the microservice (via RTP/UDP, gRPC/TCP or another protocol) and receives the processed (enhanced) audio back.

The Audio Effects Microservice includes Docker containers which can be used to demonstrate the typical deployment scenario described above. Specifically, this quick start guide provides step-by-step instructions for the following:

Configuring the Audio Effects Microservice
Launching the Audio Effects Microservice
Testing the microservice using the Audio Effects Sample client, which connects with the launched microservice, streams audio to the microservice and receives the processed audio back.

Prerequisites

Ensure the following prerequisites are available on your system for running the container:

Docker >= 19.03
nvidia-docker: the latest nvidia-container-toolkit, installed as described in its installation steps.
NVIDIA driver: 535+ (for example, 535.161.07). The driver requirement follows from the CUDA version requirement (>= 11.8).
A machine with a GPU of one of the following architectures:
- sm_90 (e.g. H100)
- sm_89 (e.g. L4, L40)
- sm_80 (e.g. A100)
- sm_86 (e.g. A10, A40)
- sm_75 (e.g. T4)
- sm_70 (e.g. V100)
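
The version minimums above can be checked from a shell before anything is launched. The following is a minimal sketch, not part of the quick start package: `version_ge` and `check_prereqs` are hypothetical helper names, and the comparison relies on `sort -V` from GNU coreutils. It does not check the GPU architecture.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; not shipped with the quick start scripts.

# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compares the installed Docker and NVIDIA driver versions against the
# minimums listed above (Docker 19.03, driver 535).
check_prereqs() {
  local docker_ver driver_ver
  docker_ver=$(docker version --format '{{.Server.Version}}') || return 1
  driver_ver=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader) || return 1
  driver_ver=${driver_ver%%$'\n'*}   # keep the first GPU's line only

  version_ge "$docker_ver" "19.03" ||
    { echo "Docker $docker_ver is older than 19.03" >&2; return 1; }
  version_ge "$driver_ver" "535" ||
    { echo "Driver $driver_ver is older than 535" >&2; return 1; }
  echo "Docker $docker_ver and driver $driver_ver meet the minimums"
}
```

Calling `check_prereqs` on the GPU server prints a one-line summary or reports the first requirement that is not met.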

Getting Started

Download the Maxine Audio Effects Quick Start scripts as follows:

Select CLI Command to copy the download command.
Ensure that the NGC CLI tool is installed.
After the tool is installed, paste the copied command into a Command Prompt window to start the download.
Log in to the NGC Docker registry.

  docker login nvcr.io

The command prompts for a username and password. Use $oauthtoken as the username and your NGC_API_KEY as the password.
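
For scripted setups, the same login can be done without the interactive prompt. This is a sketch under the assumption that the API key has been exported as the environment variable `NGC_API_KEY`; `ngc_login` is a hypothetical wrapper name. The `--username` and `--password-stdin` options are standard `docker login` options.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper; assumes the API key is exported as NGC_API_KEY.
ngc_login() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  # The username is the literal string $oauthtoken, so it is single-quoted
  # to stop the shell from expanding it as a variable.
  printf '%s' "$NGC_API_KEY" |
    docker login nvcr.io --username '$oauthtoken' --password-stdin
}
```

Reading the key from standard input keeps it out of the shell history and the process list.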

Configuring Maxine Audio Effects Microservice

The file config.sh contains the parameters used for configuring various aspects of the Maxine Audio Effects Microservice. Before launching the microservice, ensure that these parameters are set properly. An example file config.sh is included in the package.

(Required) audio_effects_api_port: The port used for the gRPC endpoint.
(Optional) input_sample_rate: The input audio sample rate supported by the service. The default value is 16000. When use_studio_voice_quality is enabled, the supported values are 16000 and 48000. When use_studio_voice_perf is enabled, the only supported value is 48000.
(Optional) output_sample_rate: The output audio sample rate generated by the service. The default value is 16000. The output sample rate typically matches the input sample rate (input_sample_rate), except when the audio super-resolution effect is enabled for 8 kHz to 16 kHz or 16 kHz to 48 kHz upscaling.
(Optional) audio_chunk_duration: The audio chunk duration, in milliseconds, used to load the audio effect model. The default value is 10.
(Optional) max_streams: The maximum number of streams supported by the service. It defines the batch size of the audio effect model. When use_studio_voice_quality or use_studio_voice_perf is enabled, this parameter should be equal to num_pipelines. To check the maximum supported value, refer to the per-effect and per-GPU "Maximum Batch Size" values in the Audio Effects SDK documentation. The default value is 10.
(Optional) Effect flags: To enable or disable an audio effect in the service, set the corresponding parameter to true or false, respectively.

These flags are used to load the corresponding audio effect model. At least one audio effect must be enabled. The default value for all flags is false. If none of the flags are set to true, the service launch will fail.
- use_denoiser=true
- use_dereverb=true
- use_superres=true
- use_studio_voice_perf=true
- use_studio_voice_quality=true
The microservice also supports chaining of certain effects. Set the effects and sample rates, and the microservice internally chooses the appropriate chain. The following chains are currently supported:
- Superres effect (8kHz to 16kHz) + Background Noise Removal effect (16kHz)
- Superres effect (8kHz to 16kHz) + Room Echo Removal effect (16kHz)
- Superres effect (8kHz to 16kHz) + Combined Background Noise Removal/Room Echo Removal effect (16kHz)
- Background Noise Removal effect (16kHz) + Superres effect (16kHz to 48kHz)
- Room Echo Removal effect (16kHz) + Superres effect (16kHz to 48kHz)
- Combined Background Noise Removal/Room Echo Removal effect (16kHz) + Superres effect (16kHz to 48kHz)
(Optional) intensity_ratio: The intensity of the effects to be applied, given as an intensity percentage in float form. When multiple supported effects are enabled, all of them are applied with the same intensity. This value applies to all client connections for a deployment. It is applicable to the Denoiser and Dereverb effects and is ignored if set for other effects.
(Optional) udp_port_range: The host port number or range to be mapped for UDP streaming, for example "9001" or "9001-9005". It is required if the service is to be used with UDP streaming input.
(Optional) grpc_worker_threads: The number of worker threads used by the gRPC async server. For max_streams=200, two or fewer threads are sufficient. Increasing the number of threads as max_streams increases may improve performance. The default value is 2.
(Optional) max_udp_ports_per_stream: The maximum number of UDP ports to allocate per stream (session). It is set to 2 specifically for the case where the client sends RTCP packets (which the service ignores) on the odd port (for example, 9001) next to the port used for RTP data packets (for example, 9000). The default value is 2.
(Optional) num_pipelines: The number of GStreamer pipelines to launch. The streams counted by max_streams are divided equally among the pipelines. It is recommended to use 250 streams per pipeline. Note that GPU memory usage increases as num_pipelines increases. The default value is 1.
my_pod_ip: The pod or host IP address where the microservice is deployed. It is required for UDP data streaming support. The microservice returns this value to the client as the UDP host IP in its response.
enable_traces: Enables or disables OpenTelemetry (https://opentelemetry.io/) traces.
enable_metrics: Enables or disables OpenTelemetry metrics.
use_ostream_exporter: If enabled, OpenTelemetry traces and metrics are printed to standard output. If disabled, OpenTelemetry uses the OTLP exporter (https://opentelemetry.io/docs/reference/specification/protocol/exporter/#configuration-options) and the data is sent to Lightstep.
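
As a worked example of the sizing parameters in this list, the arithmetic behind num_pipelines and the UDP port budget can be sketched in shell. Both helpers are hypothetical. `pipelines_for` reads the 250-streams recommendation as a per-pipeline budget, and `udp_ports_for` ASSUMES that every stream may use its full max_udp_ports_per_stream quota, which the text above does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical sizing helpers based on the parameter descriptions above.

# Number of pipelines needed to keep each pipeline at or below a stream
# budget (default 250, the recommended value). Integer ceiling division.
pipelines_for() {
  local streams=$1 per_pipeline=${2:-250}
  echo $(( (streams + per_pipeline - 1) / per_pipeline ))
}

# UDP ports to reserve, ASSUMING every stream uses its full per-stream
# quota (default 2: one RTP port plus the adjacent RTCP port).
udp_ports_for() {
  local streams=$1 ports_per_stream=${2:-2}
  echo $(( streams * ports_per_stream ))
}

pipelines_for 1000   # prints 4
udp_ports_for 1000   # prints 2000
```

Under these assumptions, a deployment with max_streams=1000 would use num_pipelines=4 and a udp_port_range spanning 2000 ports.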

The following Lightstep parameters must be supplied if enable_traces=true and use_ostream_exporter=false:

lightstep_token_filepath: The Lightstep backend requires an access token. This parameter provides the local file path of the token.
lightstep_cert_filepath: Accessing the Lightstep gRPC endpoint requires an SSL certificate. This parameter provides the file path of the certificate.
lightstep_endpoint: The URL of the Lightstep backend.
(Optional) Logging: The service prints logs to stdout or stderr. The docker logs command can be used to check the service logs.
- (Optional) GST_DEBUG controls GStreamer logs. The service uses a GStreamer pipeline for audio processing. For more information, refer to GStreamer - Printing Debug Information. The default value is 0.
- (Optional) GLOG_logtostderr enables glog logging. For more details, refer to the glog documentation. The default value is 0.
- (Optional) GLOG_v controls the log verbosity level. A higher level produces more detailed logs. The default value is 0.
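
Several of the constraints described in this section can be checked before launch. The sketch below assumes that config.sh consists of plain shell variable assignments (as the use_denoiser=true style above suggests) and that it has been sourced into the current shell. `validate_config` is a hypothetical helper, not part of the package, and it covers only the rules on effect flags, input sample rate, and stream counts.

```shell
#!/usr/bin/env bash
# Hypothetical pre-launch check. Assumes config.sh has been sourced so its
# parameters exist as shell variables; defaults follow the text above.
validate_config() {
  local in_rate=${input_sample_rate:-16000}
  local streams=${max_streams:-10}
  local pipelines=${num_pipelines:-1}
  local quality=${use_studio_voice_quality:-false}
  local perf=${use_studio_voice_perf:-false}
  local flag enabled=0

  # At least one effect flag must be true, otherwise the launch fails.
  for flag in use_denoiser use_dereverb use_superres \
              use_studio_voice_perf use_studio_voice_quality; do
    if [ "${!flag:-false}" = "true" ]; then
      enabled=$((enabled + 1))
    fi
  done
  if [ "$enabled" -eq 0 ]; then
    echo "config error: no audio effect flag is set to true" >&2
    return 1
  fi

  # Studio voice (quality) supports 16000 and 48000 input only.
  if [ "$quality" = "true" ] && [ "$in_rate" != "16000" ] && [ "$in_rate" != "48000" ]; then
    echo "config error: use_studio_voice_quality needs input_sample_rate 16000 or 48000" >&2
    return 1
  fi
  # Studio voice (perf) supports 48000 input only.
  if [ "$perf" = "true" ] && [ "$in_rate" != "48000" ]; then
    echo "config error: use_studio_voice_perf needs input_sample_rate 48000" >&2
    return 1
  fi
  # With either studio voice effect, max_streams should equal num_pipelines.
  if { [ "$quality" = "true" ] || [ "$perf" = "true" ]; } && [ "$streams" != "$pipelines" ]; then
    echo "config error: max_streams ($streams) should equal num_pipelines ($pipelines)" >&2
    return 1
  fi
  return 0
}
```

A possible use is to run `validate_config` after sourcing config.sh and to start the service only if it succeeds.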

Launching the Audio Effects Microservice

Set the appropriate permissions for the downloaded scripts using the following command:
```
chmod -R 775 maxine_audio_quick_startv1.3.0
```
To initialize the required components, run the following command.
```
./audio_effects_init.sh
```
Start Audio Effects Microservice.
```
./audio_effects_start.sh
```
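
After the start script returns, it can be useful to wait until the gRPC port accepts connections before launching a client. The following sketch is an assumption-laden convenience, not part of the package: `wait_for_port` is a hypothetical helper that uses bash's `/dev/tcp` redirection and the coreutils `timeout` command. It only confirms that the TCP port is open, not that the service behind it is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical readiness probe; checks only that the TCP port accepts
# connections, not that the gRPC service behind it is healthy.
wait_for_port() {
  local host=$1 port=$2 attempts=${3:-30} i
  for (( i = 1; i <= attempts; i++ )); do
    # /dev/tcp is a bash feature: the redirection succeeds only if a TCP
    # connection can be opened. timeout guards against a hanging connect.
    if timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
  done
  return 1
}
```

For example, `wait_for_port localhost 8001 60` would poll for up to about a minute, where 8001 stands in for whatever value audio_effects_api_port has in config.sh.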

How to use custom pipeline IO tuning configs:

Update the contents of the respective tuning config under io_tuning_configs/
For the changes to take effect, restart the container as follows:

$ ./audio_effects_stop.sh

$ ./audio_effects_start.sh

Testing the Microservice using Audio Effects Sample client Application

The Audio Effects sample client is an application that can be run on any machine. It is provided in the Docker container in binary (executable) form together with its required dependencies. Once configured, the client application connects to the microservice, negotiates a session, streams audio to the microservice, and receives the processed audio back.

An example Audio Effects sample client configuration file is included in the package:

audio_effects_test_client_config.json

For more information about each of the configuration fields in this file, refer to the file audio_effects_test_client.proto.

The ms_config_file field in the client configuration specifies the path to the file that contains the config packet to be sent to the microservice. Refer to the configs folder for more information. Refer to protos/audio_effects/<api-version>/audio_effects.proto for more details about the microservice config request fields.

To test the microservice using this pre-built client:

Launch the client Docker container in an interactive session.

chmod +x audio_effects_client_start.sh
./audio_effects_client_start.sh

In the Docker shell, invoke the application as follows:

/opt/nvidia/maxine-microservices/bin/mic_pipeline -audio_effects_test_client_config=/host/audio_effects_test_client_config.json 2>&1 | tee /host/audio_effects_client_logs

Upon launch, the client does the following:

The client invokes the bidirectional gRPC endpoint of the microservice deployed at the URI specified by the audio_effects_uri field of the client configuration and establishes a gRPC channel.
The client sends a config packet, as specified by the ms_config_file field in the client configuration, and processes the response.
The client loads a WAV file, as specified by input_audio_file in the client configuration, and streams data packets using gRPC or RTP/UDP, as configured in ms_config_file.
The client receives the processed audio from the microservice and saves it to the output WAV file specified by the output_audio_file field in the client configuration.
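
Because the steps above depend on four fields of the client configuration, a quick presence check before launching the client can save a failed run. The sketch below is hypothetical and deliberately shallow: it assumes the field names appear as quoted JSON keys somewhere in the file, and it neither parses the JSON nor validates the values.

```shell
#!/usr/bin/env bash
# Hypothetical check: confirms that the field names mentioned above occur
# as quoted keys somewhere in the client configuration file. It does not
# parse JSON or validate values.
check_client_config() {
  local file=$1 field missing=0
  for field in audio_effects_uri ms_config_file input_audio_file output_audio_file; do
    if ! grep -q "\"${field}\"" "$file"; then
      echo "missing field: ${field}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Inside the client container this could be called as `check_client_config /host/audio_effects_test_client_config.json`.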

How to change denoiser version

Set the enable_denoiser_v2 field in config.sh as required:
- enable_denoiser_v2=false to enable denoiser version 1
- enable_denoiser_v2=true to enable denoiser version 2
Stop the microservice if it is already running; refer to Stopping the Microservice.
Restart the service; refer to Start Audio Effects Microservice.
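
The edit to config.sh can also be scripted. This is a minimal sketch that assumes config.sh holds one `key=value` assignment per line and that the value contains no `|` character; `set_config_value` is a hypothetical helper, not part of the package.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sets key=value in a shell-style config file.
# Replaces an existing "key=..." line or appends one if none exists.
# Assumes the value contains no "|" (used as the sed delimiter).
set_config_value() {
  local file=$1 key=$2 value=$3 tmp
  if grep -q "^${key}=" "$file"; then
    tmp=$(mktemp) || return 1
    # Write through a temp file, then copy the contents back so that the
    # original file keeps its permissions.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    local status=$?
    rm -f "$tmp"
    return "$status"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```

Switching to denoiser version 2 would then be `set_config_value config.sh enable_denoiser_v2 true`, followed by `./audio_effects_stop.sh` and `./audio_effects_start.sh`.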

Audio Test Files

A few audio test files with artifacts are included in the package for testing purposes. These files can be found at the following paths in the package:

BNR+REC: sample_audio_files/bnr_rec/*.wav
REC: sample_audio_files/rec/*.wav
BNR: sample_audio_files/bnr/*.wav
Superres: sample_audio_files/superres/*.wav
BNR+REC(16k) + Superres(16k->48k): sample_audio_files/bnr_rec_superres_16k_48k/*.wav
BNR(16k) + Superres(16k->48k): sample_audio_files/bnr_superres_16k_48k/*.wav
REC(16k) + Superres(16k->48k): sample_audio_files/rec_superres_16k_48k/*.wav
Superres(8k->16k) + BNR+REC(16k): sample_audio_files/superres_8k_16k_bnr_rec/*.wav
Superres(8k->16k) + BNR(16k): sample_audio_files/superres_8k_16k_bnr/*.wav
Superres(8k->16k) + REC(16k): sample_audio_files/superres_8k_16k_rec/*.wav
studio voice quality: sample_inputs/studio_voice_quality/16k/*.wav, sample_inputs/studio_voice_quality/48k/*.wav
studio voice perf: sample_inputs/studio_voice_perf/48k/*.wav

Stopping the Microservice

To shut down the service, run the following command.

chmod +x audio_effects_stop.sh
./audio_effects_stop.sh

Diagnostics

Checking the Logs

To check the logs, view the container logs on the host machine as follows:

docker logs -f maxine-audio-effects-service
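
When the log is long, it can help to narrow it down to lines that look like problems. The filter below is a hypothetical sketch: the service's log format is not described in this guide, so the pattern (case-insensitive "error" or "warn") is a guess that may need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keeps lines that mention "error" or "warn"
# (case-insensitive). The pattern is a guess at the log format.
# Note that grep exits with status 1 when no line matches.
filter_problems() {
  grep -iE 'error|warn'
}
```

A possible invocation is `docker logs --tail 500 maxine-audio-effects-service 2>&1 | filter_problems`, where `2>&1` folds stderr into the pipe because the service may log to either stream.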

Debugging

To debug or troubleshoot issues while running the client command, run the mic_pipeline sample client application with GST_DEBUG=3.
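
One way to do this is to prefix the client command with the variable assignment, which sets GST_DEBUG for that single command only. The wrapper below is a sketch: `run_client_debug`, `MIC_PIPELINE`, and `CLIENT_CONFIG` are hypothetical names, and the two variables default to the paths used earlier in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper. MIC_PIPELINE and CLIENT_CONFIG are illustrative
# variables that default to the paths used earlier in this guide.
run_client_debug() {
  local bin=${MIC_PIPELINE:-/opt/nvidia/maxine-microservices/bin/mic_pipeline}
  local cfg=${CLIENT_CONFIG:-/host/audio_effects_test_client_config.json}
  # Prefixing the assignment sets GST_DEBUG for this one command only.
  # stderr is merged into stdout so the output can be piped to tee.
  GST_DEBUG=3 "$bin" -audio_effects_test_client_config="$cfg" 2>&1
}
```

Inside the client container, `run_client_debug | tee /host/audio_effects_client_logs` would capture the more verbose output in the same log file as before.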

License

By pulling and using Maxine software, you accept the terms and conditions of the corresponding license (Under Resources).
