Holoscan for Media AI Reference Applications

Description: This resource contains source and dependency tarballs of the AI Reference Applications built for Holoscan for Media.
Publisher: NVIDIA
Latest Version: 0.1.0
Modified: May 28, 2025
Compressed Size: 3.69 GB

AI Reference Applications

This resource contains two AI applications designed to work with ST 2110 professional broadcast media streams.

  1. AI Virtual Cameras: Person detection application that can create cropped virtual camera feeds from an ST 2110-20 video stream containing multiple people.
  2. Riva ASR System: Automatic speech recognition system that transcribes audio from an ST 2110-30 audio stream.

AI Virtual Cameras

This application creates virtual cameras by detecting and tracking persons in a video stream using a person detection model and GStreamer. It processes a high-resolution input stream and creates multiple cropped virtual camera outputs focused on detected persons.

Overview

The application consists of two main components:

  1. cropper.py: Manages the GStreamer pipeline and video processing
  2. detect_person.py: Handles interaction with the person detection model

Key Features

  • Real-time person detection
  • Dynamic virtual camera creation based on detected persons
  • Smooth tracking and coordinate updates
  • Support for multiple virtual camera outputs
  • GStreamer-based video processing pipeline

Architecture

      Input Stream (3840x2160)
               ↓
       GStreamer Pipeline
        ↓             ↓
    ┌─────────┐   ┌─────────┐
    │ Person  │   │ Virtual │
    │Detection│ → │ Cameras │             
    │  Model  │   │(cropper)│
    └─────────┘   └─────────┘             
                      ↓
         Multiple Output Streams (1920x1080)    

GStreamer Pipeline

The application uses a GStreamer pipeline with the following key components (a sketch assembling them follows the list):

  1. Input Source:

    • ST 2110-20 video source (nvdsudpsrc)
    • YCbCr-4:2:2 10-bit format (UYVP)
    • Up to 3840x2160 resolution
    • Progressive scan
    • Integer or fractional frame rate, e.g. 29.97 fps
  2. Processing Pipeline:

    • Video conversion
    • Person detection
    • Dynamic cropping based on detected persons
    • Scaling up to 1920x1080
    • Up to 3 virtual camera output streams
  3. Output Streams:

    • Individual ST 2110-20 video sink (nvdsudpsink) for each virtual camera
    • YCbCr-4:2:2 10-bit format (UYVP)
    • Up to 1920x1080 resolution
    • Progressive scan
    • No frame rate conversion
  4. NMOS Node:

    • NMOS bin element (nvdsnmosbin) for IS-04 Discovery & Registration and IS-05 Connection Management of the input and output streams
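
As a rough illustration of how these components might fit together, the sketch below builds a single receiver-to-sender branch with Gst.parse_launch. The element names nvdsudpsrc and nvdsudpsink come from the list above; the caps values and the videocrop/videoscale stages are placeholder assumptions, and the real cropper.py pipeline is more elaborate.

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # Sketch of one receiver-to-sender branch; properties such as multicast
    # addresses, ports and NIC selection are omitted here.
    pipeline_description = (
        "nvdsudpsrc ! "                                         # ST 2110-20 receiver
        "video/x-raw, format=UYVP, width=3840, height=2160, framerate=30000/1001 ! "
        "videoconvert ! "                                       # prepare frames for detection
        "videocrop name=virtual-camera-crop ! "                 # crop region updated from detections
        "videoscale ! video/x-raw, width=1920, height=1080 ! "  # scale the crop to the output raster
        "nvdsudpsink"                                           # ST 2110-20 sender for one virtual camera
    )

    pipeline = Gst.parse_launch(pipeline_description)

In the actual application, up to NUM_CAMERAS such crop/scale/send branches are driven from the one input stream, and the nvdsnmosbin element provides the IS-04/IS-05 control of the Receiver and Senders as described above.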

Configuration

The application can be configured using environment variables:

FRAMERATE=30000/1001    # Input framerate, e.g. 25 or 30000/1001
INPUT_WIDTH=3840        # Input stream width
INPUT_HEIGHT=2160       # Input stream height
CROP_WIDTH=1280         # YOLO crop width
CROP_HEIGHT=720         # YOLO crop height
OUTPUT_WIDTH=1920       # Virtual camera output width
OUTPUT_HEIGHT=1080      # Virtual camera output height
NUM_CAMERAS=3           # Number of virtual cameras to create
HOSTNAME=ai-virtual-cameras.local  # Hostname for NMOS
HTTP_PORT=5231          # HTTP port for NMOS
MODEL_REPO=<model-repo> # GitHub repo from which to load via torch.hub.load
MODEL_NAME=<model-name> # Model entrypoint
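
For illustration, the application might pick these settings up along the following lines (a minimal sketch assuming plain os.environ lookups with the defaults listed above):

    import os

    # Defaults mirror the values shown above; all can be overridden in the environment.
    FRAMERATE     = os.environ.get("FRAMERATE", "30000/1001")
    INPUT_WIDTH   = int(os.environ.get("INPUT_WIDTH", "3840"))
    INPUT_HEIGHT  = int(os.environ.get("INPUT_HEIGHT", "2160"))
    CROP_WIDTH    = int(os.environ.get("CROP_WIDTH", "1280"))
    CROP_HEIGHT   = int(os.environ.get("CROP_HEIGHT", "720"))
    OUTPUT_WIDTH  = int(os.environ.get("OUTPUT_WIDTH", "1920"))
    OUTPUT_HEIGHT = int(os.environ.get("OUTPUT_HEIGHT", "1080"))
    NUM_CAMERAS   = int(os.environ.get("NUM_CAMERAS", "3"))
    HOSTNAME      = os.environ.get("HOSTNAME", "ai-virtual-cameras.local")
    HTTP_PORT     = int(os.environ.get("HTTP_PORT", "5231"))
    MODEL_REPO    = os.environ["MODEL_REPO"]  # GitHub repo passed to torch.hub.load
    MODEL_NAME    = os.environ["MODEL_NAME"]  # model entrypoint within that repo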

Usage

  1. Start the application:

    python3.10 cropper.py
    
  2. The application will:

    • Initialize the GStreamer pipeline
    • Load the person detection model
    • Create the NMOS Node with a Receiver and Senders
    • Start processing the input stream when the Receiver is connected
    • Stream the virtual camera outputs when the Senders are enabled
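
Putting these steps together, the startup flow can be sketched with standard GStreamer/GLib calls as below; the real cropper.py also wires up the detection callback and the NMOS Receiver/Sender handling.

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib

    Gst.init(None)

    # Stand-in pipeline; the real application builds the ST 2110 pipeline described above.
    pipeline = Gst.parse_launch("videotestsrc ! fakesink")

    loop = GLib.MainLoop()

    def on_error(bus, message):
        err, debug = message.parse_error()
        print(f"Pipeline error: {err} ({debug})")
        loop.quit()

    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message::error", on_error)

    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    finally:
        pipeline.set_state(Gst.State.NULL)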

Person Detection and Tracking

The application has the following features:

  • Real-time detection of persons in the input stream
  • Centered cropping around detected persons
  • Multiple person tracking support
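
For illustration only, the interaction with the detection model might look roughly like the sketch below. The torch.hub.load call and the MODEL_REPO/MODEL_NAME variables come from this document; the result handling assumes a YOLOv5-style model whose results expose an .xyxy attribute with [x1, y1, x2, y2, confidence, class] rows, and centered_crop is a hypothetical helper.

    import os
    import torch

    # Load the detector named by MODEL_REPO / MODEL_NAME (see Configuration above).
    model = torch.hub.load(os.environ["MODEL_REPO"], os.environ["MODEL_NAME"])

    PERSON_CLASS = 0  # assumption: class index 0 is "person", as in COCO-trained models

    def person_boxes(frame):
        """Run the detector and return [x1, y1, x2, y2] boxes for detected persons."""
        results = model(frame)
        return [row[:4].tolist() for row in results.xyxy[0] if int(row[5]) == PERSON_CLASS]

    def centered_crop(box, crop_w, crop_h, frame_w, frame_h):
        """Return a crop_w x crop_h region centered on the box, clamped to the frame."""
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        x = int(min(max(cx - crop_w / 2, 0), frame_w - crop_w))
        y = int(min(max(cy - crop_h / 2, 0), frame_h - crop_h))
        return x, y, crop_w, crop_h

The crop coordinates computed this way would then be applied to the corresponding virtual camera branch of the GStreamer pipeline.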

Running the Application in a Cluster

A Dockerfile is provided for containerizing the application. Alternatively, you can install and execute the application manually by following these steps.

Prerequisites

  1. Access to a Holoscan for Media cluster or a Local Kubernetes Setup

Setup Steps

  1. Start Media Gateway Pod

    • Start a Media Gateway pod with the description set to nvdsnmosbin in the Helm values.
      description: nvdsnmosbin
      
  2. Set up VS Code Server

    • Follow the Local Developer Setup with Kubernetes: Accessing Cluster with VS Code guide to connect to the Kubernetes cluster
    • Attach to the newly created pod using the Attach Visual Studio Code command. This opens a new VS Code window in which the contents of the pod can be browsed, providing a development environment inside the pod for easily experimenting with GStreamer, DeepStream, Rivermax, NMOS, etc.
    • Copy the tarball inside the pod and extract it
  3. Install Dependencies and Run

    # Install required Python packages
    pip install -r requirements.txt
    
    # From entrypoint.sh
    export NVDSNMOS_ORIGINAL_CORE_SET=$(/workspace/coreaffinity.py)
    export LC_ALL=C
    export NVDS_NMOS_DEFAULT_SEED="aivc" # A unique string to override the default NMOS seed in the container
    
    # Export environment variables as mentioned above in the Configuration section
    
    # Start the application
    python3.10 cropper.py
    

Riva ASR System

This system provides real-time Automatic Speech Recognition (ASR) capabilities by connecting an ST 2110-30 audio source to the NVIDIA Riva ASR NIM (NVIDIA Inference Microservice), deployed locally. It uses GStreamer to capture audio from a multicast stream, processes it through the Riva ASR service, and provides real-time transcriptions via a web user interface.

Deploying the NIM

The NIM can be deployed simply by applying the YAML at NIM Deployment/config.yaml. An NGC API key needs to be provided to download the models.

Project Description

The system consists of a Python client application that:

  1. Connects to an ST 2110-30 multicast audio stream using GStreamer
  2. Processes the audio through NVIDIA's Riva ASR service
  3. Provides real-time transcriptions via a web interface using Server-Sent Events (SSE)

The client handles:

  • Audio stream capture and preprocessing
  • Transcript generation
  • Web-based transcription streaming
  • Timestamp synchronization
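
The document does not name the web framework, but as an illustration of the SSE-based transcription streaming, a minimal endpoint serving /stream on port 5000 (both taken from the Usage section below) could look like the sketch below, assuming Flask:

    import queue
    from flask import Flask, Response

    app = Flask(__name__)
    transcripts = queue.Queue()  # the ASR loop would put finished transcript strings here

    @app.route("/stream")
    def stream():
        def events():
            while True:
                text = transcripts.get()   # block until a new transcript arrives
                yield f"data: {text}\n\n"  # Server-Sent Events framing
        return Response(events(), mimetype="text/event-stream")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)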

Default Audio Pipeline Configuration

The system uses a GStreamer pipeline with the following default configuration (a sketch of the pipeline follows the list):

  • Input:
    • ST 2110-30 multicast audio stream (48kHz, 16 channels, 24-bit, i.e. S24BE) with 0.125 ms packet time
    • The default pipeline expects the stream at a multicast IP address configurable through the SOURCE_MC_IP environment variable and port 5004
  • Processing:
    • Channel conversion to mono
    • Sample rate conversion to 16kHz
    • Format conversion to S16LE
  • Output: Processed audio stream for ASR
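
A minimal sketch of such a pipeline, assuming standard GStreamer elements (udpsrc, rtpL24depay, audioconvert, audioresample, appsink) for the ST 2110-30 / RTP L24 stream, is shown below; the real riva-static-client.py pipeline may differ in its element choices and buffering options.

    import os
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # Receive the 48 kHz, 16-channel, 24-bit multicast stream on port 5004 and
    # convert it to the mono 16 kHz S16LE audio expected by the ASR service.
    pipeline_description = (
        f"udpsrc address={os.environ['SOURCE_MC_IP']} port=5004 "
        'caps="application/x-rtp, media=audio, clock-rate=48000, encoding-name=L24, channels=16" ! '
        "rtpL24depay ! "
        "audioconvert ! audioresample ! "
        "audio/x-raw, format=S16LE, rate=16000, channels=1 ! "
        "appsink name=asr_sink emit-signals=true"
    )

    pipeline = Gst.parse_launch(pipeline_description)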

Environment Variables Configuration

The following environment variables need to be configured:

Riva Service Configuration

# Riva server connection details
RIVA_SERVER_GRPC_IP_PORT="192.168.23.152:32322" # gRPC IP and port (<nodeip>:<nodeport>) of the Riva NIM server deployed with the provided config.yaml

Audio Source Configuration

# ST 2110-30 multicast audio stream configuration
SOURCE_MC_IP="232.220.109.111"   # Multicast IP of the audio source consumed by the GStreamer pipeline
LOCAL_IFACE_IP="192.168.20.79"   # Local interface IP of the pod where the application is running
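
As an illustration of how the client might use these settings to reach the Riva service, the sketch below assumes the nvidia-riva-client Python package and its riva.client API; the package choice and the exact calls are assumptions, not confirmed by this document.

    import os
    import riva.client

    # Connect to the Riva ASR NIM at the address given by RIVA_SERVER_GRPC_IP_PORT.
    auth = riva.client.Auth(uri=os.environ["RIVA_SERVER_GRPC_IP_PORT"])
    asr_service = riva.client.ASRService(auth)

    # Streaming recognition configured for the mono 16 kHz S16LE audio produced
    # by the GStreamer pipeline described above.
    streaming_config = riva.client.StreamingRecognitionConfig(
        config=riva.client.RecognitionConfig(
            encoding=riva.client.AudioEncoding.LINEAR_PCM,
            sample_rate_hertz=16000,
            audio_channel_count=1,
            language_code="en-US",
        ),
        interim_results=True,
    )

    # Audio chunks pulled from the appsink would then be passed to a streaming
    # recognition call (e.g. asr_service.streaming_response_generator) and the
    # resulting transcripts forwarded to the web interface.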

Usage

  1. Configure the environment variables
  2. Start the client application:
    python riva-static-client.py
    
  3. Access the transcription stream at:
    http://<server_ip>:5000/stream
    

Riva Frontend

The Riva frontend is a Create React App application that provides a simple web UI to visualize the transcriptions generated by the Riva client.

Frontend Configuration

The frontend uses environment variables for configuration. To set up:

  1. Modify the .env file in the frontend directory with the appropriate Riva client IP for consuming transcripts.

  2. Build and run the frontend locally:

    cd frontend
    npm install
    npm run build
    
  3. Build and run using Docker:

    # Build with custom backend URL
    docker build --build-arg REACT_APP_BACKEND_URL=http://your-backend-host:5000 -t riva-frontend .
    
    # Run the container
    docker run -p 3000:3000 riva-frontend
    

Running the Application in a Cluster

A Dockerfile is provided for containerizing the application. Alternatively, you can install and execute the application manually by following these steps.

Prerequisites

  1. Access to a Holoscan for Media cluster or a Local Kubernetes Setup

Setup Steps

Deploying the NIM

  1. The NIM can be deployed with just oc apply -f config.yaml or kubectl apply -f config.yaml

Deploying the Client

  1. Start Media Gateway Pod

    • Start a Media Gateway pod with the description set to nvdsnmosbin in the Helm values.
      description: nvdsnmosbin
      
  2. Set up VS Code Server

    • Follow the Local Developer Setup with Kubernetes: Accessing Cluster with VS Code guide to connect to the Kubernetes cluster
    • Attach to the newly created pod using the Attach Visual Studio Code command. This opens a new VS Code window in which the contents of the pod can be browsed, providing a development environment inside the pod for easily experimenting with GStreamer, DeepStream, Rivermax, NMOS, etc.
    • Copy the tarball inside the pod and extract it
  3. Install Dependencies and Run

    # Install required Python packages
    pip install -r requirements.txt --ignore-installed
    
    # Export environment variables as mentioned above in the Environment Variables Configuration section
    
    # Start the Riva client application
    python3.10 riva-static-client.py
    

Deploying the React Frontend

The React app can be containerized with the steps above and then deployed on the cluster without any additional configuration.

License

By downloading and using this software, you accept the terms and conditions of the NVIDIA AI Product Agreement.

The source code for the AI Reference Applications themselves is licensed under the Apache License, Version 2.0. The notices, attribution, licenses and source code are included in the tarball.