Holoscan for Media AI Reference Applications

Description: This resource contains source and dependency tarballs of the AI Reference Applications built for Holoscan for Media.
Publisher: NVIDIA
Latest Version: 0.1.0
Modified: May 28, 2025
Compressed Size: 3.69 GB

AI Reference Applications

This resource contains two AI applications designed to work with ST 2110 professional broadcast media streams.

  1. AI Virtual Cameras: Person detection application that can create cropped virtual camera feeds from an ST 2110-20 video stream containing multiple people.
  2. Riva ASR System: Automatic speech recognition system that transcribes audio from an ST 2110-30 audio stream.

AI Virtual Cameras

This application creates virtual cameras by detecting and tracking persons in a video stream using a person detection model and GStreamer. It processes a high-resolution input stream and creates multiple cropped virtual camera outputs focused on detected persons.

Overview

The application consists of two main components:

  1. cropper.py: Manages the GStreamer pipeline and video processing
  2. detect_person.py: Handles interaction with the person detection model

Key Features

  • Real-time person detection
  • Dynamic virtual camera creation based on detected persons
  • Smooth tracking and coordinate updates
  • Support for multiple virtual camera outputs
  • GStreamer-based video processing pipeline

Architecture

      Input Stream (3840x2160)
               ↓
       GStreamer Pipeline
        ↓             ↓
    ┌─────────┐   ┌─────────┐
    │ Person  │   │ Virtual │
    │Detection│ → │ Cameras │             
    │  Model  │   │(cropper)│
    └─────────┘   └─────────┘             
                      ↓
         Multiple Output Streams (1920x1080)    

GStreamer Pipeline

The application uses a GStreamer pipeline with the following key components (a sketch assembling them follows the list):

  1. Input Source:

    • ST 2110-20 video source (nvdsudpsrc)
    • YCbCr-4:2:2 10-bit format (UYVP)
    • Up to 3840x2160 resolution
    • Progressive scan
    • Integer or fractional frame rate, e.g. 29.97 fps
  2. Processing Pipeline:

    • Video conversion
    • Person detection
    • Dynamic cropping based on detected persons
    • Scaling up to 1920x1080
    • Up to 3 virtual camera output streams
  3. Output Streams:

    • Individual ST 2110-20 video sink (nvdsudpsink) for each virtual camera
    • YCbCr-4:2:2 10-bit format (UYVP)
    • Up to 1920x1080 resolution
    • Progressive scan
    • No frame rate conversion
  4. NMOS Node:

    • NMOS bin element (nvdsnmosbin) for IS-04 Discovery & Registration and IS-05 Connection Management of the input and output streams
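
As a rough illustration of how these components might fit together, the sketch below builds a single receiver-to-sender branch with Gst.parse_launch. The element names nvdsudpsrc and nvdsudpsink come from the list above; the caps values and the videocrop/videoscale stages are placeholder assumptions, and the real cropper.py pipeline is more elaborate.

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # Sketch of one receiver-to-sender branch; properties such as multicast
    # addresses, ports and NIC selection are omitted here.
    pipeline_description = (
        "nvdsudpsrc ! "                                         # ST 2110-20 receiver
        "video/x-raw, format=UYVP, width=3840, height=2160, framerate=30000/1001 ! "
        "videoconvert ! "                                       # prepare frames for detection
        "videocrop name=virtual-camera-crop ! "                 # crop region updated from detections
        "videoscale ! video/x-raw, width=1920, height=1080 ! "  # scale the crop to the output raster
        "nvdsudpsink"                                           # ST 2110-20 sender for one virtual camera
    )

    pipeline = Gst.parse_launch(pipeline_description)

In the actual application, up to NUM_CAMERAS such crop/scale/send branches are driven from the one input stream, and the nvdsnmosbin element provides the IS-04/IS-05 control of the Receiver and Senders as described above.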

Configuration

The application can be configured using environment variables:

FRAMERATE=30000/1001    # Input framerate, e.g. 25 or 30000/1001
INPUT_WIDTH=3840        # Input stream width
INPUT_HEIGHT=2160       # Input stream height
CROP_WIDTH=1280         # YOLO crop width
CROP_HEIGHT=720         # YOLO crop height
OUTPUT_WIDTH=1920       # Virtual camera output width
OUTPUT_HEIGHT=1080      # Virtual camera output height
NUM_CAMERAS=3           # Number of virtual cameras to create
HOSTNAME=ai-virtual-cameras.local  # Hostname for NMOS
HTTP_PORT=5231          # HTTP port for NMOS
MODEL_REPO=<model-repo> # GitHub repo from which to load via torch.hub.load
MODEL_NAME=<model-name> # Model entrypoint
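
For illustration, the application might pick these settings up along the following lines (a minimal sketch assuming plain os.environ lookups with the defaults listed above):

    import os

    # Defaults mirror the values shown above; all can be overridden in the environment.
    FRAMERATE     = os.environ.get("FRAMERATE", "30000/1001")
    INPUT_WIDTH   = int(os.environ.get("INPUT_WIDTH", "3840"))
    INPUT_HEIGHT  = int(os.environ.get("INPUT_HEIGHT", "2160"))
    CROP_WIDTH    = int(os.environ.get("CROP_WIDTH", "1280"))
    CROP_HEIGHT   = int(os.environ.get("CROP_HEIGHT", "720"))
    OUTPUT_WIDTH  = int(os.environ.get("OUTPUT_WIDTH", "1920"))
    OUTPUT_HEIGHT = int(os.environ.get("OUTPUT_HEIGHT", "1080"))
    NUM_CAMERAS   = int(os.environ.get("NUM_CAMERAS", "3"))
    HOSTNAME      = os.environ.get("HOSTNAME", "ai-virtual-cameras.local")
    HTTP_PORT     = int(os.environ.get("HTTP_PORT", "5231"))
    MODEL_REPO    = os.environ["MODEL_REPO"]  # GitHub repo passed to torch.hub.load
    MODEL_NAME    = os.environ["MODEL_NAME"]  # model entrypoint within that repo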

Usage

  1. Start the application:

    python3.10 cropper.py
    
  2. The application will:

    • Initialize the GStreamer pipeline
    • Load the person detection model
    • Create the NMOS Node with a Receiver and Senders
    • Start processing the input stream when the Receiver is connected
    • Stream the virtual camera outputs when the Senders are enabled
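
Putting these steps together, the startup flow can be sketched with standard GStreamer/GLib calls as below; the real cropper.py also wires up the detection callback and the NMOS Receiver/Sender handling.

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst, GLib

    Gst.init(None)

    # Stand-in pipeline; the real application builds the ST 2110 pipeline described above.
    pipeline = Gst.parse_launch("videotestsrc ! fakesink")

    loop = GLib.MainLoop()

    def on_error(bus, message):
        err, debug = message.parse_error()
        print(f"Pipeline error: {err} ({debug})")
        loop.quit()

    bus = pipeline.get_bus()
    bus.add_signal_watch()
    bus.connect("message::error", on_error)

    pipeline.set_state(Gst.State.PLAYING)
    try:
        loop.run()
    finally:
        pipeline.set_state(Gst.State.NULL)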

Person Detection and Tracking

The application has the following features:

  • Real-time detection of persons in the input stream
  • Centered cropping around detected persons
  • Multiple person tracking support
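
For illustration only, the interaction with the detection model might look roughly like the sketch below. The torch.hub.load call and the MODEL_REPO/MODEL_NAME variables come from this document; the result handling assumes a YOLOv5-style model whose results expose an .xyxy attribute with [x1, y1, x2, y2, confidence, class] rows, and centered_crop is a hypothetical helper.

    import os
    import torch

    # Load the detector named by MODEL_REPO / MODEL_NAME (see Configuration above).
    model = torch.hub.load(os.environ["MODEL_REPO"], os.environ["MODEL_NAME"])

    PERSON_CLASS = 0  # assumption: class index 0 is "person", as in COCO-trained models

    def person_boxes(frame):
        """Run the detector and return [x1, y1, x2, y2] boxes for detected persons."""
        results = model(frame)
        return [row[:4].tolist() for row in results.xyxy[0] if int(row[5]) == PERSON_CLASS]

    def centered_crop(box, crop_w, crop_h, frame_w, frame_h):
        """Return a crop_w x crop_h region centered on the box, clamped to the frame."""
        cx = (box[0] + box[2]) / 2
        cy = (box[1] + box[3]) / 2
        x = int(min(max(cx - crop_w / 2, 0), frame_w - crop_w))
        y = int(min(max(cy - crop_h / 2, 0), frame_h - crop_h))
        return x, y, crop_w, crop_h

The crop coordinates computed this way would then be applied to the corresponding virtual camera branch of the GStreamer pipeline.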

Running the Application in a Cluster

A Dockerfile is provided for containerizing the application. Alternatively, you can install and execute the application manually by following these steps.

Prerequisites

  1. Access to a Holoscan for Media cluster or a Local Kubernetes Setup

Setup Steps

  1. Start Media Gateway Pod

    • Start a Media Gateway pod with the description set to nvdsnmosbin in the Helm values.
      description: nvdsnmosbin
      
  2. Set up VS Code Server

    • Follow the Local Developer Setup with Kubernetes: Accessing Cluster with VS Code guide to connect to the Kubernetes cluster
    • Attach to the newly created pod using the Attach Visual Studio Code command. This opens a new VS Code window in which the contents of the pod can be browsed, providing a development environment inside the pod for easily experimenting with GStreamer, DeepStream, Rivermax, NMOS, etc.
    • Copy the tarball inside the pod and extract it
  3. Install Dependencies and Run

    # Install required Python packages
    pip install -r requirements.txt
    
    # From entrypoint.sh
    export NVDSNMOS_ORIGINAL_CORE_SET=$(/workspace/coreaffinity.py)
    export LC_ALL=C
    export NVDS_NMOS_DEFAULT_SEED="aivc" # A unique string to override the default NMOS seed in the container
    
    # Export environment variables as mentioned above in the Configuration section
    
    # Start the application
    python3.10 cropper.py
    

Riva ASR System

This system provides real-time Automatic Speech Recognition (ASR) capabilities by connecting an ST 2110-30 audio source to the NVIDIA Riva ASR NIM (NVIDIA Inference Microservice), deployed locally. It uses GStreamer to capture audio from a multicast stream, processes it through the Riva ASR service, and provides real-time transcriptions via a web user interface.

Deploying the NIM

The NIM can be deployed simply by applying the YAML at NIM Deployment/config.yaml. An NGC API key needs to be provided to download the models.

Project Description

The system consists of a Python client application that:

  1. Connects to an ST 2110-30 multicast audio stream using GStreamer
  2. Processes the audio through NVIDIA's Riva ASR service
  3. Provides real-time transcriptions via a web interface using Server-Sent Events (SSE)

The client handles:

  • Audio stream capture and preprocessing
  • Transcript generation
  • Web-based transcription streaming
  • Timestamp synchronization
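
The document does not name the web framework, but as an illustration of the SSE-based transcription streaming, a minimal endpoint serving /stream on port 5000 (both taken from the Usage section below) could look like the sketch below, assuming Flask:

    import queue
    from flask import Flask, Response

    app = Flask(__name__)
    transcripts = queue.Queue()  # the ASR loop would put finished transcript strings here

    @app.route("/stream")
    def stream():
        def events():
            while True:
                text = transcripts.get()   # block until a new transcript arrives
                yield f"data: {text}\n\n"  # Server-Sent Events framing
        return Response(events(), mimetype="text/event-stream")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=5000)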

Default Audio Pipeline Configuration

The system uses a GStreamer pipeline with the following default configuration (a sketch of the pipeline follows the list):

  • Input:
    • ST 2110-30 multicast audio stream (48kHz, 16 channels, 24-bit, i.e. S24BE) with 0.125 ms packet time
    • The default pipeline expects the stream at a multicast IP address configurable through the SOURCE_MC_IP environment variable and port 5004
  • Processing:
    • Channel conversion to mono
    • Sample rate conversion to 16kHz
    • Format conversion to S16LE
  • Output: Processed audio stream for ASR
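
A minimal sketch of such a pipeline, assuming standard GStreamer elements (udpsrc, rtpL24depay, audioconvert, audioresample, appsink) for the ST 2110-30 / RTP L24 stream, is shown below; the real riva-static-client.py pipeline may differ in its element choices and buffering options.

    import os
    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # Receive the 48 kHz, 16-channel, 24-bit multicast stream on port 5004 and
    # convert it to the mono 16 kHz S16LE audio expected by the ASR service.
    pipeline_description = (
        f"udpsrc address={os.environ['SOURCE_MC_IP']} port=5004 "
        'caps="application/x-rtp, media=audio, clock-rate=48000, encoding-name=L24, channels=16" ! '
        "rtpL24depay ! "
        "audioconvert ! audioresample ! "
        "audio/x-raw, format=S16LE, rate=16000, channels=1 ! "
        "appsink name=asr_sink emit-signals=true"
    )

    pipeline = Gst.parse_launch(pipeline_description)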

Environment Variables Configuration

The following environment variables need to be configured:

Riva Service Configuration

# Riva server connection details
RIVA_SERVER_GRPC_IP_PORT="192.168.23.152:32322" # gRPC IP and port (<nodeip>:<nodeport>) of the Riva NIM server deployed with the provided config.yaml

Audio Source Configuration

# ST 2110-30 multicast audio stream configuration
SOURCE_MC_IP="232.220.109.111"   # Multicast IP of the audio source consumed by the GStreamer pipeline
LOCAL_IFACE_IP="192.168.20.79"   # Local interface IP of the pod where the application is running
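
As an illustration of how the client might use these settings to reach the Riva service, the sketch below assumes the nvidia-riva-client Python package and its riva.client API; the package choice and the exact calls are assumptions, not confirmed by this document.

    import os
    import riva.client

    # Connect to the Riva ASR NIM at the address given by RIVA_SERVER_GRPC_IP_PORT.
    auth = riva.client.Auth(uri=os.environ["RIVA_SERVER_GRPC_IP_PORT"])
    asr_service = riva.client.ASRService(auth)

    # Streaming recognition configured for the mono 16 kHz S16LE audio produced
    # by the GStreamer pipeline described above.
    streaming_config = riva.client.StreamingRecognitionConfig(
        config=riva.client.RecognitionConfig(
            encoding=riva.client.AudioEncoding.LINEAR_PCM,
            sample_rate_hertz=16000,
            audio_channel_count=1,
            language_code="en-US",
        ),
        interim_results=True,
    )

    # Audio chunks pulled from the appsink would then be passed to a streaming
    # recognition call (e.g. asr_service.streaming_response_generator) and the
    # resulting transcripts forwarded to the web interface.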

Usage

  1. Configure the environment variables
  2. Start the client application:
    python riva-static-client.py
    
  3. Access the transcription stream at:
    http://<server_ip>:5000/stream
    

Riva Frontend

The Riva frontend is a Create React App application that provides a simple web UI to visualize the transcriptions generated by the Riva client.

Frontend Configuration

The frontend uses environment variables for configuration. To set up:

  1. Modify the .env file in the frontend directory with the appropriate Riva client IP for consuming transcripts.

  2. Build and run the frontend locally:

    cd frontend
    npm install
    npm run build
    
  3. Build and run using Docker:

    # Build with custom backend URL
    docker build --build-arg REACT_APP_BACKEND_URL=http://your-backend-host:5000 -t riva-frontend .
    
    # Run the container
    docker run -p 3000:3000 riva-frontend
    

Running the Application in a Cluster

A Dockerfile is provided for containerizing the application. Alternatively, you can install and execute the application manually by following these steps.

Prerequisites

  1. Access to a Holoscan for Media cluster or a Local Kubernetes Setup

Setup Steps

Deploying the NIM

  1. The NIM can be deployed with just oc apply -f config.yaml or kubectl apply -f config.yaml

Deploying the Client

  1. Start Media Gateway Pod

    • Start a Media Gateway pod with the description set to nvdsnmosbin in the Helm values.
      description: nvdsnmosbin
      
  2. Set up VS Code Server

    • Follow the Local Developer Setup with Kubernetes: Accessing Cluster with VS Code guide to connect to the Kubernetes cluster
    • Attach to the newly created pod using the Attach Visual Studio Code command. This opens a new VS Code window in which the contents of the pod can be browsed, providing a development environment inside the pod for easily experimenting with GStreamer, DeepStream, Rivermax, NMOS, etc.
    • Copy the tarball inside the pod and extract it
  3. Install Dependencies and Run

    # Install required Python packages
    pip install -r requirements.txt --ignore-installed
    
    # Export environment variables as mentioned above in the Environment Variables Configuration section
    
    # Start the Riva client application
    python3.10 riva-static-client.py
    

Deploying the React Frontend

The React app can be containerized with the steps above and then deployed on the cluster without any additional configuration.

License

By downloading and using this software, you accept the terms and conditions of the NVIDIA AI Product Agreement.

The source code for the AI Reference Applications themselves is licensed under the Apache License, Version 2.0. The notices, attribution, licenses and source code are included in the tarball.