NGC Catalog

Description

Autovox, by Cogknit Semantics, is a voice OS platform that provides multiple language models: automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS). This limited-access ASR container converts Hindi audio to Hindi text.

Publisher

Cogknit Semantics

Latest Tag

1.0.3

Modified

October 18, 2021

Compressed Size

5.91 GB

Multinode Support

Multi-Arch Support

Running Autovox Hindi ASR

Requirements

Python 3.7 and above.
GPU-enabled environment.
Nvidia Container Toolkit.

Please set up the Nvidia Container Toolkit before pulling the Docker image.
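A quick preflight check can confirm these requirements before pulling the image. The following is a minimal sketch and not part of the Autovox distribution: the helper name check_prerequisites is hypothetical, and it assumes that finding the docker and nvidia-smi executables on the PATH is a reasonable proxy for a working Docker and GPU driver installation. It does not verify that the Nvidia Container Toolkit itself is configured.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 7), tools=("docker", "nvidia-smi")):
    """Return a dict mapping each requirement name to True/False.

    Hypothetical helper: it only checks the interpreter version and
    whether each tool is discoverable on PATH, nothing more.
    """
    results = {"python": sys.version_info[:2] >= tuple(min_python)}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

A MISSING entry for docker or nvidia-smi suggests the environment needs attention before proceeding to the pull command.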

Pull Command

docker pull nvcr.io/partners/cogknit/autovox-hindi-asr:1.0.3

Before running the container, make sure you have pulled the Docker image so that you are working with an up-to-date copy. Once the pull is complete, you can run the container image and obtain an Autovox Trial Edition License Key.

Obtain a trial license

The Autovox NeMo container provides a 14-day trial in partnership with Nvidia. The trial can process up to 4 hours of audio or a maximum of 40 runs of the provided image. Please follow the instructions below to obtain your trial license key. For a full, unrestricted license for your organization, reach out to the Autovox team.
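Before starting a trial, it may help to estimate whether a batch of recordings fits within these limits. The sketch below is illustrative only: the helper names are hypothetical, it assumes one container run per file, and it assumes the files are uncompressed PCM WAV readable by Python's standard wave module. How the container actually meters usage is not documented here.

```python
import wave
from pathlib import Path

TRIAL_AUDIO_LIMIT_SECONDS = 4 * 60 * 60  # 4 hours, from the trial terms
TRIAL_RUN_LIMIT = 40                     # maximum runs, from the trial terms


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed as frames / sample rate."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_trial(wav_dir):
    """Return (total_seconds, file_count, within_limits) for *.wav in wav_dir.

    Assumes one container run per file (an assumption of this sketch).
    """
    files = sorted(Path(wav_dir).glob("*.wav"))
    total = sum(wav_duration_seconds(f) for f in files)
    ok = total <= TRIAL_AUDIO_LIMIT_SECONDS and len(files) <= TRIAL_RUN_LIMIT
    return total, len(files), ok
```

For example, fits_trial("/home/wav") returns the combined duration in seconds, the number of files, and whether both totals stay within the stated trial limits.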

docker run -it -e license="generate" nvcr.io/partners/cogknit/autovox-hindi-asr:1.0.3

When the container starts, provide your details (such as name and email). Please make sure you use your official email address for this purpose. A trial license key will be sent to the email address you provide.

Run the container

Run the container image to use the model, as shown below:

docker run --gpus all -v :/home/ubuntu/wav/ -e license="" -e input="" nvcr.io/partners/cogknit/autovox-hindi-asr:1.0.3

Sample Command

docker run --gpus all -v /home/wav/:/home/ubuntu/wav/ -e license="xxxxxxxxx" -e input="sample.wav" nvcr.io/partners/cogknit/autovox-hindi-asr:1.0.3

-- -v mounts a directory from local storage into the container as a volume. This directory holds both the input and output files: audio files to be transcribed must reside in it, and the output text file is generated in the same folder. The trial version provides only plain text file output. Contact the Autovox team for further features such as time-stamped transcripts, speaker diarization/identification, and domain-specific models.

-- license is the trial license key received at the registered email address.

-- input is the audio file to be transcribed (in .wav format, with an 8 kHz sample rate).
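When the container is called from a script, the three options above can be assembled programmatically. The following is a minimal sketch that mirrors the sample command: the function names build_run_command and transcribe are hypothetical, and the container-side mount point is taken from the sample command rather than from any documented API.

```python
import os
import subprocess

IMAGE = "nvcr.io/partners/cogknit/autovox-hindi-asr:1.0.3"
CONTAINER_WAV_DIR = "/home/ubuntu/wav/"  # mount point used in the sample command


def build_run_command(local_wav_dir, license_key, input_file):
    """Build the `docker run` argument list for one transcription."""
    # Docker bind mounts need an absolute host path.
    local_dir = os.path.abspath(local_wav_dir)
    return [
        "docker", "run", "--gpus", "all",
        "-v", f"{local_dir}:{CONTAINER_WAV_DIR}",
        "-e", f"license={license_key}",
        "-e", f"input={input_file}",
        IMAGE,
    ]


def transcribe(local_wav_dir, license_key, input_file):
    """Run the container; raises CalledProcessError on a non-zero exit."""
    cmd = build_run_command(local_wav_dir, license_key, input_file)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems if the license key or file name contains spaces or special characters.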

Up to 1.2 hours of sample audio is accessible here. You may use the provided sample audio to test the model or use your own audio files. Please note that if you transcribe your own audio files, the container accepts only WAV files with a sample rate of 8 kHz, so files must be converted to this format beforehand. For other file types and formats, please contact the Autovox team for a full license.
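Checking a file's format before a run can avoid spending one of the limited trial runs on an unsupported input. The sketch below uses Python's standard wave module to confirm that a file is a readable PCM WAV with an 8 kHz sample rate. The helper name check_wav is hypothetical, and the check covers only the container format and sample rate; any other constraints the container may impose (for example, channel count) are not documented here and are not tested.

```python
import wave

REQUIRED_SAMPLE_RATE_HZ = 8000  # 8 kHz, as stated in the container notes


def check_wav(path, required_rate=REQUIRED_SAMPLE_RATE_HZ):
    """Return (is_ok, message) for a candidate input file."""
    try:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        # Raised when the file is empty, truncated, or not PCM WAV.
        return False, f"not a readable PCM WAV file: {exc}"
    if rate != required_rate:
        return False, f"sample rate is {rate} Hz, expected {required_rate} Hz"
    return True, "ok"
```

A file that fails this check would need to be converted with an audio tool of your choice before it is placed in the mounted volume.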

Sample output

This container has been tested on the below configurations, though not limited to the same.

Software Configuration	GPU
Ubuntu 20.04 LTS	Nvidia Tesla T4
Ubuntu 20.04 LTS	Nvidia A100

License

The Autovox NeMo Hindi model is proprietary software developed and owned exclusively by Cogknit Semantics Pvt Ltd. The container provided and its contents must not be modified or redistributed without approval.

By pulling and using the container, you accept the terms and conditions of this End User License Agreement.

Please email or visit autovox.ai to extend the license or for general queries.