Speech to Text with NVIDIA NeMo and Weights & Biases

Description
This notebook walks through fine-tuning a speech recognition model using the NVIDIA NeMo framework, integrated with Weights & Biases for experiment tracking.
Publisher
NVIDIA
Latest Version
1
Modified
April 4, 2023
Compressed Size
498.08 KB

Description

Speech to text, or automatic speech recognition (ASR), is the task of automatically transcribing spoken language. It is especially useful in applications such as smart virtual assistants, subtitling online video, and transcribing customer interactions at a call center for archiving, among many others. NVIDIA NeMo is an open-source toolkit for training or fine-tuning AI models for ASR applications. It is built for data scientists and researchers, who can assemble new state-of-the-art speech and NLP networks from API-compatible building blocks that connect together.

This notebook contains a basic tutorial on ASR concepts, introduced through code snippets that use the NeMo framework. It also showcases integration with the Weights & Biases MLOps platform for exploring data and model architectures, debugging, hyperparameter tuning, and sharing reports through collaborative dashboards.