
Speech to Text with NVIDIA NeMo and Weights & Biases



This notebook walks through fine-tuning a speech recognition model with the NVIDIA NeMo framework, integrated with Weights & Biases for experiment tracking.



Latest Version



April 4, 2023

Compressed Size

498.08 KB


Speech to text, or Automatic Speech Recognition (ASR), is the automatic transcription of spoken language. It is especially useful in applications such as smart virtual assistants, subtitling online video, and transcribing call-center customer interactions for archiving. NVIDIA NeMo is an open-source toolkit for training or fine-tuning AI models for ASR applications. It is designed to help data scientists and researchers build new state-of-the-art speech and NLP networks easily from API-compatible building blocks that can be connected together.

This notebook contains a basic tutorial on ASR concepts, introduced with code snippets that use the NeMo framework. It also showcases integration with the Weights & Biases MLOps platform for exploring data and model architectures, debugging, hyperparameter tuning, and sharing reports through collaborative dashboards.
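As a rough illustration of the kind of integration described above, the sketch below fine-tunes a pretrained NeMo CTC model while logging to Weights & Biases through PyTorch Lightning's `WandbLogger`. It is not taken from the notebook itself. The pretrained model name `QuartzNet15x5Base-En`, the manifest paths, the project name `asr-finetune-demo`, the 16 kHz sample rate, and the helper names `make_data_config` and `fine_tune` are all assumptions chosen for illustration. It also assumes a NeMo release that uses the `pytorch_lightning` package, plus a logged-in `wandb` installation.

```python
from typing import Any, Dict, List


def make_data_config(manifest_path: str, labels: List[str],
                     batch_size: int = 16, shuffle: bool = False) -> Dict[str, Any]:
    """Build a dataset config dict for a NeMo ASR model (hypothetical helper)."""
    return {
        "manifest_filepath": manifest_path,  # JSON-lines manifest of audio/text pairs
        "sample_rate": 16000,                # assumed sample rate of the audio
        "labels": labels,                    # character vocabulary of the model
        "batch_size": batch_size,
        "shuffle": shuffle,
    }


def fine_tune(train_manifest: str, val_manifest: str,
              project: str = "asr-finetune-demo", max_epochs: int = 1):
    """Fine-tune a pretrained CTC model and log metrics to Weights & Biases."""
    # Heavy dependencies are imported lazily so the helper above stays usable
    # on machines without NeMo installed.
    import pytorch_lightning as pl
    from pytorch_lightning.loggers import WandbLogger
    import nemo.collections.asr as nemo_asr

    # Assumed starting checkpoint; any pretrained CTC model name could be used.
    model = nemo_asr.models.EncDecCTCModel.from_pretrained(
        model_name="QuartzNet15x5Base-En"
    )
    labels = list(model.decoder.vocabulary)

    model.setup_training_data(
        train_data_config=make_data_config(train_manifest, labels, shuffle=True)
    )
    model.setup_validation_data(
        val_data_config=make_data_config(val_manifest, labels)
    )

    # Metrics the model logs during training are forwarded to the W&B project.
    wandb_logger = WandbLogger(project=project)
    trainer = pl.Trainer(max_epochs=max_epochs, logger=wandb_logger)
    model.set_trainer(trainer)
    trainer.fit(model)
    return model
```

Calling `fine_tune("train_manifest.json", "val_manifest.json")` with real manifest files would start a run that appears in the named W&B project. The file names here are placeholders.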