Speech-to-text, or Automatic Speech Recognition (ASR), is the task of automatically transcribing spoken language into text. It is especially useful in applications such as smart virtual assistants, generating subtitles for online video, and transcribing customer interactions at a call center for archiving. NVIDIA NeMo is an open-source toolkit for training and fine-tuning AI models for ASR applications. It lets data scientists and researchers assemble new state-of-the-art speech and NLP networks from API-compatible building blocks that can be connected together.
This notebook is a basic tutorial on ASR concepts, introduced through code snippets that use the NeMo framework. It also showcases integration with the Weights & Biases MLOps platform for exploring data and model architectures, debugging, hyperparameter tuning, and sharing reports through collaborative dashboards.