# Model Overview

An ASR-based text/audio aligner, built on the CTC loss, that was used to train TalkNet.

# Usage

## Automatically load the model from NGC

```python3
from nemo.collections.asr.models import EncDecCTCModel

# Download the pretrained checkpoint from NGC (cached locally after the first call)
model = EncDecCTCModel.from_pretrained("asr_talknet_aligner")
```

For an example of how to use this model to generate speech, refer to the TTS inference notebook.

# Training Set

This model is trained on LibriTTS sampled at 22050 Hz, with input text converted to phonemes. It can be used to extract durations for an audio excerpt and its corresponding phoneme sequence.
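
To illustrate what "extracting durations" from a CTC model can look like, here is a minimal sketch, not the aligner's actual implementation. It assumes you already have a `(frames, vocab)` matrix of per-frame log-probabilities and the target phoneme ids. It runs a Viterbi forced alignment over the blank-interleaved target sequence and counts the frames assigned to each state. The function names `ctc_forced_align` and `durations_from_path`, the blank index `0`, and the toy data are all hypothetical and chosen only for this example.

```python3
import numpy as np


def ctc_forced_align(log_probs, tokens, blank=0):
    """Viterbi alignment of `tokens` to frames under CTC topology.

    log_probs: (T, V) array of per-frame log-probabilities.
    tokens:    target token ids, without blanks.
    Returns (path, ext): path[t] is the index into the blank-interleaved
    sequence `ext` that frame t is assigned to.
    """
    T = log_probs.shape[0]
    # Blank-interleaved target: blank, tok_1, blank, tok_2, ..., blank
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)

    score = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    # A path may start in the leading blank or in the first token
    score[0, 0] = log_probs[0, ext[0]]
    if S > 1:
        score[0, 1] = log_probs[0, ext[1]]

    for t in range(1, T):
        for s in range(S):
            cands = [(score[t - 1, s], s)]  # stay in the same state
            if s >= 1:
                cands.append((score[t - 1, s - 1], s - 1))  # advance by one
            # Skipping a blank is allowed only between two different tokens
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1, s - 2], s - 2))
            best, prev = max(cands)
            score[t, s] = best + log_probs[t, ext[s]]
            back[t, s] = prev

    # A path may end in the trailing blank or in the last token
    s = S - 1
    if S > 1 and score[T - 1, S - 2] > score[T - 1, S - 1]:
        s = S - 2
    path = [s]
    for t in range(T - 1, 0, -1):
        s = back[t, s]
        path.append(s)
    path.reverse()
    return path, ext


def durations_from_path(path, num_states):
    """Number of frames assigned to each state of the interleaved sequence."""
    return np.bincount(path, minlength=num_states)


# Toy example: vocabulary {0: blank, 1, 2}, six frames, target tokens [1, 2]
favored = [0, 1, 1, 0, 2, 2]
probs = np.full((6, 3), 0.05)
probs[np.arange(6), favored] = 0.9
path, ext = ctc_forced_align(np.log(probs), [1, 2])
durations = durations_from_path(path, len(ext))
print(ext, durations.tolist())
```

The durations are in frames and always sum to the number of frames. Converting them to seconds requires the feature hop length of the model's preprocessor, which is not stated here.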

# References

[2] [TalkNet 2 Paper](https://arxiv.org/abs/2104.08189)

asr_talknet_aligner

Text/Audio aligner based on QuartzNet 5x5 for durations extraction

ASR TalkNet Aligner
