Model
Spanish US FastPitch model
| Field | Response |
|---|---|
| Intended Application & Domain | Speech Synthesis |
| Model Task | Speech Synthesis and Generative Adversarial Network |
| Intended Users | This model is intended for developers building interactive call centers, virtual assistants, and language-learning assistants that improve pronunciation, as well as applications that automatically generate voice-overs, narrate or comment on videos, and provide audio alternatives for visually impaired users or people with light sensitivity. |
| Model Output | Audio files (.wav) |
| How the model works | The model converts input text characters into an audio representation. |
| Technical Limitations | The model can only produce a voice in the language, dialect, and gender(s) on which it was trained. It makes no effort to moderate or modify input text. |
| Performance Metrics | % preference when compared with available alternatives; mean and median of each of the following: Pitch, Pitch standard deviation (Pitch_std), Pitch_kurtosis, Pitch_skew, Fundamental Frequency Ratio (f0_ratio), f0_ratio_std, f0_ratio_kurtosis, f0_ratio_skew |
| Potential Known Risks | May unnaturally synthesize vocabulary not included in the pronunciation dictionary or omit phonetic symbols not used in training. |
| Licensing | https://docs.nvidia.com/ai-foundation-models-community-license.pdf |
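
The pitch statistics named under Performance Metrics can be illustrated with a minimal sketch. The card does not say how the metrics are computed, so everything here is an assumption: the function names `pitch_stats` and `aggregate` are hypothetical, unvoiced frames are assumed to be marked with 0, the standard deviation is the population form, and kurtosis is reported as excess kurtosis. The f0_ratio metrics are left out because the card does not define the ratio.

```python
import statistics


def pitch_stats(f0_hz):
    """Summarize the voiced F0 values (in Hz) of one utterance.

    Assumption: frames with F0 <= 0 are unvoiced and are ignored.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")

    mean = statistics.fmean(voiced)
    std = statistics.pstdev(voiced)  # population standard deviation (assumed)
    if std == 0:
        raise ValueError("constant pitch: skew and kurtosis are undefined")

    n = len(voiced)
    m3 = sum((f - mean) ** 3 for f in voiced) / n  # third central moment
    m4 = sum((f - mean) ** 4 for f in voiced) / n  # fourth central moment

    return {
        "pitch": mean,
        "pitch_std": std,
        "pitch_skew": m3 / std ** 3,
        "pitch_kurtosis": m4 / std ** 4 - 3.0,  # excess kurtosis (assumed)
    }


def aggregate(per_utterance):
    """Report the mean and median of each statistic across utterances."""
    keys = per_utterance[0].keys()
    return {
        k: {
            "mean": statistics.fmean(s[k] for s in per_utterance),
            "median": statistics.median(s[k] for s in per_utterance),
        }
        for k in keys
    }
```

In this sketch, `pitch_stats` would be called once per synthesized utterance on an F0 contour from any pitch tracker, and `aggregate` would then produce the "(mean)" and "(median)" variants listed in the table.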