Get a ready-to-use bilingual Chinese and English speech dataset.
This dataset is intended for training bilingual (Chinese and English) Text-to-Speech models, for example the FastPitch acoustic model using the FastPitch training recipe from NVIDIA Deep Learning Examples. It contains about 2,740 bilingual audio clips from a single female speaker, along with their text transcripts. Each clip is roughly 5 to 6 seconds long, for a total of approximately 4.5 hours of audio.
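After downloading, a quick sanity check is to count the clips and sum their durations. As a rough expectation, 2,740 clips at 5 to 6 seconds each works out to about 3.8 to 4.6 hours. The sketch below uses only Python's standard `wave` module and assumes the audio is stored as uncompressed PCM `.wav` files somewhere under one folder. The folder name `bilingual_dataset/wavs` and the helper `total_duration_hours` are hypothetical, not part of the dataset's documented layout.

```python
import wave
from pathlib import Path


def total_duration_hours(audio_dir: str) -> tuple[int, float]:
    """Count .wav clips under audio_dir (recursively) and sum their length in hours.

    Assumption: clips are uncompressed PCM .wav files readable by the
    standard-library wave module; the real dataset layout may differ.
    """
    count = 0
    seconds = 0.0
    for path in sorted(Path(audio_dir).rglob("*.wav")):
        with wave.open(str(path), "rb") as clip:
            # Duration in seconds = number of frames / frames per second.
            seconds += clip.getnframes() / clip.getframerate()
        count += 1
    return count, seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical path: point this at wherever the audio was extracted.
    n, hours = total_duration_hours("bilingual_dataset/wavs")
    # max(n, 1) avoids dividing by zero when no clips are found.
    print(f"{n} clips, {hours:.2f} hours, {3600 * hours / max(n, 1):.1f} s/clip on average")
```

If the printed totals land far from about 2,740 clips and 4.5 hours, the download or extraction is likely incomplete.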
The dataset is provided and shared by Chunghwa Telecom Laboratories. By downloading and using this dataset, you accept the terms and conditions of the license, CC BY-NC 4.0.