NGC Catalog | Speech Dataset Collection

Data is the new code. The marketplace is a one-stop shop for machine learning training data.



April 4, 2023

Speed your time to market with ready-to-use data you can trust.

Available in multiple domains, languages, accents, and recording types with rich metadata, our datasets are transparent and now just a click away.

Within this collection of speech datasets, the following languages are available:

  • Dutch (NL)
  • English (AU)
  • English (UK)
  • French (CA)
  • French (FR)
  • German (DE)
  • Italian (IT)
  • Japanese (JP)
  • Portuguese (BR)
  • Portuguese (PT)
  • Spanish (ES)
  • Spanish (MX)

With these datasets, training conversational AI has never been easier.

Visit the marketplace for additional one-hour samples exclusive to NVIDIA NGC catalog users, as well as more languages and other data types such as NLP and machine translation.

By downloading and using this dataset, you accept the terms and conditions of the license.