Multilingual LibriSpeech (MLS)

Introduced by Pratap et al. in MLS: A Large-Scale Multilingual Dataset for Speech Research

Multilingual LibriSpeech (MLS) is a large multilingual corpus for speech research. The dataset is derived from read audiobooks from LibriVox and covers 8 languages: English, German, Dutch, Spanish, French, Italian, Portuguese, and Polish. It includes about 44.5K hours of English and about 6K hours in total across the other seven languages.
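
To make the size figures concrete, the short sketch below adds up the approximate hours quoted above and computes the English share of the corpus. The constant and function names are illustrative only and are not part of any MLS tooling; the hour counts are the rounded values from the description, not exact statistics.

```python
# Approximate figures taken from the description above (rounded, not exact).
ENGLISH_HOURS = 44_500  # "about 44.5K hours of English"
OTHER_HOURS = 6_000     # "about 6K hours" combined for the other languages

LANGUAGES = [
    "English", "German", "Dutch", "Spanish",
    "French", "Italian", "Portuguese", "Polish",
]


def corpus_summary(english_hours, other_hours):
    """Return the approximate total hours and the fraction that is English."""
    total = english_hours + other_hours
    return {"total_hours": total, "english_share": english_hours / total}


summary = corpus_summary(ENGLISH_HOURS, OTHER_HOURS)
print(f"{len(LANGUAGES)} languages, about {summary['total_hours']:,} hours in total")
print(f"English share of audio: {summary['english_share']:.1%}")
```

The point of the calculation is that the corpus is heavily skewed toward English, which is worth keeping in mind when sampling training data across languages.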

Source: MLS: A Large-Scale Multilingual Dataset for Speech Research

License


  • CC BY 4.0
