MaSS (Multilingual corpus of Sentence-aligned Spoken utterances) is an extension of the CMU Wilderness Multilingual Speech Dataset, a speech dataset based on recorded readings of the New Testament.
It provides a large and clean dataset of 8,130 parallel spoken utterances across 8 languages (56 language pairs). The covered languages are Basque, English, Finnish, French, Hungarian, Romanian, Russian, and Spanish.
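The figure of 56 language pairs is consistent with counting each translation direction separately: 8 languages give 8 × 7 = 56 ordered pairs, whereas unordered pairs would give 28. A minimal sketch of that arithmetic follows; the variable names are illustrative and not part of the MaSS tooling, and the directional reading is an assumption inferred from the numbers.

```python
from itertools import permutations

# The eight languages listed in the dataset description.
languages = [
    "Basque", "English", "Finnish", "French",
    "Hungarian", "Romanian", "Russian", "Spanish",
]

# Assumption: a "language pair" is directional (source, target),
# so (English, French) and (French, English) count as two pairs.
pairs = list(permutations(languages, 2))

print(len(pairs))  # 56
```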
Source: MaSS: A Large and Clean Multilingual Corpus of Sentence-aligned Spoken Utterances Extracted from the Bible