MuST-C

Introduced by Di Gangi et al. in MuST-C: a Multilingual Speech Translation Corpus

MuST-C is described by its source as the largest publicly available one-to-many multilingual corpus for speech translation. It covers eight language directions, from English into German, Spanish, French, Italian, Dutch, Portuguese, Romanian, and Russian. The corpus consists of audio, transcriptions, and translations of English TED talks, and it comes with predefined training, validation, and test splits.

Source: One-to-Many Multilingual End-to-End Speech Translation
