GigaST is a large-scale pseudo speech translation (ST) corpus. The corpus was created by translating the text in GigaSpeech, an English ASR corpus, into German and Chinese. The training set is translated by a strong machine translation system and the test set was translated by human. ST models trained with an addition of the corpus obtain new state-of-the-art results on the MuST-C English-German benchmark test set.

Papers


Paper Code Results Date Stars

Dataset Loaders


No data loaders found. You can submit your data loader here.

Tasks


Similar Datasets


License


Modalities


Languages