GigaST is a large-scale pseudo speech translation (ST) corpus. The corpus was created by translating the text in GigaSpeech, an English ASR corpus, into German and Chinese. The training set is translated by a strong machine translation system and the test set was translated by human. ST models trained with an addition of the corpus obtain new state-of-the-art results on the MuST-C English-German benchmark test set.
Paper | Code | Results | Date | Stars |
---|