Pretrained models in the acoustic and textual modalities can potentially improve speech translation for both cascade and end-to-end approaches.
We present our multilingual machine translation system for the Large-Scale Multilingual Machine Translation task at WMT 2021.
In this paper, we describe our submission to the Simultaneous Speech Translation task at IWSLT 2022.
We investigate whether these ideas can be applied to speech translation by building ST models trained on speech transcription and text translation data.
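A common way to combine these two supervision signals is a weighted multi-task objective; the formulation below is a standard sketch assumed here for illustration, not necessarily the exact training loss used:

\[ \mathcal{L} = \lambda\,\mathcal{L}_{\mathrm{ASR}} + (1 - \lambda)\,\mathcal{L}_{\mathrm{MT}}, \qquad \lambda \in [0, 1], \]

where \(\mathcal{L}_{\mathrm{ASR}}\) is the loss on speech transcription data, \(\mathcal{L}_{\mathrm{MT}}\) the loss on text translation data, and \(\lambda\) balances the two tasks.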
Although active learning (AL) is shown to be helpful with large annotation budgets, it is not sufficient for building high-quality translation systems under these low-resource conditions.
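For concreteness, a pool-based active learning loop under a fixed annotation budget might look like the sketch below; the confidence, retrain, and oracle interfaces are hypothetical placeholders, not the authors' API.

    def active_learning_loop(model, labeled, unlabeled, budget, batch_size, oracle):
        # Pool-based active learning under a fixed annotation budget:
        # repeatedly label the examples the current model is least
        # confident about, then retrain on the grown labeled set.
        spent = 0
        while spent < budget and unlabeled:
            # Rank unlabeled sources by model confidence (lowest first);
            # `confidence` and `retrain` are assumed, hypothetical methods.
            batch = sorted(unlabeled, key=model.confidence)[:batch_size]
            labeled.extend((x, oracle(x)) for x in batch)  # human annotation
            unlabeled = [x for x in unlabeled if x not in batch]
            spent += len(batch)
            model = model.retrain(labeled)
        return model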
Speech-to-speech translation (S2ST) converts input speech to speech in another language.
We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model in which the generation of the translation is independent of intermediate text representations.
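At a high level (a hedged sketch under assumptions; every component name here is hypothetical, not the authors' implementation), a direct Simul-S2ST step can interleave a read/write policy with discrete acoustic unit generation, never materializing text:

    EOS = "<eos>"

    def simul_s2st(stream, encoder, policy, unit_decoder, vocoder):
        # Minimal direct Simul-S2ST loop: consume source speech chunks,
        # and whenever the policy says WRITE, emit the next discrete
        # acoustic unit; a vocoder maps units straight to a waveform,
        # so no intermediate text representation is produced.
        states, out_units = [], []
        for chunk in stream:                    # incremental source audio
            states.append(encoder(chunk, states))
            while policy(states, out_units) == "WRITE":
                unit = unit_decoder(states, out_units)
                if unit == EOS:
                    return vocoder(out_units)   # units -> target speech
                out_units.append(unit)
        return vocoder(out_units)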
The difficulty of generalizing to new translation directions suggests that the model's representations are highly specific to the language pairs seen in training.
The experiments show that, with far less data than is needed to train a model from scratch, we can adapt a Transformer-based ASR model to incorporate both transcription and compression capabilities.
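One way to realize such an adaptation is to fine-tune the pretrained model with a task-prefix token selecting between verbatim transcription and compressed output; this is a common multi-task technique, sketched here under assumptions rather than as the authors' exact method, with a hypothetical model interface.

    TASK_TOKENS = {"transcribe": "<asr>", "compress": "<sum>"}

    def adapt_step(model, batch, task, optimizer):
        # One fine-tuning step: prepend a task token to each target so a
        # single Transformer learns both verbatim transcription and
        # compressed output from the same speech encoder.
        token = TASK_TOKENS[task]
        targets = [f"{token} {y}" for y in batch.targets]
        loss = model.cross_entropy(batch.speech, targets)  # assumed seq2seq loss API
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()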
On How2 English-Portuguese speech translation, we reduce latency to 0.7 seconds (-84% relative).
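For readers tracking the numbers: a final latency of 0.7 seconds that constitutes an 84% relative reduction implies a baseline latency of roughly

\[ \frac{0.7\,\text{s}}{1 - 0.84} \approx 4.4\,\text{s}. \]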