The state-of-the-art adaptive policies for Simultaneous Neural Machine Translation (SNMT) use monotonic attention to make read/write decisions based on the partial source and target sequences.
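At inference time, a hard monotonic attention policy can be reduced to a simple scan: move left to right over the available source positions and write as soon as a position's selection probability crosses a threshold, otherwise keep reading. The sketch below is a toy illustration of that decision rule; the threshold value and the precomputed probabilities are assumptions, not any specific paper's model.

```python
def monotonic_decide(p_choose, threshold=0.5):
    """Hard monotonic attention at inference: scan source positions
    left to right; WRITE at the first position whose selection
    probability p_choose[j] crosses the threshold, otherwise READ.

    p_choose is assumed to come from a trained energy function
    (hypothetical here); we only model the decision rule.
    """
    for j, p in enumerate(p_choose):
        if p >= threshold:
            return ("write", j)      # attend to source position j and emit a token
    return ("read", len(p_choose))   # no position selected: request more source
```

For example, probabilities `[0.1, 0.7, 0.2]` trigger a write at position 1, while `[0.1, 0.2]` triggers another read.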
Using our proposed method, we outperform the current state-of-the-art method by $2.5$ Exact Match score on the Natural Questions dataset while using only $25\%$ of the parameters and $35\%$ of the inference latency, and by $4.4$ Exact Match on the WebQuestions dataset.
Hence, the read/write decision policy remains the same across different input modalities, i.e., speech and text.
Simultaneous neural machine translation (SNMT) models start emitting the target sequence before they have fully processed the source sequence.
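A common fixed schedule for such simultaneous decoding is the wait-k policy: read k source tokens first, then alternate one write with one read until the source is exhausted. The sketch below enumerates the resulting action sequence for toy lengths; it is a minimal illustration of the schedule, not the decoding loop of any particular system.

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Toy wait-k policy: emit ('read', src_pos) / ('write', tgt_pos)
    actions. Reads k tokens up front, then interleaves one write per
    read; once the source is fully read, only writes remain."""
    actions = []
    read, written = 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append(("read", read))
            read += 1
        else:
            actions.append(("write", written))
            written += 1
    return actions
```

With `src_len=5, tgt_len=5, k=2`, the schedule starts with two reads, then alternates write/read, finishing with the remaining writes.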
In general, direct Speech-to-Text Translation (ST) is trained jointly with Automatic Speech Recognition (ASR) and Machine Translation (MT) tasks.
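Such joint training typically optimizes a weighted sum of the per-task losses. The sketch below shows that combination in its simplest form; the weight values are illustrative assumptions, not taken from any of the papers above.

```python
def joint_loss(loss_st, loss_asr, loss_mt, w_asr=0.3, w_mt=0.3):
    """Multi-task objective: primary ST loss plus weighted auxiliary
    ASR and MT losses. Weights are hypothetical hyperparameters."""
    return loss_st + w_asr * loss_asr + w_mt * loss_mt
```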
Inspired by these learning patterns in humans, we suggest a simple yet generic task aware framework to incorporate into existing joint learning strategies.
The current re-translation approaches are based on autoregressive sequence generation models (ReTA), which generate target tokens of the (partial) translation sequentially.
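The defining trait of re-translation is that each time new source content arrives, the previous hypothesis is discarded and the full target is regenerated from scratch. The sketch below captures only that control flow; the `translate` callable stands in for a real autoregressive decoder and is a placeholder assumption.

```python
def retranslate(translate, src_stream):
    """Re-translation loop: on every incoming source token, re-decode
    the whole target from the current source prefix, replacing the
    previous hypothesis entirely."""
    prefix, hypotheses = [], []
    for tok in src_stream:
        prefix.append(tok)
        hypotheses.append(translate(prefix))  # full re-decode of the prefix
    return hypotheses
```

Because every step re-decodes the entire prefix, re-translation can revise earlier output freely, at the cost of instability in the displayed translation.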
In this paper, we describe the end-to-end simultaneous speech-to-text and text-to-text translation systems submitted to the IWSLT 2020 online translation challenge.
In this paper, we describe the system submitted to the IWSLT 2020 Offline Speech Translation Task.