Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance.
We then improve an XLM-based unsupervised neural MT system pre-trained on Wikipedia by supplementing it with pseudo-parallel text mined from the same corpus, boosting unsupervised translation performance by up to 3.5 BLEU on the WMT'14 French-English and WMT'16 German-English tasks and outperforming the previous state-of-the-art.
Multilingual contextual embeddings have demonstrated state-of-the-art performance in zero-shot cross-lingual transfer learning, where multilingual BERT is fine-tuned on one source language and evaluated on a different target language.
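A minimal sketch of that zero-shot transfer setup, assuming a Hugging Face multilingual BERT checkpoint; the toy sentences, labels, and hyperparameters below are illustrative, not from the paper:

```python
# Sketch of zero-shot cross-lingual transfer: fine-tune multilingual BERT on a
# source language (English), then evaluate directly on a target language (German)
# without any target-language labels. Toy data for illustration only.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical source-language (English) training examples.
src_texts = ["the movie was wonderful", "a dull and tedious film"]
src_labels = torch.tensor([1, 0])

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few passes over the tiny toy set
    batch = tokenizer(src_texts, padding=True, return_tensors="pt")
    loss = model(**batch, labels=src_labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Zero-shot evaluation on a different (target) language, here German.
tgt_texts = ["der Film war wunderbar", "ein langweiliger und zäher Film"]
model.eval()
with torch.no_grad():
    batch = tokenizer(tgt_texts, padding=True, return_tensors="pt")
    preds = model(**batch).logits.argmax(dim=-1)
print(preds)  # predictions made without any target-language supervision
```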
We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances.
We propose a novel approach to semi-supervised automatic speech recognition (ASR).
Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one.
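A minimal sketch of that PLL computation, assuming a Hugging Face masked LM (bert-base-cased here stands in for the MLMs evaluated; this is not the authors' implementation): each token is masked in turn, and the log-probability the model assigns to the original token at that position is summed over the sentence.

```python
# Pseudo-log-likelihood (PLL) of a sentence under a masked LM: sum over positions
# of log P(token_i | sentence with token_i replaced by [MASK]).
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-cased").eval()

def pseudo_log_likelihood(sentence: str) -> float:
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    pll = 0.0
    with torch.no_grad():
        # Skip [CLS] (first) and [SEP] (last); mask each real token one by one.
        for i in range(1, input_ids.size(0) - 1):
            masked = input_ids.clone()
            masked[i] = tokenizer.mask_token_id
            logits = model(masked.unsqueeze(0)).logits[0, i]
            log_probs = torch.log_softmax(logits, dim=-1)
            pll += log_probs[input_ids[i]].item()
    return pll

print(pseudo_log_likelihood("The cat sat on the mat."))
```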
We evaluate three simple, normalization-centric changes to improve Transformer training.
Ranked #4 in machine translation on IWSLT2015 English-Vietnamese
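One normalization-centric change discussed in this line of work is ScaleNorm, which replaces LayerNorm with a single learned scale applied to l2-normalized activations, g * x / ||x||. The sketch below assumes that definition; the eps value and the sqrt(d_model) initialization are illustrative choices, not the paper's settings.

```python
# Minimal sketch of ScaleNorm: a single learned scale g over l2-normalized activations.
import torch
import torch.nn as nn

class ScaleNorm(nn.Module):
    def __init__(self, d_model: int, eps: float = 1e-5):
        super().__init__()
        self.g = nn.Parameter(torch.tensor(float(d_model) ** 0.5))  # one learned scalar
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize along the feature dimension, then rescale by g.
        norm = x.norm(dim=-1, keepdim=True).clamp(min=self.eps)
        return self.g * x / norm

# Usage: drop-in replacement wherever a Transformer block would apply LayerNorm.
x = torch.randn(2, 5, 512)
print(ScaleNorm(512)(x).shape)  # torch.Size([2, 5, 512])
```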
We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition.
The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition.