Document-level machine translation conditions on surrounding sentences to produce coherent translations.
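As a rough illustration of how such conditioning is often implemented, the sketch below prepends a window of previous source sentences to the current one before translation; the separator token, window size, and helper name are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of context-aware input construction for document-level MT.
# The <brk> separator and the two-sentence context window are illustrative
# choices, not necessarily the configuration used in the paper.

from typing import List

def build_doc_inputs(doc_sentences: List[str], context_size: int = 2,
                     sep_token: str = "<brk>") -> List[str]:
    """For each sentence, prepend up to `context_size` previous sentences,
    separated by `sep_token`, so the encoder can condition on context."""
    inputs = []
    for i, sent in enumerate(doc_sentences):
        context = doc_sentences[max(0, i - context_size):i]
        inputs.append(f" {sep_token} ".join(context + [sent]) if context else sent)
    return inputs

# Example: the third input carries the two preceding source sentences as context.
print(build_doc_inputs(["He sat down.", "The dog barked.", "It was loud."]))
```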
Pre-training models on vast quantities of unlabeled data has emerged as an effective approach to improving accuracy on many NLP tasks.
Ranked #1 on Machine Translation on WMT2016 Romanian-English (using extra training data)
4 code implementations • 21 Oct 2020 • Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
Existing work in translation has demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.
no code implementations • 16 May 2020 • Kritika Singh, Vimal Manohar, Alex Xiao, Sergey Edunov, Ross Girshick, Vitaliy Liptchinsky, Christian Fuegen, Yatharth Saraf, Geoffrey Zweig, Abdel-rahman Mohamed
Many semi- and weakly-supervised approaches have been investigated for overcoming the labeling cost of building high quality speech recognition systems.
Open-domain question answering relies on efficient passage retrieval to select candidate contexts, where traditional sparse vector space models, such as TF-IDF or BM25, are the de facto method.
Ranked #6 on Passage Retrieval on Natural Questions
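To make the contrast with the sparse TF-IDF/BM25 baselines above concrete, here is a minimal sketch of dense retrieval: passages and questions are embedded into a shared vector space and candidate contexts are ranked by inner product. The random vectors stand in for the outputs of trained dual encoders and are placeholders, not a trained model.

```python
# Illustrative sketch of dense passage retrieval: score passages by the inner
# product between a question embedding and precomputed passage embeddings.
# The random vectors below are placeholders for learned dual-encoder outputs.

import numpy as np

rng = np.random.default_rng(0)
passage_embeddings = rng.standard_normal((1000, 768))   # one row per passage
question_embedding = rng.standard_normal(768)           # from a question encoder

# Maximum inner product search: higher score = more relevant candidate context.
scores = passage_embeddings @ question_embedding
top_k = np.argsort(-scores)[:5]
print("top-5 passage indices:", top_k)
```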
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks.
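A rough sketch of the kind of input corruption used in denoising pre-training is shown below: contiguous token spans are replaced by a mask symbol and sentence order is permuted, and the model is trained to reconstruct the original text. The mask ratio, span choice, and <mask> symbol are illustrative, not the paper's exact noise function.

```python
# Rough sketch of a denoising objective's input corruption: replace a random
# token span in each sentence with a single mask token and shuffle sentence
# order. Ratios, span selection, and the <mask> symbol are illustrative.

import random

def corrupt(sentences, mask_ratio=0.35, seed=0):
    rng = random.Random(seed)
    noised = []
    for sent in sentences:
        tokens = sent.split()
        n_mask = max(1, int(len(tokens) * mask_ratio))
        start = rng.randrange(0, max(1, len(tokens) - n_mask + 1))
        tokens[start:start + n_mask] = ["<mask>"]   # infill a contiguous span
        noised.append(" ".join(tokens))
    rng.shuffle(noised)                              # permute sentence order
    return " ".join(noised)

src = ["the cat sat on the mat .", "it was warm ."]
print(corrupt(src))   # the model is trained to reconstruct the original document
```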
To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets.
no code implementations • 27 Oct 2019 • Kritika Singh, Dmytro Okhonko, Jun Liu, Yongqiang Wang, Frank Zhang, Ross Girshick, Sergey Edunov, Fuchun Peng, Yatharth Saraf, Geoffrey Zweig, Abdelrahman Mohamed
Supervised ASR models have reached unprecedented levels of accuracy, thanks in part to ever-increasing amounts of labelled training data.
Back-translation is a widely used data augmentation technique which leverages target monolingual data.
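The sketch below walks through back-translation at the data level: a reverse-direction model translates target-language monolingual sentences into synthetic sources, and the resulting pairs are added to the genuine parallel corpus. The `backward_translate` stub is a placeholder for a real target-to-source model, not an actual system.

```python
# Sketch of back-translation data augmentation. `backward_translate` stands in
# for a trained target->source NMT model; the stub here is a placeholder so
# the pipeline runs end to end.

def backward_translate(target_sentence: str) -> str:
    # Placeholder: a real system would decode with a reverse-direction model,
    # possibly with sampling or noised beam search rather than pure beam search.
    return "<synthetic source for: " + target_sentence + ">"

target_monolingual = ["Das ist ein Test.", "Guten Morgen."]
parallel = [("this is an example", "Das ist ein Beispiel.")]

# Pair each monolingual target sentence with its synthetic source translation,
# then augment the genuine parallel corpus with the synthetic pairs.
synthetic = [(backward_translate(t), t) for t in target_monolingual]
augmented_corpus = parallel + synthetic
print(len(augmented_corpus), "sentence pairs after augmentation")
```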
This paper describes Facebook FAIR's submission to the WMT19 shared news translation task.
Ranked #1 on Machine Translation on WMT2019 English-German
The lottery ticket hypothesis proposes that over-parameterization of deep neural networks (DNNs) aids training by increasing the probability of a "lucky" sub-network initialization being present rather than by helping the optimization process (Frankle & Carbin, 2019).
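For context, a toy sketch of the iterative magnitude pruning loop commonly used to find such "lucky" sub-networks is given below: train, prune the smallest-magnitude weights, rewind the survivors to their initial values, and repeat. The tiny linear model, random data, and pruning schedule are placeholders, not the paper's experimental setup.

```python
# Toy sketch of lottery-ticket-style iterative magnitude pruning: train, prune
# the smallest-magnitude weights, rewind survivors to their initial values,
# and retrain. The tiny model and random data are placeholders.

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(20, 2)
init_weight = model.weight.detach().clone()    # the original "lucky" initialization
mask = torch.ones_like(model.weight)

x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
loss_fn = nn.CrossEntropyLoss()

for round_idx in range(3):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    for _ in range(100):                        # short training run on toy data
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
        with torch.no_grad():
            model.weight.mul_(mask)             # keep already-pruned weights at zero
    # Prune 20% of the surviving weights with the smallest magnitude.
    surviving = model.weight[mask.bool()].abs()
    k = max(1, int(0.2 * surviving.numel()))
    threshold = surviving.kthvalue(k).values
    mask = mask * (model.weight.abs() > threshold).float()
    with torch.no_grad():
        model.weight.copy_(init_weight * mask)  # rewind survivors to their init values
    print(f"round {round_idx}: {int(mask.sum())} of {mask.numel()} weights remain")
```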
fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.
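As one way to use it, fairseq exposes pretrained translation models through torch.hub; the example below loads an English-to-German transformer and translates a sentence. The model identifier and keyword arguments follow the fairseq README at the time of writing and may differ across releases, so treat this as a sketch rather than a pinned recipe.

```python
# Loading a released fairseq translation model through torch.hub and running
# inference. Requires the sacremoses and fastBPE packages for tokenization and
# subword handling; names and arguments may change between fairseq releases.

import torch

# English -> German transformer, with Moses tokenization and fastBPE subwords
# applied automatically by the hub interface.
en2de = torch.hub.load('pytorch/fairseq', 'transformer.wmt19.en-de',
                       checkpoint_file='model1.pt',
                       tokenizer='moses', bpe='fastbpe')
en2de.eval()

print(en2de.translate('Machine translation is useful.', beam=5))
```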
Pre-trained language model representations have been successful in a wide range of language understanding tasks.
We present a new approach for pre-training a bi-directional transformer model that provides significant performance gains across a variety of language understanding problems.
Ranked #8 on Constituency Parsing on Penn Treebank
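A minimal sketch of the underlying cloze-style objective follows: mask a fraction of input tokens and train a bi-directional encoder to predict them from both left and right context. The toy vocabulary, masking rate, and model size are illustrative, not the paper's configuration.

```python
# Minimal sketch of a cloze / masked-token pretraining step: hide some input
# tokens and train a bi-directional encoder to recover them. Vocabulary size,
# masking rate, and model size are toy values.

import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d_model, mask_id = 1000, 64, 0
embed = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
to_vocab = nn.Linear(d_model, vocab)

tokens = torch.randint(1, vocab, (8, 16))              # toy batch of token ids
mask = torch.rand(tokens.shape) < 0.15                 # mask ~15% of positions
corrupted = tokens.masked_fill(mask, mask_id)

logits = to_vocab(encoder(embed(corrupted)))           # predictions at every position
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])  # only masked slots count
loss.backward()
print(float(loss))
```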
An effective method to improve neural machine translation with monolingual data is to augment the parallel training corpus with back-translations of target language sentences.
Ranked #2 on Machine Translation on WMT2014 English-German (using extra training data)
Sequence to sequence learning models still require several days to reach state-of-the-art performance on large benchmark datasets using a single machine.
Ranked #12 on Machine Translation on WMT2014 English-French
There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam.
Ranked #4 on Machine Translation on IWSLT2015 German-English
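One such sequence-level objective is expected risk: renormalize the model's scores over an n-best candidate list and minimize the expected task cost. The sketch below uses toy scores and costs to show the computation; it is not tied to the paper's exact loss or decoding setup.

```python
# Sketch of a sequence-level (expected risk) loss over a small candidate set:
# renormalize model scores over n-best hypotheses and minimize the expected
# task cost (e.g. 1 - sentence BLEU). Scores and costs below are toy values.

import torch

# Model log-scores for 4 candidate translations of one source sentence,
# and a task cost for each (lower is better).
log_scores = torch.tensor([-1.2, -0.4, -2.0, -0.9], requires_grad=True)
costs = torch.tensor([0.30, 0.55, 0.10, 0.40])

probs = torch.softmax(log_scores, dim=0)   # distribution over the candidate set
risk = (probs * costs).sum()               # expected cost under the model
risk.backward()                            # gradient pushes mass toward low-cost candidates
print(float(risk), log_scores.grad)
```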