There were a total of 10 participating teams for the tasks, with a total of 151 intermediate model submissions and 13 final models.
Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus.
One of the biggest challenges hindering progress in low-resource and multilingual machine translation is the lack of good evaluation benchmarks.
Fact checking at scale is difficult -- while the number of active fact checking websites is growing, it remains too small for the needs of the contemporary media ecosystem.
4 code implementations • 21 Oct 2020 • Angela Fan, Shruti Bhosale, Holger Schwenk, Zhiyi Ma, Ahmed El-Kishky, Siddharth Goyal, Mandeep Baines, Onur Celebi, Guillaume Wenzek, Vishrav Chaudhary, Naman Goyal, Tom Birch, Vitaliy Liptchinsky, Sergey Edunov, Edouard Grave, Michael Auli, Armand Joulin
Existing work in translation demonstrated the potential of massively multilingual machine translation by training a single model able to translate between any pair of languages.
To evaluate the quality of the mined bitexts, we train NMT systems for most of the language pairs and evaluate them on TED, WMT and WAT test sets.
We also present a detailed empirical analysis of the key factors that are required to achieve these gains, including the trade-offs between (1) positive transfer and capacity dilution and (2) the performance of high and low resource languages at scale.
Pre-training text representations have led to significant improvements in many areas of natural language processing.
This paper describes Facebook AI's submission to the WAT 2019 Myanmar-English translation task.
We introduce Trans-gram, a simple and computationally-efficient method to simultaneously learn and align wordembeddings for a variety of languages, using only monolingual data and a smaller set of sentence-aligned data.