1 code implementation • ICML 2020 • Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu
State-of-the-art neural machine translation models generate a translation from left to right, with every step conditioned on the previously generated tokens.
Ranked #54 on Machine Translation on WMT2014 English-German
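The left-to-right generation described here is the standard autoregressive factorization, log p(y|x) = Σ_t log p(y_t | y_<t, x). A minimal sketch of scoring a translation under it (`next_token_probs` is a hypothetical stand-in for any such model, not the paper's code):

```python
import math

def sequence_log_prob(next_token_probs, source, target):
    """Score a translation under the left-to-right factorization
    log p(y|x) = sum_t log p(y_t | y_<t, x): every step conditions
    on the source and on all previously generated tokens."""
    log_p = 0.0
    for t in range(len(target)):
        step = next_token_probs(source, target[:t])  # p(. | y_<t, x)
        log_p += math.log(step[target[t]])
    return log_p

# Toy uniform "model" over a three-word vocabulary, for illustration only.
toy_model = lambda src, prefix: {"hello": 1/3, "world": 1/3, "</s>": 1/3}
print(sequence_log_prob(toy_model, ["hallo", "welt"], ["hello", "world", "</s>"]))
```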
1 code implementation • WMT (EMNLP) 2021 • Chau Tran, Shruti Bhosale, James Cross, Philipp Koehn, Sergey Edunov, Angela Fan
We describe Facebook’s multilingual model submission to the WMT2021 shared task on news translation.
no code implementations • 7 Feb 2023 • Simeng Sun, Maha Elbayad, Anna Sun, James Cross
With multilingual machine translation (MMT) models continuing to grow in size and number of supported languages, it is natural to reuse and upgrade existing models to save computation as data becomes available in more languages.
6 code implementations • Meta AI 2022 • NLLB team, Marta R. Costa-jussà, James Cross, Onur Çelebi, Maha Elbayad, Kenneth Heafield, Kevin Heffernan, Elahe Kalbassi, Janice Lam, Daniel Licht, Jean Maillard, Anna Sun, Skyler Wang, Guillaume Wenzek, Al Youngblood, Bapi Akula, Loic Barrault, Gabriel Mejia Gonzalez, Prangthip Hansanti, John Hoffman, Semarley Jarrett, Kaushik Ram Sadagopan, Dirk Rowe, Shannon Spruit, Chau Tran, Pierre Andrews, Necip Fazil Ayan, Shruti Bhosale, Sergey Edunov, Angela Fan, Cynthia Gao, Vedanuj Goswami, Francisco Guzmán, Philipp Koehn, Alexandre Mourachko, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Jeff Wang
Driven by the goal of eradicating language barriers on a global scale, machine translation has solidified itself as a key focus of artificial intelligence research today.
Ranked #1 on Machine Translation on IWSLT2017 French-English (SacreBLEU metric)
no code implementations • EACL 2021 • Xiang Kong, Adithya Renduchintala, James Cross, Yuqing Tang, Jiatao Gu, Xian Li
Recent work in multilingual translation has advanced translation quality beyond bilingual baselines by using deep Transformer models with increased capacity.
3 code implementations • 22 May 2022 • Christos Baziotis, Mikel Artetxe, James Cross, Shruti Bhosale
We find that hyper-adapters are more parameter-efficient than regular adapters, reaching the same performance with up to 12 times fewer parameters.
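A minimal numpy sketch of the hyper-adapter idea as described here: a shared hypernetwork generates per-language adapter weights from a language embedding, so a new language costs one embedding rather than a full adapter (all names and dimensions below are illustrative, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_bottleneck, d_lang = 512, 64, 32

# Hypernetwork parameters, shared across all languages: they map a
# language embedding to the flattened weights of a bottleneck adapter.
W_down_gen = rng.normal(0.0, 0.02, (d_lang, d_model * d_bottleneck))
W_up_gen = rng.normal(0.0, 0.02, (d_lang, d_bottleneck * d_model))

def make_adapter(lang_emb):
    """Generate the weights of a residual bottleneck adapter
    from a per-language embedding."""
    W_down = (lang_emb @ W_down_gen).reshape(d_model, d_bottleneck)
    W_up = (lang_emb @ W_up_gen).reshape(d_bottleneck, d_model)
    return lambda h: h + np.maximum(h @ W_down, 0.0) @ W_up  # ReLU bottleneck

lang_emb = rng.normal(size=d_lang)        # e.g. a learned embedding for "de"
hidden = rng.normal(size=(10, d_model))   # a batch of transformer states
print(make_adapter(lang_emb)(hidden).shape)  # (10, 512)
```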
no code implementations • NAACL 2022 • Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe
Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages.
no code implementations • AMTA 2022 • Shiyue Zhang, Vishrav Chaudhary, Naman Goyal, James Cross, Guillaume Wenzek, Mohit Bansal, Francisco Guzmán
Since a skewed data distribution is considered to be harmful, a sampling strategy is usually used to balance languages in the corpus.
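One standard balancing strategy of this kind is temperature-based sampling, where language i is drawn with probability proportional to (n_i / N)^(1/T); a sketch (the paper analyzes such strategies, and its exact scheme may differ):

```python
def sampling_weights(corpus_sizes, temperature=5.0):
    """Temperature-based language sampling: p_i is proportional to
    (n_i / N)^(1/T). T = 1 follows the raw data distribution; larger T
    flattens it toward uniform, upweighting low-resource languages."""
    total = sum(corpus_sizes.values())
    unnorm = {k: (n / total) ** (1.0 / temperature) for k, n in corpus_sizes.items()}
    z = sum(unnorm.values())
    return {k: v / z for k, v in unnorm.items()}

# A high-resource pair no longer dwarfs a low-resource one.
print(sampling_weights({"en-fr": 40_000_000, "en-ne": 500_000}))
```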
no code implementations • 25 Mar 2022 • Tasnim Mohiuddin, Philipp Koehn, Vishrav Chaudhary, James Cross, Shruti Bhosale, Shafiq Joty
In this work, we introduce a two-stage curriculum training framework for NMT where we fine-tune a base NMT model on subsets of data, selected by both deterministic scoring using pre-trained methods and online scoring that considers prediction scores of the emerging NMT model.
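A self-contained sketch of that two-stage loop, with a dummy model standing in for the NMT system (all interfaces here are hypothetical, not the paper's code):

```python
class DummyNMT:
    """Hypothetical stand-in for an NMT model."""
    def finetune(self, subset):
        print(f"fine-tuning on {len(subset)} examples")
    def prediction_score(self, example):
        return -len(example)  # toy proxy for the model's confidence

def curriculum_finetune(model, data, offline_score, keep=0.5, rounds=2):
    # Stage 1: deterministic scoring with a fixed, pre-trained method.
    k = int(len(data) * keep)
    model.finetune(sorted(data, key=offline_score, reverse=True)[:k])
    # Stage 2: online scoring by the emerging NMT model itself, so the
    # selected subset adapts as the model improves.
    for _ in range(rounds):
        model.finetune(sorted(data, key=model.prediction_score, reverse=True)[:k])

curriculum_finetune(DummyNMT(), ["clean pair", "longer noisy pair !!", "ok pair"],
                    offline_score=len)
```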
no code implementations • NAACL 2022 • Dheeru Dua, Shruti Bhosale, Vedanuj Goswami, James Cross, Mike Lewis, Angela Fan
Multi-task learning with an unbalanced data distribution skews model learning towards high resource tasks, especially when model capacity is fixed and fully shared across all tasks.
no code implementations • ACL 2022 • Simeng Sun, Angela Fan, James Cross, Vishrav Chaudhary, Chau Tran, Philipp Koehn, Francisco Guzmán
Further, we find that incorporating alternative inputs via self-ensemble can be particularly effective when the training set is small, leading to +5 BLEU when only 5% of the total training data is accessible.
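A sketch of the self-ensemble step over alternative inputs (e.g. different subword segmentations of the same source); `model_probs` is a hypothetical stand-in mapping an input to a vocabulary-sized probability vector:

```python
import numpy as np

def self_ensemble(model_probs, variants):
    """Average one model's next-token distributions over
    alternative renderings of the same source sentence."""
    return np.mean([model_probs(v) for v in variants], axis=0)

# Toy usage with a fake two-token vocabulary.
fake = {"foo bar": np.array([0.9, 0.1]), "fo o bar": np.array([0.6, 0.4])}
print(self_ensemble(fake.get, ["foo bar", "fo o bar"]))  # [0.75 0.25]
```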
no code implementations • EMNLP 2021 • Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia
Sentence-level quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels.
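Pearson correlation, the evaluation mentioned here, is standard arithmetic and easy to state exactly:

```python
import math

def pearson(xs, ys):
    """Pearson correlation between predicted QE scores and human labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson([0.2, 0.5, 0.9], [0.1, 0.6, 0.8]))  # predicted vs. human scores
```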
1 code implementation • 22 Jun 2021 • Md Mahfuz ibn Alam, Antonios Anastasopoulos, Laurent Besacier, James Cross, Matthias Gallé, Philipp Koehn, Vassilina Nikoulina
As neural machine translation (NMT) systems become an important part of professional translator pipelines, a growing body of work focuses on combining NMT with terminologies.
no code implementations • EMNLP 2021 • Ahmed El-Kishky, Adithya Renduchintala, James Cross, Francisco Guzmán, Philipp Koehn
Cross-lingual named-entity lexica are an important resource for multilingual NLP tasks such as machine translation and cross-lingual wikification.
1 code implementation • ACL 2021 • Danni Liu, Jan Niehues, James Cross, Francisco Guzmán, Xian Li
The difficulty of generalizing to new translation directions suggests the model representations are highly specific to those language pairs seen in training.
no code implementations • 23 Aug 2020 • Qing Sun, James Cross
In this paper, we provide an in-depth analysis of KL-divergence minimization in the Forward and Backward orders, which shows that in the Backward order the learner is reinforced via on-policy learning.
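The distinction driving that analysis is which distribution the expectation is taken under; a minimal numpy illustration (the distributions are made up):

```python
import numpy as np

def kl(p, q):
    """KL(p || q) for discrete distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

teacher = np.array([0.7, 0.2, 0.1])
student = np.array([0.5, 0.3, 0.2])

# Forward KL(teacher || student): expectation under the fixed teacher,
# i.e. the student learns from samples the teacher produces.
print(kl(teacher, student))
# Backward KL(student || teacher): expectation under the student's own
# distribution, which is what makes it an on-policy objective.
print(kl(student, teacher))
```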
2 code implementations • ICLR 2021 • Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith
We show that the speed disadvantage for autoregressive baselines compared to non-autoregressive methods has been overestimated in three aspects: suboptimal layer allocation, insufficient speed measurement, and lack of knowledge distillation.
3 code implementations • ICLR 2020 • Xutai Ma, Juan Pino, James Cross, Liezl Puzon, Jiatao Gu
Simultaneous machine translation models start generating a target sequence before they have finished reading the source sequence.
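The paper's policy is learned monotonic multihead attention; as a simpler illustration of simultaneous decoding, here is the fixed wait-k schedule (read k source tokens, then alternate between writing one target token and reading one more). `translate_step` is a hypothetical model call, not the paper's API:

```python
def wait_k_decode(translate_step, source_tokens, k=3, max_len=50):
    """Wait-k simultaneous decoding: generation starts after only
    k source tokens have been read."""
    target, read = [], min(k, len(source_tokens))
    while len(target) < max_len:
        token = translate_step(source_tokens[:read], target)
        if token == "</s>":
            break
        target.append(token)                      # WRITE one target token
        read = min(read + 1, len(source_tokens))  # READ one more source token
    return target

# Toy 'model' that copies the source word-by-word, for illustration only.
toy = lambda src, tgt: src[len(tgt)] if len(tgt) < len(src) else "</s>"
print(wait_k_decode(toy, "guten morgen welt".split(), k=1))
```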
no code implementations • 25 Sep 2019 • Qing Sun, James Cross, Dmitriy Genzel
Sequence-to-sequence models such as transformers, which are now being used in a wide variety of NLP tasks, typically need to have very high capacity in order to perform well.
1 code implementation • WS 2018 • Felix Stahlberg, James Cross, Veselin Stoyanov
Neural Machine Translation (NMT) typically leverages monolingual data in training through backtranslation.
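A sketch of the backtranslation loop that line refers to: a reverse (target-to-source) model turns monolingual target text into synthetic source sentences, and the synthetic pairs are mixed with real bitext to train the forward model (`reverse_model` is a hypothetical stand-in):

```python
def backtranslate(reverse_model, real_bitext, target_monolingual):
    """Build synthetic (source, target) pairs from monolingual
    target-side text and mix them with the real parallel data."""
    synthetic = [(reverse_model(y), y) for y in target_monolingual]
    return real_bitext + synthetic  # train the forward model on the union

# Toy 'reverse model' that just flips word order, for illustration only.
flip = lambda y: " ".join(reversed(y.split()))
print(backtranslate(flip, [("hallo welt", "hello world")], ["good morning"]))
```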
1 code implementation • EMNLP 2016 • James Cross, Liang Huang
Parsing accuracy using efficient greedy transition systems has improved dramatically in recent years thanks to neural networks.
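For a concrete picture of a greedy transition system, here is the classic arc-standard dependency variant (the paper itself uses a span-based constituency system with bi-LSTM features, so this illustrates the family, not its specific method; `oracle` stands in for the neural classifier):

```python
def arc_standard_parse(words, oracle):
    """Greedy arc-standard parsing: at each step a classifier picks
    SHIFT, LEFT-ARC, or RIGHT-ARC; arcs are (head, dependent) pairs."""
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":      # top of stack heads the item below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        else:                           # RIGHT-ARC: item below heads the top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# Toy oracle: shift everything, then attach right-to-left.
toy = lambda stack, buffer: "SHIFT" if buffer else "RIGHT-ARC"
print(arc_standard_parse("I saw her".split(), toy))  # [(1, 2), (0, 1)]
```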
no code implementations • ACL 2016 • James Cross, Liang Huang
Recently, neural network approaches for parsing have largely automated the combination of individual features, but they still rely on (often a large number of) atomic features created from human linguistic intuition, potentially omitting important global context.
no code implementations • 19 Nov 2015 • James Cross, Bing Xiang, Bo-Wen Zhou
We propose two methods of learning vector representations of words and phrases that each combine sentence context with structural features extracted from dependency trees.