Search Results for author: George Foster

Found 37 papers, 6 papers with code

Bilingual Methods for Adaptive Training Data Selection for Machine Translation

no code implementations • AMTA 2016 • Boxing Chen, Roland Kuhn, George Foster, Colin Cherry, Fei Huang

In this paper, we propose a new data selection method which uses semi-supervised convolutional neural networks based on bitokens (Bi-SSCNNs) for training machine translation systems from a large bilingual corpus.

Machine Translation NMT +2

A Natural Diet: Towards Improving Naturalness of Machine Translation Output

no code implementations • Findings (ACL) 2022 • Markus Freitag, David Vilar, David Grangier, Colin Cherry, George Foster

In this work we propose a method for training MT systems to achieve a more natural style, i.e., mirroring the style of text originally written in the target language.

Machine Translation Sentence +1

Results of the WMT21 Metrics Shared Task: Evaluating Metrics with Expert-based Human Evaluations on TED and News Domain

no code implementations • WMT (EMNLP) 2021 • Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar

Contrary to previous years’ editions, this year we acquired our own human ratings based on expert-based human evaluation via Multidimensional Quality Metrics (MQM).

Translation

Finding Replicable Human Evaluations via Stable Ranking Probability

no code implementations • 1 Apr 2024 • Parker Riley, Daniel Deutsch, George Foster, Viresh Ratnakar, Ali Dabirmoghaddam, Markus Freitag

Reliable human evaluation is critical to the development of successful natural language generation models, but achieving it is notoriously difficult.

Machine Translation Text Generation

Importance-Aware Data Augmentation for Document-Level Neural Machine Translation

no code implementations • 27 Jan 2024 • Minghao Wu, YuFei Wang, George Foster, Lizhen Qu, Gholamreza Haffari

Document-level neural machine translation (DocNMT) aims to generate translations that are both coherent and cohesive, in contrast to its sentence-level counterpart.

Data Augmentation Machine Translation +2

Adapting Large Language Models for Document-Level Machine Translation

no code implementations • 12 Jan 2024 • Minghao Wu, Thuy-Trang Vu, Lizhen Qu, George Foster, Gholamreza Haffari

Large language models (LLMs) have made significant strides in various natural language processing (NLP) tasks.

Document Level Machine Translation Domain Generalization +2

To Diverge or Not to Diverge: A Morphosyntactic Perspective on Machine Translation vs Human Translation

no code implementations • 2 Jan 2024 • Jiaming Luo, Colin Cherry, George Foster

We conduct a large-scale fine-grained comparative analysis of machine translations (MT) against human translations (HT) through the lens of morphosyntactic divergence.

Attribute Machine Translation +1

Ties Matter: Meta-Evaluating Modern Metrics with Pairwise Accuracy and Tie Calibration

1 code implementation • 23 May 2023 • Daniel Deutsch, George Foster, Markus Freitag

Kendall's tau is frequently used to meta-evaluate how well machine translation (MT) evaluation metrics score individual translations.

Machine Translation

Searching for Needles in a Haystack: On the Role of Incidental Bilingualism in PaLM's Translation Capability

no code implementations • 17 May 2023 • Eleftheria Briakou, Colin Cherry, George Foster

We investigate the role of incidental bilingualism -- the unintentional consumption of bilingual signals, including translation examples -- in explaining the translation capabilities of large language models, taking the Pathways Language Model (PaLM) as a case study.

Language Modelling Machine Translation +1

Document Flattening: Beyond Concatenating Context for Document-Level Neural Machine Translation

no code implementations • 16 Feb 2023 • Minghao Wu, George Foster, Lizhen Qu, Gholamreza Haffari

Existing work in document-level neural machine translation commonly concatenates several consecutive sentences as a pseudo-document, and then learns inter-sentential dependencies.

Machine Translation Translation

The unreasonable effectiveness of few-shot learning for machine translation

no code implementations • 2 Feb 2023 • Xavier Garcia, Yamini Bansal, Colin Cherry, George Foster, Maxim Krikun, Fangxiaoyu Feng, Melvin Johnson, Orhan Firat

We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high and low-resource language pairs.

Few-Shot Learning Machine Translation +2

Prompting PaLM for Translation: Assessing Strategies and Performance

no code implementations • 16 Nov 2022 • David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster

Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages.

Language Modelling Machine Translation +1

Toward More Effective Human Evaluation for Machine Translation

no code implementations • HumEval (ACL) 2022 • Belén Saldías, George Foster, Markus Freitag, Qijun Tan

Improvements in text generation technologies such as machine translation have necessitated more costly and time-consuming human evaluation procedures to ensure an accurate signal.

Machine Translation Text Generation +1

Experts, Errors, and Context: A Large-Scale Study of Human Evaluation for Machine Translation

3 code implementations • 29 Apr 2021 • Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, Wolfgang Macherey

Human evaluation of modern high-quality machine translation systems is a difficult problem, and there is increasing evidence that inadequate evaluation procedures can lead to erroneous conclusions.

Machine Translation Translation

Assessing Reference-Free Peer Evaluation for Machine Translation

no code implementations • NAACL 2021 • Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry

Reference-free evaluation has the potential to make machine translation evaluation substantially more scalable, allowing us to pivot easily to new languages or domains.

Machine Translation Translation

Human-Paraphrased References Improve Neural Machine Translation

1 code implementation • WMT (EMNLP) 2020 • Markus Freitag, George Foster, David Grangier, Colin Cherry

When used in place of original references, the paraphrased versions produce metric scores that correlate better with human judgment.

Machine Translation NMT +1

Inference Strategies for Machine Translation with Conditional Masking

no code implementations • EMNLP 2020 • Julia Kreutzer, George Foster, Colin Cherry

Conditional masked language model (CMLM) training has proven successful for non-autoregressive and semi-autoregressive sequence generation tasks, such as machine translation.

Language Modelling Machine Translation +1

Re-translation versus Streaming for Simultaneous Translation

no code implementations • WS 2020 • Naveen Arivazhagan, Colin Cherry, Wolfgang Macherey, George Foster

There has been great progress in improving streaming machine translation, a simultaneous paradigm where the system appends to a growing hypothesis as more source content becomes available.

Attribute Data Augmentation +2

Re-Translation Strategies For Long Form, Simultaneous, Spoken Language Translation

1 code implementation • 6 Dec 2019 • Naveen Arivazhagan, Colin Cherry, Te I, Wolfgang Macherey, Pallavi Baljekar, George Foster

As this scenario allows for revisions to our incremental translations, we adopt a re-translation approach to simultaneous translation, where the source is repeatedly translated from scratch as it grows.

Machine Translation speech-recognition +2

Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges

no code implementations • 11 Jul 2019 • Naveen Arivazhagan, Ankur Bapna, Orhan Firat, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Xu Chen, Yuan Cao, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen, Yonghui Wu

We introduce our efforts towards building a universal neural machine translation (NMT) system capable of translating between any language pair.

Machine Translation NMT +2

Thinking Slow about Latency Evaluation for Simultaneous Machine Translation

no code implementations • 31 May 2019 • Colin Cherry, George Foster

Simultaneous machine translation attempts to translate a source sentence before it is finished being spoken, with applications to translation of spoken language for live streaming and conversation.

Machine Translation Sentence +1

Reinforcement Learning based Curriculum Optimization for Neural Machine Translation

no code implementations • NAACL 2019 • Gaurav Kumar, George Foster, Colin Cherry, Maxim Krikun

We consider the problem of making efficient use of heterogeneous training data in neural machine translation (NMT).

Machine Translation NMT +4

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations • 21 Feb 2019 • Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Shaping the Narrative Arc: An Information-Theoretic Approach to Collaborative Dialogue

no code implementations • 31 Jan 2019 • Kory W. Mathewson, Pablo Samuel Castro, Colin Cherry, George Foster, Marc G. Bellemare

We consider the problem of designing an artificial agent capable of interacting with humans in collaborative dialogue to produce creative, engaging narratives.

Specificity

Revisiting Character-Based Neural Machine Translation with Capacity and Compression

no code implementations • EMNLP 2018 • Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, Wolfgang Macherey

Translating characters instead of words or word-fragments has the potential to simplify the processing pipeline for neural machine translation (NMT), and improve results by eliminating hyper-parameters and manual feature engineering.

Feature Engineering Machine Translation +2

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

3 code implementations • ACL 2018 • Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Niki Parmar, Mike Schuster, Zhifeng Chen, Yonghui Wu, Macduff Hughes

Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures.

Ranked #26 on Machine Translation on WMT2014 English-French

Machine Translation Translation

NRC Machine Translation System for WMT 2017

no code implementations • WS 2017 • Chi-kiu Lo, Boxing Chen, Colin Cherry, George Foster, Samuel Larkin, Darlene Stewart, Roland Kuhn

Domain Adaptation Machine Translation +1

Cost Weighting for Neural Machine Translation Domain Adaptation

no code implementations • WS 2017 • Boxing Chen, Colin Cherry, George Foster, Samuel Larkin

We compare cost weighting to two traditional domain adaptation techniques developed for statistical machine translation: data selection and sub-corpus weighting.

Domain Adaptation Machine Translation +1