Search Results for author: Matt Post

Found 71 papers, 23 papers with code

On the Evaluation of Machine Translation n-best Lists

no code implementations • EMNLP (Eval4NLP) 2020 • Jacob Bremerman, Huda Khayrallah, Douglas Oard, Matt Post

The first and principal contribution is an evaluation measure that characterizes the translation quality of an entire n-best list by asking whether many of the valid translations are placed near the top of the list.

Machine Translation Translation +1

Paper
Add Code

ParBLEU: Augmenting Metrics with Automatic Paraphrases for the WMT’20 Metrics Shared Task

no code implementations • WMT (EMNLP) 2020 • Rachel Bawden, Biao Zhang, Andre Tättar, Matt Post

We describe parBLEU, parCHRF++, and parESIM, which augment baseline metrics with automatically generated paraphrases produced by PRISM (Thompson and Post, 2020a), a multilingual neural machine translation system.

Machine Translation Translation

Paper
Add Code

Automatic Construction of Morphologically Motivated Translation Models for Highly Inflected, Low-Resource Languages

1 code implementation • AMTA 2016 • John Hewitt, Matt Post, David Yarowsky

Statistical Machine Translation (SMT) of highly inflected, low-resource languages suffers from the problem of low bitext availability, which is exacerbated by large inflectional paradigms.

Machine Translation Translation

Paper
Code

Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies

1 code implementation • 12 Jan 2024 • Tom Kocmi, Vilém Zouhar, Christian Federmann, Matt Post

Ten years ago a single metric, BLEU, governed progress in machine translation research.

Machine Translation Translation

399

Paper
Code

Improving Word Sense Disambiguation in Neural Machine Translation with Salient Document Context

no code implementations • 27 Nov 2023 • Elijah Rippeth, Marine Carpuat, Kevin Duh, Matt Post

Lexical ambiguity is a challenging and pervasive problem in machine translation (\mt).

Machine Translation Sentence +2

Paper
Add Code

Identifying Context-Dependent Translations for Evaluation Set Production

1 code implementation • 4 Nov 2023 • Rachel Wicks, Matt Post

A major impediment to the transition to context-aware machine translation is the absence of good evaluation metrics and test sets.

Machine Translation Sentence

Paper
Code

SLIDE: Reference-free Evaluation for Machine Translation using a Sliding Document Window

no code implementations • 16 Sep 2023 • Vikas Raunak, Tom Kocmi, Matt Post

This suggests that source context may provide the same information as a human reference in disambiguating source ambiguities.

Machine Translation Sentence

Paper
Add Code

SOTASTREAM: A Streaming Approach to Machine Translation Training

1 code implementation • 14 Aug 2023 • Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt

Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer.

Machine Translation Management +2

Paper
Code

Do GPTs Produce Less Literal Translations?

1 code implementation • 26 May 2023 • Vikas Raunak, Arul Menezes, Matt Post, Hany Hassan Awadalla

On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs.

Machine Translation NMT +3

Paper
Code

Multilingual Pixel Representations for Translation and Effective Cross-lingual Transfer

no code implementations • 23 May 2023 • Elizabeth Salesky, Neha Verma, Philipp Koehn, Matt Post

We introduce and demonstrate how to effectively train multilingual machine translation models with pixel representations.

Cross-Lingual Transfer Machine Translation +1

Paper
Add Code

Escaping the sentence-level paradigm in machine translation

no code implementations • 25 Apr 2023 • Matt Post, Marcin Junczys-Dowmunt

It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation.

Machine Translation Sentence +1

Paper
Add Code

Operationalizing Specifications, In Addition to Test Sets for Evaluating Constrained Generative Models

no code implementations • 19 Nov 2022 • Vikas Raunak, Matt Post, Arul Menezes

More concretely, while the associated utility and methods of interacting with generative models have expanded, a similar expansion has not been observed in their evaluation practices.

Domain Generalization In-Context Learning

Paper
Add Code

Additive Interventions Yield Robust Multi-Domain Machine Translation Models

no code implementations • 23 Oct 2022 • Elijah Rippeth, Matt Post

Additive interventions are a recently-proposed mechanism for controlling target-side attributes in neural machine translation.

Machine Translation TAG +1

Paper
Add Code

SALTED: A Framework for SAlient Long-Tail Translation Error Detection

no code implementations • 20 May 2022 • Vikas Raunak, Matt Post, Arul Menezes

Traditional machine translation (MT) metrics provide an average measure of translation quality that is insensitive to the long tail of behavioral problems in MT.

Machine Translation NMT +2

Paper
Add Code

Large-Scale Streaming End-to-End Speech Translation with Neural Transducers

1 code implementation • 11 Apr 2022 • Jian Xue, Peidong Wang, Jinyu Li, Matt Post, Yashesh Gaur

Neural transducers have been widely used in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Code

The JHU-Microsoft Submission for WMT21 Quality Estimation Shared Task

no code implementations • WMT (EMNLP) 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Christian Federmann, Philipp Koehn

This paper presents the JHU-Microsoft joint submission for WMT 2021 quality estimation shared task.

Data Augmentation Task 2 +1

Paper
Add Code

Levenshtein Training for Word-level Quality Estimation

1 code implementation • EMNLP 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn

We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation.

Transfer Learning Translation

Paper
Code

A unified approach to sentence segmentation of punctuated text in many languages

1 code implementation • ACL 2021 • Rachel Wicks, Matt Post

The sentence is a fundamental unit of text processing.

Sentence Sentence segmentation

Paper
Code

Robust Open-Vocabulary Translation from Visual Text Representations

1 code implementation • EMNLP 2021 • Elizabeth Salesky, David Etter, Matt Post

Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an 'open vocabulary.'

Machine Translation Translation

Paper
Code

The Multilingual TEDx Corpus for Speech Recognition and Translation

no code implementations • 2 Feb 2021 • Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post

We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.

speech-recognition Speech Recognition +1

Paper
Add Code

Findings of the 2020 Conference on Machine Translation (WMT20)

no code implementations • EMNLP 2020 • Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri

In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories.

Machine Translation Translation

Paper
Add Code

Paraphrase Generation as Zero-Shot Multilingual Translation: Disentangling Semantic Similarity from Lexical and Syntactic Diversity

1 code implementation • WMT (EMNLP) 2020 • Brian Thompson, Matt Post

Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language (Thompson and Post, 2020); however, attempting to generate paraphrases from such a model using standard beam search produces trivial copies or near copies.

Machine Translation NMT +5

Paper
Code

The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education

no code implementations • WS 2020 • Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, Matt Post

This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE).

Machine Translation Translation

Paper
Add Code

The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration

no code implementations • LREC 2020 • Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky

The corpus consists of over 4000 unique translations of the Christian Bible and counting.

Paper
Add Code

Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

no code implementations • LREC 2020 • Kevin Duh, Paul McNamee, Matt Post, Brian Thompson

In this study, we benchmark state of the art statistical and neural machine translation systems on two African languages which do not have large amounts of resources: Somali and Swahili.

Benchmarking Machine Translation +2

Paper
Add Code

A Study in Improving BLEU Reference Coverage with Diverse Automatic Paraphrasing

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Rachel Bawden, Biao Zhang, Lisa Yankovskaya, Andre Tättar, Matt Post

We investigate a long-perceived shortcoming in the typical use of BLEU: its reliance on a single reference.

Machine Translation Sentence +1

Paper
Code

Simulated Multiple Reference Training Improves Low-Resource Machine Translation

1 code implementation • EMNLP 2020 • Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn

Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings.

Machine Translation Sentence +2

Paper
Code

Automatic Machine Translation Evaluation in Many Languages via Zero-Shot Paraphrasing

1 code implementation • EMNLP 2020 • Brian Thompson, Matt Post

We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference.

Machine Translation NMT +1

Paper
Code

Large-Scale, Diverse, Paraphrastic Bitexts via Sampling and Clustering

no code implementations • CONLL 2019 • J. Edward Hu, Abhinav Singh, Nils Holzenberger, Matt Post, Benjamin Van Durme

Producing diverse paraphrases of a sentence is a challenging task.

Clustering Sentence +1

Paper
Add Code

A Discriminative Neural Model for Cross-Lingual Word Alignment

no code implementations • IJCNLP 2019 • Elias Stengel-Eskin, Tzu-Ray Su, Matt Post, Benjamin Van Durme

We introduce a novel discriminative word alignment model, which we integrate into a Transformer-based machine translation model.

Machine Translation NER +2

Paper
Add Code

An Exploration of Placeholding in Neural Machine Translation

no code implementations • WS 2019 • Matt Post, Shuoyang Ding, Marianna Martindale, Winston Wu

Machine Translation Translation

Paper
Add Code

Findings of the 2019 Conference on Machine Translation (WMT19)

no code implementations • WS 2019 • Lo{\"\i}c Barrault, Ond{\v{r}}ej Bojar, Marta R. Costa-juss{\`a}, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias M{\"u}ller, Santanu Pal, Matt Post, Marcos Zampieri

This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.

Machine Translation Translation

Paper
Add Code

JHU 2019 Robustness Task System Description

no code implementations • WS 2019 • Matt Post, Kevin Duh

We describe the JHU submissions to the French{--}English, Japanese{--}English, and English{--}Japanese Robustness Task at WMT 2019.

Paper
Add Code

Improved Lexically Constrained Decoding for Translation and Monolingual Rewriting

1 code implementation • NAACL 2019 • J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, Benjamin Van Durme

Lexically-constrained sequence decoding allows for explicit positive or negative phrase-based constraints to be placed on target output strings in generation tasks such as machine translation or monolingual text rewriting.

Data Augmentation Machine Translation +3

1,206

Paper
Code

Membership Inference Attacks on Sequence-to-Sequence Models: Is My Data In Your Machine Translation System?

1 code implementation • TACL 2020 • Sorami Hisamoto, Matt Post, Kevin Duh

Data privacy is an important issue for "machine learning as a service" providers.

Machine Translation Translation +1

Paper
Code

ParaBank: Monolingual Bitext Generation and Sentential Paraphrasing via Lexically-constrained Neural Machine Translation

no code implementations • 11 Jan 2019 • J. Edward Hu, Rachel Rudinger, Matt Post, Benjamin Van Durme

We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality.

Machine Translation NMT +5

Paper
Add Code

Proceedings of the Third Conference on Machine Translation: Shared Task Papers

no code implementations • EMNLP 2018 • Ond{\v{r}}ej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aur{\'e}lie N{\'e}v{\'e}ol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor

Machine Translation Translation

Paper
Add Code

A Call for Clarity in Reporting BLEU Scores

2 code implementations • WS 2018 • Matt Post

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric.

Machine Translation Translation

972

Paper
Code

Fast Lexically Constrained Decoding with Dynamic Beam Allocation for Neural Machine Translation

no code implementations • NAACL 2018 • Matt Post, David Vilar

The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms.

Machine Translation NMT +1

Paper
Add Code

The Sockeye Neural Machine Translation Toolkit at AMTA 2018

1 code implementation • WS 2018 • Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post

Machine Translation Translation

1,206

Paper
Code

Sockeye: A Toolkit for Neural Machine Translation

16 code implementations • 15 Dec 2017 • Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post

Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks.

Machine Translation NMT +1

1,206

Paper
Code

Neural Lattice Search for Domain Adaptation in Machine Translation

no code implementations • IJCNLP 2017 • Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, Philipp Koehn

Domain adaptation is a major challenge for neural machine translation (NMT).

Domain Adaptation Machine Translation +2

Paper
Add Code

Findings of the 2017 Conference on Machine Translation (WMT17)

no code implementations • WS 2017 • Ond{\v{r}}ej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi

Automatic Post-Editing Multimodal Machine Translation +1

Paper
Add Code

The JHU Machine Translation Systems for WMT 2017

no code implementations • WS 2017 • Shuoyang Ding, Huda Khayrallah, Philipp Koehn, Matt Post, Gaurav Kumar, Kevin Duh

Language Modelling Machine Translation +1

Paper
Add Code

Grammatical Error Correction with Neural Reinforcement Learning

no code implementations • IJCNLP 2017 • Keisuke Sakaguchi, Matt Post, Benjamin Van Durme

We propose a neural encoder-decoder model with reinforcement learning (NRL) for grammatical error correction (GEC).

Grammatical Error Correction reinforcement-learning +2

Paper
Add Code

Error-repair Dependency Parsing for Ungrammatical Texts

1 code implementation • ACL 2017 • Keisuke Sakaguchi, Matt Post, Benjamin Van Durme

We propose a new dependency parsing scheme which jointly parses a sentence and repairs grammatical errors by extending the non-directional transition-based formalism of Goldberg and Elhadad (2010) with three additional actions: SUBSTITUTE, DELETE, INSERT.

Dependency Parsing Sentence

Paper
Code

Using of heterogeneous corpora for training of an ASR system

no code implementations • 1 Jun 2017 • Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee

The paper summarizes the development of the LVCSR system built as a part of the Pashto speech-translation system at the SCALE (Summer Camp for Applied Language Exploration) 2015 workshop on "Speech-to-text-translation for low-resource languages".

speech-recognition Speech Recognition +2

Paper
Add Code

A Rich Morphological Tagger for English: Exploring the Cross-Linguistic Tradeoff Between Morphology and Syntax

no code implementations • EACL 2017 • Christo Kirov, John Sylak-Glassman, Rebecca Knowles, Ryan Cotterell, Matt Post

A traditional claim in linguistics is that all human languages are equally expressive{---}able to convey the same wide range of meanings.

Dependency Parsing Machine Translation +3

Paper
Add Code

Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network

1 code implementation • 7 Aug 2016 • Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme

Inspired by the findings from the Cmabrigde Uinervtisy effect, we propose a word recognition model based on a semi-character level recurrent neural network (scRNN).

Spelling Correction

Paper
Code

Proceedings of the First Conference on Machine Translation: Volume 2, Shared Task Papers

no code implementations • WS 2016 • Ond{\v{r}}ej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aur{\'e}lie N{\'e}v{\'e}ol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, J{\"o}rg Tiedemann, Marco Turchi

Machine Translation Translation

Paper
Add Code

Findings of the 2016 Conference on Machine Translation

no code implementations • WS 2016 • Ond{\v{r}}ej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aur{\'e}lie N{\'e}v{\'e}ol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri

Automatic Post-Editing Multimodal Machine Translation +1

Paper
Add Code

The JHU Machine Translation Systems for WMT 2016

no code implementations • WS 2016 • Shuoyang Ding, Kevin Duh, Huda Khayrallah, Philipp Koehn, Matt Post

Language Modelling Machine Translation +1

Paper
Add Code

Sentential Paraphrasing as Black-Box Machine Translation

no code implementations • NAACL 2016 • Courtney Napoles, Chris Callison-Burch, Matt Post

Grammatical Error Correction Language Modelling +2

Paper
Add Code

GLEU Without Tuning

1 code implementation • 9 May 2016 • Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault

The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015).

Paper
Code

Reassessing the Goals of Grammatical Error Correction: Fluency Instead of Grammaticality

1 code implementation • TACL 2016 • Keisuke Sakaguchi, Courtney Napoles, Matt Post, Joel Tetreault

The field of grammatical error correction (GEC) has grown substantially in recent years, with research directed at both evaluation metrics and improved system performance against those metrics.

Grammatical Error Correction Sentence

Paper
Code

Findings of the 2015 Workshop on Statistical Machine Translation

no code implementations • WS 2015 • Ond{\v{r}}ej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, Marco Turchi

Automatic Post-Editing Translation

Paper
Add Code

Ground Truth for Grammatical Error Correction Metrics

no code implementations • IJCNLP 2015 • Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault

Grammatical Error Correction Machine Translation +1

Paper
Add Code

Findings of the 2014 Workshop on Statistical Machine Translation

no code implementations • WS 2014 • Ondrej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, Aleš Tamchyna

Cross-Lingual Information Retrieval Domain Adaptation +5

Paper
Add Code

Efficient Elicitation of Annotations for Human Evaluation of Machine Translation

1 code implementation • WS 2014 • Keisuke Sakaguchi, Matt Post, Benjamin Van Durme

Machine Translation Translation

Paper
Code

A Wikipedia-based Corpus for Contextualized Machine Translation

no code implementations • LREC 2014 • Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme, Matt Post

We describe a corpus for target-contextualized machine translation (MT), where the task is to improve the translation of source documents using language models built over presumably related documents in the target language.

Domain Adaptation Language Modelling +2