1 code implementation • AMTA 2016 • John Hewitt, Matt Post, David Yarowsky
Statistical Machine Translation (SMT) of highly inflected, low-resource languages suffers from the problem of low bitext availability, which is exacerbated by large inflectional paradigms.
no code implementations • WMT (EMNLP) 2020 • Rachel Bawden, Biao Zhang, Andre Tättar, Matt Post
We describe parBLEU, parCHRF++, and parESIM, which augment baseline metrics with automatically generated paraphrases produced by PRISM (Thompson and Post, 2020a), a multilingual neural machine translation system.
no code implementations • EMNLP (Eval4NLP) 2020 • Jacob Bremerman, Huda Khayrallah, Douglas Oard, Matt Post
The first and principal contribution is an evaluation measure that characterizes the translation quality of an entire n-best list by asking whether many of the valid translations are placed near the top of the list.
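The idea of scoring an entire n-best list by how many valid translations sit near its top can be sketched in a few lines. The function name and the exact scoring rule below are illustrative assumptions, not the paper's actual measure:

```python
def valid_at_k(nbest, valid_set, k):
    """Fraction of the top-k n-best entries that are valid translations.

    A simplified illustration of rewarding lists that place many valid
    translations near the top; the paper's measure differs in detail.
    """
    top = nbest[:k]
    return sum(1 for hyp in top if hyp in valid_set) / max(len(top), 1)
```

A list that ranks valid translations first scores higher at the same k than one that buries them lower down.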
no code implementations • 16 Sep 2023 • Vikas Raunak, Tom Kocmi, Matt Post
This suggests that source context may provide the same information as a human reference.
1 code implementation • 14 Aug 2023 • Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt
Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer.
1 code implementation • 26 May 2023 • Vikas Raunak, Arul Menezes, Matt Post, Hany Hassan Awadalla
On the task of Machine Translation (MT), multiple works have investigated few-shot prompting mechanisms to elicit better translations from LLMs.
no code implementations • 23 May 2023 • Elizabeth Salesky, Neha Verma, Philipp Koehn, Matt Post
We introduce and demonstrate how to effectively train multilingual machine translation models with pixel representations.
no code implementations • 25 Apr 2023 • Matt Post, Marcin Junczys-Dowmunt
It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation.
no code implementations • 19 Nov 2022 • Vikas Raunak, Matt Post, Arul Menezes
More concretely, while the utility of generative models and the methods of interacting with them have expanded, a similar expansion has not been observed in their evaluation practices.
no code implementations • 23 Oct 2022 • Elijah Rippeth, Matt Post
Additive interventions are a recently proposed mechanism for controlling target-side attributes in neural machine translation.
no code implementations • 20 May 2022 • Vikas Raunak, Matt Post, Arul Menezes
Traditional machine translation (MT) metrics provide an average measure of translation quality that is insensitive to the long tail of behavioral problems in MT.
1 code implementation • 11 Apr 2022 • Jian Xue, Peidong Wang, Jinyu Li, Matt Post, Yashesh Gaur
Neural transducers have been widely used in automatic speech recognition (ASR).
no code implementations • WMT (EMNLP) 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Christian Federmann, Philipp Koehn
This paper presents the JHU-Microsoft joint submission for WMT 2021 quality estimation shared task.
1 code implementation • EMNLP 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn
We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation.
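The connection between edit alignments and word-level quality labels can be illustrated with a plain dynamic-programming Levenshtein alignment that tags each hypothesis word OK or BAD. This is only a sketch of the tagging scheme: the paper's model is a neural Levenshtein Transformer, not an edit-distance alignment against a reference, and `qe_tags` is a hypothetical helper:

```python
def qe_tags(hyp, ref):
    """Tag each hypothesis word OK/BAD via a word-level Levenshtein alignment.

    Illustrative only: kept words are OK, substituted or spurious words BAD.
    """
    h, r = len(hyp), len(ref)
    # dp[i][j] = edit distance between hyp[:i] and ref[:j]
    dp = [[0] * (r + 1) for _ in range(h + 1)]
    for i in range(h + 1):
        dp[i][0] = i
    for j in range(r + 1):
        dp[0][j] = j
    for i in range(1, h + 1):
        for j in range(1, r + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # delete hyp word
                           dp[i][j - 1] + 1,      # insert ref word
                           dp[i - 1][j - 1] + cost)  # match/substitute
    # Trace back, tagging kept words OK and edited words BAD.
    tags = [None] * h
    i, j = h, r
    while i > 0:
        if j > 0 and dp[i][j] == dp[i - 1][j - 1] and hyp[i - 1] == ref[j - 1]:
            tags[i - 1] = "OK"; i -= 1; j -= 1
        elif j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            tags[i - 1] = "BAD"; i -= 1; j -= 1
        elif dp[i][j] == dp[i - 1][j] + 1:
            tags[i - 1] = "BAD"; i -= 1
        else:
            j -= 1
    return tags
```

For example, aligning "the cat sat" against "the dog sat" tags the substituted middle word BAD and the kept words OK.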
1 code implementation • ACL 2021 • Rachel Wicks, Matt Post
The sentence is a fundamental unit of text processing.
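A naive rule-based splitter shows why segmentation is harder than it looks. The regex below is a strawman baseline of my own construction, not the paper's method; it breaks on abbreviations, which is one motivation for a learned segmenter:

```python
import re

def naive_split(text):
    """Split on sentence-final punctuation followed by whitespace and a capital.

    A deliberately simple baseline; it mis-splits abbreviations like "Dr.".
    """
    return re.split(r'(?<=[.!?])\s+(?=[A-Z"])', text)
```

It handles the easy case but wrongly splits "Dr. Smith arrived." after the abbreviation.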
1 code implementation • EMNLP 2021 • Elizabeth Salesky, David Etter, Matt Post
Machine translation models have discrete vocabularies and commonly use subword segmentation techniques to achieve an 'open vocabulary.'
no code implementations • 2 Feb 2021 • Elizabeth Salesky, Matthew Wiesner, Jacob Bremerman, Roldano Cattoni, Matteo Negri, Marco Turchi, Douglas W. Oard, Matt Post
We present the Multilingual TEDx corpus, built to support speech recognition (ASR) and speech translation (ST) research across many non-English source languages.
no code implementations • EMNLP 2020 • Loïc Barrault, Magdalena Biesialska, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Matthias Huck, Eric Joanis, Tom Kocmi, Philipp Koehn, Chi-kiu Lo, Nikola Ljubešić, Christof Monz, Makoto Morishita, Masaaki Nagata, Toshiaki Nakazawa, Santanu Pal, Matt Post, Marcos Zampieri
In the news task, participants were asked to build machine translation systems for any of 11 language pairs, to be evaluated on test sets consisting mainly of news stories.
1 code implementation • WMT (EMNLP) 2020 • Brian Thompson, Matt Post
Recent work has shown that a multilingual neural machine translation (NMT) model can be used to judge how well a sentence paraphrases another sentence in the same language (Thompson and Post, 2020); however, attempting to generate paraphrases from such a model using standard beam search produces trivial copies or near copies.
no code implementations • WS 2020 • Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, Matt Post
This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE).
no code implementations • LREC 2020 • Kevin Duh, Paul McNamee, Matt Post, Brian Thompson
In this study, we benchmark state-of-the-art statistical and neural machine translation systems on two African languages that do not have large amounts of resources: Somali and Swahili.
no code implementations • LREC 2020 • Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky
The corpus consists of over 4000 unique translations of the Christian Bible and counting.
1 code implementation • EMNLP 2020 • Brian Thompson, Matt Post
We frame the task of machine translation evaluation as one of scoring machine translation output with a sequence-to-sequence paraphraser, conditioned on a human reference.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Rachel Bawden, Biao Zhang, Lisa Yankovskaya, Andre Tättar, Matt Post
We investigate a long-perceived shortcoming in the typical use of BLEU: its reliance on a single reference.
1 code implementation • EMNLP 2020 • Huda Khayrallah, Brian Thompson, Matt Post, Philipp Koehn
Many valid translations exist for a given sentence, yet machine translation (MT) is trained with a single reference translation, exacerbating data sparsity in low-resource settings.
no code implementations • CONLL 2019 • J. Edward Hu, Abhinav Singh, Nils Holzenberger, Matt Post, Benjamin Van Durme
Producing diverse paraphrases of a sentence is a challenging task.
no code implementations • IJCNLP 2019 • Elias Stengel-Eskin, Tzu-Ray Su, Matt Post, Benjamin Van Durme
We introduce a novel discriminative word alignment model, which we integrate into a Transformer-based machine translation model.
no code implementations • WS 2019 • Matt Post, Kevin Duh
We describe the JHU submissions to the French–English, Japanese–English, and English–Japanese Robustness Task at WMT 2019.
no code implementations • WS 2019 • Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, Marcos Zampieri
This paper presents the results of the premier shared task organized alongside the Conference on Machine Translation (WMT) 2019.
1 code implementation • NAACL 2019 • J. Edward Hu, Huda Khayrallah, Ryan Culkin, Patrick Xia, Tongfei Chen, Matt Post, Benjamin Van Durme
Lexically-constrained sequence decoding allows for explicit positive or negative phrase-based constraints to be placed on target output strings in generation tasks such as machine translation or monolingual text rewriting.
1 code implementation • TACL 2020 • Sorami Hisamoto, Matt Post, Kevin Duh
Data privacy is an important issue for "machine learning as a service" providers.
no code implementations • 11 Jan 2019 • J. Edward Hu, Rachel Rudinger, Matt Post, Benjamin Van Durme
We present ParaBank, a large-scale English paraphrase dataset that surpasses prior work in both quantity and quality.
no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
2 code implementations • WS 2018 • Matt Post
The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric.
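The metric at issue can be made concrete with a didactic corpus-level BLEU in pure Python: modified n-gram precisions combined with a brevity penalty. Real toolkits differ in tokenization, smoothing, and reference handling, which is precisely the source of the reporting inconsistency the paper addresses, so this sketch is for exposition only:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Didactic corpus BLEU: clipped n-gram precision plus brevity penalty.

    No smoothing and whitespace tokenization only; real implementations vary.
    """
    match = [0] * max_n
    total = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0
    prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    bp = min(1.0, math.exp(1 - ref_len / hyp_len))  # penalize short output
    return 100 * bp * math.exp(prec)
```

Even in this tiny sketch, changing the tokenizer or adding smoothing would shift the score, which is why reporting the full scoring configuration matters.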
no code implementations • NAACL 2018 • Matt Post, David Vilar
The end-to-end nature of neural machine translation (NMT) removes many ways of manually guiding the translation process that were available in older paradigms.
16 code implementations • 15 Dec 2017 • Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post
Written in Python and built on MXNet, the toolkit offers scalable training and inference for the three most prominent encoder-decoder architectures: attentional recurrent neural networks, self-attentional transformers, and fully convolutional networks.
no code implementations • IJCNLP 2017 • Huda Khayrallah, Gaurav Kumar, Kevin Duh, Matt Post, Philipp Koehn
Domain adaptation is a major challenge for neural machine translation (NMT).
no code implementations • WS 2017 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi
no code implementations • IJCNLP 2017 • Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
We propose a neural encoder-decoder model with reinforcement learning (NRL) for grammatical error correction (GEC).
1 code implementation • ACL 2017 • Keisuke Sakaguchi, Matt Post, Benjamin Van Durme
We propose a new dependency parsing scheme which jointly parses a sentence and repairs grammatical errors by extending the non-directional transition-based formalism of Goldberg and Elhadad (2010) with three additional actions: SUBSTITUTE, DELETE, INSERT.
no code implementations • 1 Jun 2017 • Jan Trmal, Gaurav Kumar, Vimal Manohar, Sanjeev Khudanpur, Matt Post, Paul McNamee
The paper summarizes the development of the LVCSR system built as a part of the Pashto speech-translation system at the SCALE (Summer Camp for Applied Language Exploration) 2015 workshop on "Speech-to-text-translation for low-resource languages".
no code implementations • EACL 2017 • Christo Kirov, John Sylak-Glassman, Rebecca Knowles, Ryan Cotterell, Matt Post
A traditional claim in linguistics is that all human languages are equally expressive: able to convey the same wide range of meanings.
1 code implementation • 7 Aug 2016 • Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme
Inspired by the findings from the Cmabrigde Uinervtisy effect, we propose a word recognition model based on a semi-character level recurrent neural network (scRNN).
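The input encoding can be sketched directly: each word is represented by its first character, an unordered bag of its internal characters, and its last character, which is why a scrambled spelling and its correct form collide. The function below is an illustrative reconstruction of that encoding only; the recurrent network on top is omitted:

```python
from collections import Counter

def semi_character(word):
    """Semi-character representation: (first char, bag of internal chars, last char).

    Sketch of the scRNN input encoding; words shorter than three characters
    are kept whole since they have no internal span to scramble.
    """
    if len(word) < 3:
        return (word, Counter(), "")
    return (word[0], Counter(word[1:-1]), word[-1])
```

Under this encoding, "Cmabrigde" and "Cambridge" map to the same representation, so the downstream network sees the scrambled word as if it were spelled correctly.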
no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
1 code implementation • 9 May 2016 • Courtney Napoles, Keisuke Sakaguchi, Matt Post, Joel Tetreault
The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015).
1 code implementation • TACL 2016 • Keisuke Sakaguchi, Courtney Napoles, Matt Post, Joel Tetreault
The field of grammatical error correction (GEC) has grown substantially in recent years, with research directed at both evaluation metrics and improved system performance against those metrics.
no code implementations • WS 2015 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Barry Haddow, Matthias Huck, Chris Hokamp, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Carolina Scarton, Lucia Specia, Marco Turchi
no code implementations • WS 2014 • Ondřej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Herve Saint-Amand, Radu Soricut, Lucia Specia, Aleš Tamchyna
no code implementations • LREC 2014 • Jennifer Drexler, Pushpendre Rastogi, Jacqueline Aguilar, Benjamin Van Durme, Matt Post
We describe a corpus for target-contextualized machine translation (MT), where the task is to improve the translation of source documents using language models built over presumably related documents in the target language.
no code implementations • TACL 2014 • Ellie Pavlick, Matt Post, Ann Irvine, Dmitry Kachaev, Chris Callison-Burch
We present a large scale study of the languages spoken by bilingual workers on Mechanical Turk (MTurk).
no code implementations • TACL 2013 • Adam Lopez, Matt Post, Chris Callison-Burch, Jonathan Weese, Juri Ganitkevitch, Narges Ahmidi, Olivia Buzek, Leah Hanson, Beenish Jamil, Matthias Lee, Ya-Ting Lin, Henry Pao, Fatima Rivera, Leili Shahriyari, Debu Sinha, Adam Teichert, Stephen Wampler, Michael Weinberger, Daguang Xu, Lin Yang, Shang Zhao
Machine translation (MT) draws from several different disciplines, making it a complex subject to teach.