no code implementations • WMT (EMNLP) 2021 • Markus Freitag, Ricardo Rei, Nitika Mathur, Chi-kiu Lo, Craig Stewart, George Foster, Alon Lavie, Ondřej Bojar
Contrary to previous years’ editions, this year we acquired our own human ratings based on expert-based human evaluation via Multidimensional Quality Metrics (MQM).
no code implementations • Findings (ACL) 2022 • Markus Freitag, David Vilar, David Grangier, Colin Cherry, George Foster
In this work we propose a method for training MT systems to achieve a more natural style, i. e. mirroring the style of text originally written in the target language.
no code implementations • WMT (EMNLP) 2020 • Rajen Chatterjee, Markus Freitag, Matteo Negri, Marco Turchi
Due to i) the different source/domain of data compared to the past (Wikipedia vs Information Technology), ii) the different quality of the initial translations to be corrected and iii) the introduction of a new language pair (English-Chinese), this year’s results are not directly comparable with last year’s round.
no code implementations • WMT (EMNLP) 2020 • Nitika Mathur, Johnny Wei, Markus Freitag, Qingsong Ma, Ondřej Bojar
Participants were asked to score the outputs of the translation systems competing in the WMT20 News Translation Task with automatic metrics.
no code implementations • WMT (EMNLP) 2021 • Farhad Akhbardeh, Arkady Arkhangorodsky, Magdalena Biesialska, Ondřej Bojar, Rajen Chatterjee, Vishrav Chaudhary, Marta R. Costa-Jussa, Cristina España-Bonet, Angela Fan, Christian Federmann, Markus Freitag, Yvette Graham, Roman Grundkiewicz, Barry Haddow, Leonie Harter, Kenneth Heafield, Christopher Homan, Matthias Huck, Kwabena Amponsah-Kaakyire, Jungo Kasai, Daniel Khashabi, Kevin Knight, Tom Kocmi, Philipp Koehn, Nicholas Lourie, Christof Monz, Makoto Morishita, Masaaki Nagata, Ajay Nagesh, Toshiaki Nakazawa, Matteo Negri, Santanu Pal, Allahsera Auguste Tapo, Marco Turchi, Valentin Vydrin, Marcos Zampieri
This paper presents the results of the newstranslation task, the multilingual low-resourcetranslation for Indo-European languages, thetriangular translation task, and the automaticpost-editing task organised as part of the Con-ference on Machine Translation (WMT) 2021. In the news task, participants were asked tobuild machine translation systems for any of10 language pairs, to be evaluated on test setsconsisting mainly of news stories.
no code implementations • 23 May 2023 • Daniel Deutsch, George Foster, Markus Freitag
Kendall's tau is frequently used to meta-evaluate how well machine translation (MT) evaluation metrics score individual translations.
1 code implementation • 23 May 2023 • Wenda Xu, Danqing Wang, Liangming Pan, Zhenqiao Song, Markus Freitag, William Yang Wang, Lei LI
In particular, since the advent of neural metrics, like COMET, BLEURT, and SEScore2, the newest generation of metrics show a high correlation with human judgment.
no code implementations • 17 May 2023 • Markus Freitag, Behrooz Ghorbani, Patrick Fernandes
Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR) decoding can be a powerful alternative to beam search decoding, especially when combined with neural-based utility functions.
no code implementations • 17 May 2023 • Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, Siamak Shakeri, Emanuel Taropa, Paige Bailey, Zhifeng Chen, Eric Chu, Jonathan H. Clark, Laurent El Shafey, Yanping Huang, Kathy Meier-Hellstern, Gaurav Mishra, Erica Moreira, Mark Omernick, Kevin Robinson, Sebastian Ruder, Yi Tay, Kefan Xiao, Yuanzhong Xu, Yujing Zhang, Gustavo Hernandez Abrego, Junwhan Ahn, Jacob Austin, Paul Barham, Jan Botha, James Bradbury, Siddhartha Brahma, Kevin Brooks, Michele Catasta, Yong Cheng, Colin Cherry, Christopher A. Choquette-Choo, Aakanksha Chowdhery, Clément Crepy, Shachi Dave, Mostafa Dehghani, Sunipa Dev, Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vlad Feinberg, Fangxiaoyu Feng, Vlad Fienber, Markus Freitag, Xavier Garcia, Sebastian Gehrmann, Lucas Gonzalez, Guy Gur-Ari, Steven Hand, Hadi Hashemi, Le Hou, Joshua Howland, Andrea Hu, Jeffrey Hui, Jeremy Hurwitz, Michael Isard, Abe Ittycheriah, Matthew Jagielski, Wenhao Jia, Kathleen Kenealy, Maxim Krikun, Sneha Kudugunta, Chang Lan, Katherine Lee, Benjamin Lee, Eric Li, Music Li, Wei Li, Yaguang Li, Jian Li, Hyeontaek Lim, Hanzhao Lin, Zhongtao Liu, Frederick Liu, Marcello Maggioni, Aroma Mahendru, Joshua Maynez, Vedant Misra, Maysam Moussalem, Zachary Nado, John Nham, Eric Ni, Andrew Nystrom, Alicia Parrish, Marie Pellat, Martin Polacek, Alex Polozov, Reiner Pope, Siyuan Qiao, Emily Reif, Bryan Richter, Parker Riley, Alex Castro Ros, Aurko Roy, Brennan Saeta, Rajkumar Samuel, Renee Shelby, Ambrose Slone, Daniel Smilkov, David R. So, Daniel Sohn, Simon Tokumine, Dasha Valter, Vijay Vasudevan, Kiran Vodrahalli, Xuezhi Wang, Pidong Wang, ZiRui Wang, Tao Wang, John Wieting, Yuhuai Wu, Kelvin Xu, Yunhan Xu, Linting Xue, Pengcheng Yin, Jiahui Yu, Qiao Zhang, Steven Zheng, Ce Zheng, Weikang Zhou, Denny Zhou, Slav Petrov, Yonghui Wu
Through extensive evaluations on English and multilingual language, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference compared to PaLM.
no code implementations • 19 Feb 2023 • Patrick Fernandes, Behrooz Ghorbani, Xavier Garcia, Markus Freitag, Orhan Firat
Through a novel joint scaling law formulation, we compute the effective number of parameters allocated to each language pair and examine the role of language similarity in the scaling behavior of our models.
no code implementations • 16 Nov 2022 • David Vilar, Markus Freitag, Colin Cherry, Jiaming Luo, Viresh Ratnakar, George Foster
Large language models (LLMs) that have been trained on multilingual but not parallel text exhibit a remarkable ability to translate between languages.
1 code implementation • 6 Oct 2022 • Freda Shi, Mirac Suzgun, Markus Freitag, Xuezhi Wang, Suraj Srivats, Soroush Vosoughi, Hyung Won Chung, Yi Tay, Sebastian Ruder, Denny Zhou, Dipanjan Das, Jason Wei
Finally, we show that the multilingual reasoning abilities of language models extend to other tasks such as commonsense reasoning and word-in-context semantic judgment.
1 code implementation • NAACL 2022 • Jingwei Ni, Zhijing Jin, Markus Freitag, Mrinmaya Sachan, Bernhard Schölkopf
We show that these two factors have a large causal effect on the MT performance, in addition to the test-model direction mismatch highlighted by existing work on the impact of translationese.
no code implementations • HumEval (ACL) 2022 • Belén Saldías, George Foster, Markus Freitag, Qijun Tan
Improvements in text generation technologies such as machine translation have necessitated more costly and time-consuming human evaluation procedures to ensure an accurate signal.
no code implementations • 17 Nov 2021 • Markus Freitag, David Grangier, Qijun Tan, Bowen Liang
In Neural Machine Translation, it is typically assumed that the sentence with the highest estimated probability should also be the translation with the highest quality as measured by humans.
no code implementations • ICLR 2022 • Behrooz Ghorbani, Orhan Firat, Markus Freitag, Ankur Bapna, Maxim Krikun, Xavier Garcia, Ciprian Chelba, Colin Cherry
We show that cross-entropy loss as a function of model size follows a certain scaling law.
no code implementations • 9 Jul 2021 • Scott Roy, Cliff Brunk, Kyu-Young Kim, Justin Zhao, Markus Freitag, Mihir Kale, Gagan Bansal, Sidharth Mudgal, Chris Varano
One of the challenges in a task oriented natural language application like the Google Assistant, Siri, or Alexa is to localize the output to many languages.
no code implementations • NAACL 2022 • Kelly Marchisio, Markus Freitag, David Grangier
Modern unsupervised machine translation (MT) systems reach reasonable translation quality under clean and controlled data conditions.
3 code implementations • 29 Apr 2021 • Markus Freitag, George Foster, David Grangier, Viresh Ratnakar, Qijun Tan, Wolfgang Macherey
Human evaluation of modern high-quality machine translation systems is a difficult problem, and there is increasing evidence that inadequate evaluation procedures can lead to erroneous conclusions.
no code implementations • NAACL 2021 • Sweta Agrawal, George Foster, Markus Freitag, Colin Cherry
Reference-free evaluation has the potential to make machine translation evaluation substantially more scalable, allowing us to pivot easily to new languages or domains.
no code implementations • WMT (EMNLP) 2020 • Markus Freitag, Orhan Firat
We reintroduce this direct parallel data from multi-way aligned corpora between all source and target languages.
1 code implementation • WMT (EMNLP) 2020 • Markus Freitag, George Foster, David Grangier, Colin Cherry
When used in place of original references, the paraphrased versions produce metric scores that correlate better with human judgment.
no code implementations • WMT (EMNLP) 2020 • Thibault Sellam, Amy Pu, Hyung Won Chung, Sebastian Gehrmann, Qijun Tan, Markus Freitag, Dipanjan Das, Ankur P. Parikh
The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Zorik Gekhman, Roee Aharoni, Genady Beryozkin, Markus Freitag, Wolfgang Macherey
Our approach achieves the highest correlation with human judgements on 9 out of the 18 language pairs from the WMT19 benchmark for evaluation without references, which is the largest number of wins for a single evaluation method on this task.
no code implementations • ACL 2020 • Parker Riley, Isaac Caswell, Markus Freitag, David Grangier
Machine translation has an undesirable propensity to produce {``}translationese{''} artifacts, which can lead to higher BLEU scores while being liked less by human raters.
2 code implementations • EMNLP 2020 • Markus Freitag, David Grangier, Isaac Caswell
The quality of automatic metrics for machine translation has been increasingly called into question, especially for high-quality systems.
no code implementations • 10 Nov 2019 • Parker Riley, Isaac Caswell, Markus Freitag, David Grangier
Machine translation has an undesirable propensity to produce "translationese" artifacts, which can lead to higher BLEU scores while being liked less by human raters.
no code implementations • WS 2019 • Markus Freitag, Isaac Caswell, Scott Roy
In this work, we train an Automatic Post-Editing (APE) model and use it to reveal biases in standard Machine Translation (MT) evaluation procedures.
1 code implementation • EMNLP 2018 • Markus Freitag, Scott Roy
Generating text from structured data is important for various tasks such as question answering and dialog systems.
no code implementations • 12 Jun 2017 • Baskaran Sankaran, Markus Freitag, Yaser Al-Onaizan
Usually, the candidate lists are a combination of external word-to-word aligner, phrase table entries or most frequent words.
no code implementations • WS 2015 • Markus Freitag, Jan-Thorsten Peter, Stephan Peitz, Minwei Feng, Hermann Ney
In this paper, we enhance the traditional confusion network system combination approach with an additional model trained by a neural network.
no code implementations • 6 Feb 2017 • Markus Freitag, Yaser Al-Onaizan, Baskaran Sankaran
Knowledge distillation describes a method for training a student network to perform better by learning from a stronger teacher network.
1 code implementation • WS 2017 • Markus Freitag, Yaser Al-Onaizan
In this paper, we concentrate on speeding up the decoder by applying a more flexible beam search strategy whose candidate size may vary at each time step depending on the candidate scores.
no code implementations • 20 Dec 2016 • Markus Freitag, Yaser Al-Onaizan
The basic concept in NMT is to train a large Neural Network that maximizes the translation performance on a given parallel corpus.