no code implementations • IWSLT 2016 • Marcin Junczys-Dowmunt, Alexandra Birch
This paper describes the submission of the University of Edinburgh team to the IWSLT MT task for TED talks.
no code implementations • 7 Oct 2024 • Vikas Raunak, Roman Grundkiewicz, Marcin Junczys-Dowmunt
In this work, we introduce instruction finetuning for Neural Machine Translation (NMT) models, which distills instruction following capabilities from Large Language Models (LLMs) into orders-of-magnitude smaller NMT models.
no code implementations • 15 Aug 2024 • Thamme Gowda, Roman Grundkiewicz, Elijah Rippeth, Matt Post, Marcin Junczys-Dowmunt
We describe a Python interface to Marian NMT, a C++-based training and inference toolkit for sequence-to-sequence models, focusing on machine translation.
no code implementations • 14 Nov 2023 • Hieu Hoang, Huda Khayrallah, Marcin Junczys-Dowmunt
We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input.
1 code implementation • 14 Aug 2023 • Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt
Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer.
1 code implementation • 25 Apr 2023 • Matt Post, Marcin Junczys-Dowmunt
It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation.
no code implementations • WMT (EMNLP) 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Christian Federmann, Philipp Koehn
This paper presents the JHU-Microsoft joint submission for the WMT 2021 quality estimation shared task.
1 code implementation • EMNLP 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn
We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation.
2 code implementations • WMT (EMNLP) 2021 • Tom Kocmi, Christian Federmann, Roman Grundkiewicz, Marcin Junczys-Dowmunt, Hitokazu Matsushita, Arul Menezes
Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system's quality over another.
no code implementations • EACL (HumEval) 2021 • Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi
Recent studies emphasize the need for document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments.
1 code implementation • NAACL 2021 • Vikas Raunak, Arul Menezes, Marcin Junczys-Dowmunt
In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies.
no code implementations • WS 2019 • Roman Grundkiewicz, Marcin Junczys-Dowmunt
There has been an increased interest in low-resource approaches to automatic grammatical error correction.
no code implementations • WS 2019 • Young Jin Kim, Marcin Junczys-Dowmunt, Hany Hassan, Alham Fikri Aji, Kenneth Heafield, Roman Grundkiewicz, Nikolay Bogoychev
Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models.
1 code implementation • WS 2019 • Roman Grundkiewicz, Marcin Junczys-Dowmunt, Kenneth Heafield
Considerable effort has been made to address the data sparsity problem in neural grammatical error correction.
Ranked #16 on Grammatical Error Correction on BEA-2019 (test)
no code implementations • WS 2019 • Marcin Junczys-Dowmunt
Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models.
no code implementations • EMNLP 2018 • Marcin Junczys-Dowmunt
For each sentence pair of the noisy parallel corpus we compute cross-entropy scores according to two inverse translation models trained on clean data.
no code implementations • WS 2018 • Marcin Junczys-Dowmunt
This paper describes the Microsoft submission to the WMT2018 news translation shared task.
no code implementations • WS 2018 • Marcin Junczys-Dowmunt, Roman Grundkiewicz
This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018.
no code implementations • EMNLP 2018 • Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji
In order to extract the best possible performance from asynchronous stochastic gradient descent one must increase the mini-batch size and scale the learning rate accordingly.
no code implementations • WS 2018 • Marcin Junczys-Dowmunt, Kenneth Heafield, Hieu Hoang, Roman Grundkiewicz, Anthony Aue
This paper describes the submissions of the "Marian" team to the WNMT 2018 shared task.
1 code implementation • NAACL 2018 • Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, Kenneth Heafield
Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines.
Ranked #1 on Grammatical Error Correction on _Restricted_
no code implementations • NAACL 2018 • Roman Grundkiewicz, Marcin Junczys-Dowmunt
We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT).
3 code implementations • ACL 2018 • Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch
We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs.
2 code implementations • 15 Mar 2018 • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dong-dong Zhang, Zhirui Zhang, Ming Zhou
Machine translation has made rapid advances in recent years.
Ranked #3 on Machine Translation on WMT 2017 English-Chinese
no code implementations • IJCNLP 2017 • Marcin Junczys-Dowmunt, Roman Grundkiewicz
In this work, we explore multiple neural architectures adapted for the task of automatic post-editing of machine translation output.
no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell
We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.
4 code implementations • EACL 2017 • Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, Maria Nădejde
We present Nematus, a toolkit for Neural Machine Translation.
no code implementations • WS 2017 • Maria Nadejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, Alexandra Birch
Our results on WMT data show that explicitly modeling target-syntax improves machine translation quality for German→English, a high-resource pair, and for Romanian→English, a low-resource pair, as well as the handling of several syntactic phenomena, including prepositional phrase attachment.
no code implementations • TACL 2017 • André F. T. Martins, Marcin Junczys-Dowmunt, Fabio N. Kepler, Ramón Astudillo, Chris Hokamp, Roman Grundkiewicz
Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways.
no code implementations • AMTA 2016 • Hieu Hoang, Nikolay Bogoychev, Lane Schwartz, Marcin Junczys-Dowmunt
The use of statistical machine translation (SMT) has grown enormously over the last decade, with many users relying on open-source software developed by the NLP community.
2 code implementations • IWSLT 2016 • Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang
In this paper we provide the largest published comparison of translation quality for phrase-based SMT and neural machine translation across 30 translation directions.
no code implementations • ACL 2016 • Aleš Tamchyna, Alexander Fraser, Ondřej Bojar, Marcin Junczys-Dowmunt
Discriminative translation models utilizing source context have been shown to help statistical machine translation performance.
no code implementations • EMNLP 2016 • Marcin Junczys-Dowmunt, Roman Grundkiewicz
In this work, we study parameter tuning towards the M^2 metric, the standard metric for automatic grammar error correction (GEC) tasks.
1 code implementation • WS 2016 • Marcin Junczys-Dowmunt, Tomasz Dwojak, Rico Sennrich
For the Russian-English task, our submission achieves the top BLEU result, outperforming the best pure neural system by 1.1 BLEU points and our own phrase-based baseline by 1.6 BLEU.
no code implementations • WS 2016 • Marcin Junczys-Dowmunt, Roman Grundkiewicz
This paper describes the submission of the AMU (Adam Mickiewicz University) team to the Automatic Post-Editing (APE) task of WMT 2016.
no code implementations • LREC 2016 • Michał Ziemski, Marcin Junczys-Dowmunt, Bruno Pouliquen
This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator.