Search Results for author: Marcin Junczys-Dowmunt

Found 39 papers, 11 papers with code

The University of Edinburgh’s systems submission to the MT task at IWSLT

no code implementations • IWSLT 2016 • Marcin Junczys-Dowmunt, Alexandra Birch

This paper describes the submission of the University of Edinburgh team to the IWSLT MT task for TED talks.

Domain Adaptation Translation

On-the-Fly Fusion of Large Language Models and Machine Translation

no code implementations • 14 Nov 2023 • Hieu Hoang, Huda Khayrallah, Marcin Junczys-Dowmunt

We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input.

In-Context Learning Machine Translation +2

SOTASTREAM: A Streaming Approach to Machine Translation Training

1 code implementation • 14 Aug 2023 • Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt

Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer.

Machine Translation Management +2

Escaping the sentence-level paradigm in machine translation

no code implementations • 25 Apr 2023 • Matt Post, Marcin Junczys-Dowmunt

It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation.

Machine Translation Sentence +1

The JHU-Microsoft Submission for WMT21 Quality Estimation Shared Task

no code implementations • WMT (EMNLP) 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Christian Federmann, Philipp Koehn

This paper presents the JHU-Microsoft joint submission for WMT 2021 quality estimation shared task.

Data Augmentation Task 2 +1

Levenshtein Training for Word-level Quality Estimation

1 code implementation • EMNLP 2021 • Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn

We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation.

Transfer Learning Translation

To Ship or Not to Ship: An Extensive Evaluation of Automatic Metrics for Machine Translation

2 code implementations • WMT (EMNLP) 2021 • Tom Kocmi, Christian Federmann, Roman Grundkiewicz, Marcin Junczys-Dowmunt, Hitokazu Matsushita, Arul Menezes

Automatic metrics are commonly used as the exclusive tool for declaring the superiority of one machine translation system's quality over another.

Machine Translation Sentence +1

On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

no code implementations • EACL (HumEval) 2021 • Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi

Recent studies emphasize the need of document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments.

Machine Translation Translation

The Curious Case of Hallucinations in Neural Machine Translation

1 code implementation • NAACL 2021 • Vikas Raunak, Arul Menezes, Marcin Junczys-Dowmunt

In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies.

Hallucination Knowledge Distillation +3

Minimally-Augmented Grammatical Error Correction

no code implementations • WS 2019 • Roman Grundkiewicz, Marcin Junczys-Dowmunt

There has been an increased interest in low-resource approaches to automatic grammatical error correction.

Grammatical Error Correction

From Research to Production and Back: Ludicrously Fast Neural Machine Translation

no code implementations • WS 2019 • Young Jin Kim, Marcin Junczys-Dowmunt, Hany Hassan, Alham Fikri Aji, Kenneth Heafield, Roman Grundkiewicz, Nikolay Bogoychev

Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models.

C++ code Machine Translation +1

Neural Grammatical Error Correction Systems with Unsupervised Pre-training on Synthetic Data

1 code implementation • WS 2019 • Roman Grundkiewicz, Marcin Junczys-Dowmunt, Kenneth Heafield

Considerable effort has been made to address the data sparsity problem in neural grammatical error correction.

Ranked #13 on Grammatical Error Correction on BEA-2019 (test)

Grammatical Error Correction Unsupervised Pre-training

Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation

no code implementations • WS 2019 • Marcin Junczys-Dowmunt

Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models.

Data Augmentation Machine Translation +2

MS-UEdin Submission to the WMT2018 APE Shared Task: Dual-Source Transformer for Automatic Post-Editing

no code implementations • WS 2018 • Marcin Junczys-Dowmunt, Roman Grundkiewicz

This paper describes the Microsoft and University of Edinburgh submission to the Automatic Post-editing shared task at WMT2018.

Automatic Post-Editing NMT

Microsoft's Submission to the WMT2018 News Translation Task: How I Learned to Stop Worrying and Love the Data

no code implementations • WS 2018 • Marcin Junczys-Dowmunt

This paper describes the Microsoft submission to the WMT2018 news translation shared task.

Sentence Translation

Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora

no code implementations • EMNLP 2018 • Marcin Junczys-Dowmunt

For each sentence pair of the noisy parallel corpus we compute cross-entropy scores according to two inverse translation models trained on clean data.

Sentence Translation

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation

no code implementations • EMNLP 2018 • Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji

In order to extract the best possible performance from asynchronous stochastic gradient descent one must increase the mini-batch size and scale the learning rate accordingly.

Machine Translation Translation

Marian: Cost-effective High-Quality Neural Machine Translation in C++

no code implementations • WS 2018 • Marcin Junczys-Dowmunt, Kenneth Heafield, Hieu Hoang, Roman Grundkiewicz, Anthony Aue

This paper describes the submissions of the "Marian" team to the WNMT 2018 shared task.

Machine Translation Translation +1

Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

no code implementations • NAACL 2018 • Roman Grundkiewicz, Marcin Junczys-Dowmunt

We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT).

Ranked #2 on Grammatical Error Correction on CoNLL-2014 Shared Task (10 annotations)

Grammatical Error Correction Machine Translation +2

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

1 code implementation • NAACL 2018 • Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, Kenneth Heafield

Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines.

Ranked #1 on Grammatical Error Correction on _Restricted_

Domain Adaptation Grammatical Error Correction +3

Marian: Fast Neural Machine Translation in C++

2 code implementations • ACL 2018 • Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs.

Machine Translation Translation

Achieving Human Parity on Automatic Chinese to English News Translation

2 code implementations • 15 Mar 2018 • Hany Hassan, Anthony Aue, Chang Chen, Vishal Chowdhary, Jonathan Clark, Christian Federmann, Xuedong Huang, Marcin Junczys-Dowmunt, William Lewis, Mu Li, Shujie Liu, Tie-Yan Liu, Renqian Luo, Arul Menezes, Tao Qin, Frank Seide, Xu Tan, Fei Tian, Lijun Wu, Shuangzhi Wu, Yingce Xia, Dong-dong Zhang, Zhirui Zhang, Ming Zhou

Machine translation has made rapid advances in recent years.

Ranked #3 on Machine Translation on WMT 2017 English-Chinese

Machine Translation Translation

Are we experiencing the Golden Age of Automatic Post-Editing?

no code implementations • WS 2018 • Marcin Junczys-Dowmunt

Automatic Post-Editing

The AMU-UEdin Submission to the WMT 2017 Shared Task on Automatic Post-Editing

no code implementations • WS 2017 • Marcin Junczys-Dowmunt

Automatic Post-Editing

An Exploration of Neural Sequence-to-Sequence Architectures for Automatic Post-Editing

no code implementations • IJCNLP 2017 • Marcin Junczys-Dowmunt, Roman Grundkiewicz

In this work, we explore multiple neural architectures adapted for the task of automatic post-editing of machine translation output.

Automatic Post-Editing Hard Attention +1

The SUMMA Platform Prototype

no code implementations • EACL 2017 • Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Hervé Bourlard, João Prieto, Ondřej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, Jeff Mitchell

We present the first prototype of the SUMMA Platform: an integrated platform for multilingual media monitoring.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Nematus: a Toolkit for Neural Machine Translation

4 code implementations • EACL 2017 • Rico Sennrich, Orhan Firat, Kyunghyun Cho, Alexandra Birch, Barry Haddow, Julian Hitschler, Marcin Junczys-Dowmunt, Samuel Läubli, Antonio Valerio Miceli Barone, Jozef Mokry, Maria Nădejde

We present Nematus, a toolkit for Neural Machine Translation.

Machine Translation Translation

Predicting Target Language CCG Supertags Improves Neural Machine Translation

no code implementations • WS 2017 • Maria Nadejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, Alexandra Birch

Our results on WMT data show that explicitly modeling target-syntax improves machine translation quality for German->English, a high-resource pair, and for Romanian->English, a low-resource pair and also several syntactic phenomena including prepositional phrase attachment.

Machine Translation NMT +2

Pushing the Limits of Translation Quality Estimation

no code implementations • TACL 2017 • André F. T. Martins, Marcin Junczys-Dowmunt, Fabio N. Kepler, Ramón Astudillo, Chris Hokamp, Roman Grundkiewicz

Translation quality estimation is a task of growing importance in NLP, due to its potential to reduce post-editing human effort in disruptive ways.

Automatic Post-Editing Sentence +1

Fast, Scalable Phrase-Based SMT Decoding

no code implementations • AMTA 2016 • Hieu Hoang, Nikolay Bogoychev, Lane Schwartz, Marcin Junczys-Dowmunt

The utilization of statistical machine translation (SMT) has grown enormously over the last decade, many using open-source software developed by the NLP community.

Machine Translation Translation

Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions

2 code implementations • IWSLT 2016 • Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang

In this paper we provide the largest published comparison of translation quality for phrase-based SMT and neural machine translation across 30 translation directions.

Machine Translation Sentence +1

Target-Side Context for Discriminative Models in Statistical Machine Translation

no code implementations • ACL 2016 • Aleš Tamchyna, Alexander Fraser, Ondřej Bojar, Marcin Junczys-Dowmunt

Discriminative translation models utilizing source context have been shown to help statistical machine translation performance.

Machine Translation Translation

Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction

no code implementations • EMNLP 2016 • Marcin Junczys-Dowmunt, Roman Grundkiewicz

In this work, we study parameter tuning towards the M^2 metric, the standard metric for automatic grammar error correction (GEC) tasks.

Grammatical Error Correction Machine Translation +1

Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing

no code implementations • WS 2016 • Marcin Junczys-Dowmunt, Roman Grundkiewicz

This paper describes the submission of the AMU (Adam Mickiewicz University) team to the Automatic Post-Editing (APE) task of WMT 2016.

Automatic Post-Editing Translation

The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT

1 code implementation • WS 2016 • Marcin Junczys-Dowmunt, Tomasz Dwojak, Rico Sennrich

For the Russian-English task, our submission achieves the top BLEU result, outperforming the best pure neural system by 1.1 BLEU points and our own phrase-based baseline by 1.6 BLEU.

Machine Translation NMT +1

The United Nations Parallel Corpus v1.0

no code implementations • LREC 2016 • Michał Ziemski, Marcin Junczys-Dowmunt, Bruno Pouliquen

This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator.

Translation