Search Results for author: Marcin Junczys-Dowmunt

Found 39 papers, 12 papers with code

On-the-Fly Fusion of Large Language Models and Machine Translation

no code implementations 14 Nov 2023 Hieu Hoang, Huda Khayrallah, Marcin Junczys-Dowmunt

We propose the on-the-fly ensembling of a machine translation model with an LLM, prompted on the same task and input.

In-Context Learning Machine Translation +2

SOTASTREAM: A Streaming Approach to Machine Translation Training

1 code implementation 14 Aug 2023 Matt Post, Thamme Gowda, Roman Grundkiewicz, Huda Khayrallah, Rohit Jain, Marcin Junczys-Dowmunt

Many machine translation toolkits make use of a data preparation step wherein raw data is transformed into a tensor format that can be used directly by the trainer.

Machine Translation Management +2

Escaping the sentence-level paradigm in machine translation

1 code implementation 25 Apr 2023 Matt Post, Marcin Junczys-Dowmunt

It is well-known that document context is vital for resolving a range of translation ambiguities, and in fact the document setting is the most natural setting for nearly all translation.

Machine Translation Sentence +1

Levenshtein Training for Word-level Quality Estimation

1 code implementation EMNLP 2021 Shuoyang Ding, Marcin Junczys-Dowmunt, Matt Post, Philipp Koehn

We propose a novel scheme to use the Levenshtein Transformer to perform the task of word-level quality estimation.

Transfer Learning Translation

On User Interfaces for Large-Scale Document-Level Human Evaluation of Machine Translation Outputs

no code implementations EACL (HumEval) 2021 Roman Grundkiewicz, Marcin Junczys-Dowmunt, Christian Federmann, Tom Kocmi

Recent studies emphasize the need for document context in human evaluation of machine translations, but little research has been done on the impact of user interfaces on annotator productivity and the reliability of assessments.

Machine Translation Translation

The Curious Case of Hallucinations in Neural Machine Translation

1 code implementation NAACL 2021 Vikas Raunak, Arul Menezes, Marcin Junczys-Dowmunt

In this work, we study hallucinations in Neural Machine Translation (NMT), which lie at an extreme end on the spectrum of NMT pathologies.

Hallucination Knowledge Distillation +3

From Research to Production and Back: Ludicrously Fast Neural Machine Translation

no code implementations WS 2019 Young Jin Kim, Marcin Junczys-Dowmunt, Hany Hassan, Alham Fikri Aji, Kenneth Heafield, Roman Grundkiewicz, Nikolay Bogoychev

Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models.

C++ code Decoder +2

Minimally-Augmented Grammatical Error Correction

no code implementations WS 2019 Roman Grundkiewicz, Marcin Junczys-Dowmunt

There has been an increased interest in low-resource approaches to automatic grammatical error correction.

Grammatical Error Correction

Microsoft Translator at WMT 2019: Towards Large-Scale Document-Level Neural Machine Translation

no code implementations WS 2019 Marcin Junczys-Dowmunt

Using document boundaries present in the authentic and synthetic parallel data, we create sequences of up to 1000 subword segments and train transformer translation models.

Data Augmentation Machine Translation +2

Dual Conditional Cross-Entropy Filtering of Noisy Parallel Corpora

no code implementations EMNLP 2018 Marcin Junczys-Dowmunt

For each sentence pair of the noisy parallel corpus we compute cross-entropy scores according to two inverse translation models trained on clean data.

Sentence Translation

Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation

no code implementations EMNLP 2018 Nikolay Bogoychev, Marcin Junczys-Dowmunt, Kenneth Heafield, Alham Fikri Aji

In order to extract the best possible performance from asynchronous stochastic gradient descent one must increase the mini-batch size and scale the learning rate accordingly.

Machine Translation Translation

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

1 code implementation NAACL 2018 Marcin Junczys-Dowmunt, Roman Grundkiewicz, Shubha Guha, Kenneth Heafield

Previously, neural methods in grammatical error correction (GEC) did not reach state-of-the-art results compared to phrase-based statistical machine translation (SMT) baselines.

Domain Adaptation Grammatical Error Correction +3

Near Human-Level Performance in Grammatical Error Correction with Hybrid Machine Translation

no code implementations NAACL 2018 Roman Grundkiewicz, Marcin Junczys-Dowmunt

We combine two of the most popular approaches to automated Grammatical Error Correction (GEC): GEC based on Statistical Machine Translation (SMT) and GEC based on Neural Machine Translation (NMT).

Grammatical Error Correction Machine Translation +2

Marian: Fast Neural Machine Translation in C++

2 code implementations ACL 2018 Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs.

Decoder Machine Translation +1

Predicting Target Language CCG Supertags Improves Neural Machine Translation

no code implementations WS 2017 Maria Nadejde, Siva Reddy, Rico Sennrich, Tomasz Dwojak, Marcin Junczys-Dowmunt, Philipp Koehn, Alexandra Birch

Our results on WMT data show that explicitly modeling target-syntax improves machine translation quality for German->English, a high-resource pair, and for Romanian->English, a low-resource pair, and also improves several syntactic phenomena, including prepositional phrase attachment.

Decoder Machine Translation +3

Fast, Scalable Phrase-Based SMT Decoding

no code implementations AMTA 2016 Hieu Hoang, Nikolay Bogoychev, Lane Schwartz, Marcin Junczys-Dowmunt

The utilization of statistical machine translation (SMT) has grown enormously over the last decade, with many users relying on open-source software developed by the NLP community.

Decoder Machine Translation +1

Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions

2 code implementations IWSLT 2016 Marcin Junczys-Dowmunt, Tomasz Dwojak, Hieu Hoang

In this paper we provide the largest published comparison of translation quality for phrase-based SMT and neural machine translation across 30 translation directions.

Decoder Machine Translation +2

The AMU-UEDIN Submission to the WMT16 News Translation Task: Attention-based NMT Models as Feature Functions in Phrase-based SMT

1 code implementation WS 2016 Marcin Junczys-Dowmunt, Tomasz Dwojak, Rico Sennrich

For the Russian-English task, our submission achieves the top BLEU result, outperforming the best pure neural system by 1.1 BLEU points and our own phrase-based baseline by 1.6 BLEU.

Machine Translation NMT +1

The United Nations Parallel Corpus v1.0

no code implementations LREC 2016 Michał Ziemski, Marcin Junczys-Dowmunt, Bruno Pouliquen

This paper describes the creation process and statistics of the official United Nations Parallel Corpus, the first parallel corpus composed from United Nations documents published by the original data creator.

Translation
