no code implementations • WMT (EMNLP) 2021 • Raphael Rubino, Atsushi Fujita, Benjamin Marie
This paper presents the NICT Kyoto submission for the WMT’21 Quality Estimation (QE) Critical Error Detection shared task (Task 3).
no code implementations • WMT (EMNLP) 2020 • Benjamin Marie, Raphael Rubino, Atsushi Fujita
This paper presents neural machine translation systems and their combination built for the WMT20 English-Polish and Japanese→English translation tasks.
1 code implementation • 29 Jul 2024 • Tom Kocmi, Eleftherios Avramidis, Rachel Bawden, Ondrej Bojar, Anton Dvorkovich, Christian Federmann, Mark Fishel, Markus Freitag, Thamme Gowda, Roman Grundkiewicz, Barry Haddow, Marzena Karpinska, Philipp Koehn, Benjamin Marie, Kenton Murray, Masaaki Nagata, Martin Popel, Maja Popovic, Mariya Shmatova, Steinþór Steingrímsson, Vilém Zouhar
This is the preliminary ranking of WMT24 General MT systems based on automatic metrics.
no code implementations • 28 Sep 2022 • Benjamin Marie
This report presents an automatic evaluation of the general machine translation task of the Seventh Conference on Machine Translation (WMT22).
2 code implementations • ACL 2021 • Benjamin Marie, Atsushi Fujita, Raphael Rubino
MT evaluations in recent papers tend to copy and compare automatic metric scores from previous work to claim the superiority of a method or an algorithm, without confirming either that exactly the same training, validation, and test data have been used or that the metric scores are comparable.
no code implementations • 29 Jan 2021 • Benjamin Marie, Atsushi Fujita
Nonetheless, large monolingual data in the target domains or languages are not always available to generate large synthetic parallel data.
no code implementations • ACL 2020 • Benjamin Marie, Raphael Rubino, Atsushi Fujita
In this paper, we show that neural machine translation (NMT) systems trained on large back-translated data overfit some of the characteristics of machine-translated texts.
no code implementations • WS 2019 • Benjamin Marie, Hour Kaing, Aye Myat Mon, Chenchen Ding, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
This paper presents NICT's supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks.
no code implementations • WS 2019 • Benjamin Marie, Haipeng Sun, Rui Wang, Kehai Chen, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
This paper presents NICT's participation in the WMT19 unsupervised news translation task.
no code implementations • WS 2019 • Benjamin Marie, Raj Dabre, Atsushi Fujita
Our primary submission to the task is the result of a simple combination of our SMT and NMT systems.
no code implementations • WS 2019 • Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions.
no code implementations • ACL 2019 • Benjamin Marie, Atsushi Fujita
State-of-the-art methods for unsupervised bilingual word embeddings (BWE) train a mapping function that maps pre-trained monolingual word embeddings into a bilingual space.
no code implementations • NAACL 2019 • Benjamin Marie, Atsushi Fujita
We propose a new algorithm for extracting from monolingual data what we call partial translations: pairs of source and target sentences that contain sequences of tokens that are translations of each other.
no code implementations • 30 Oct 2018 • Benjamin Marie, Atsushi Fujita
In this work, we propose to define unsupervised NMT (UNMT) as NMT trained with the supervision of synthetic bilingual data.
no code implementations • WS 2018 • Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
Our systems are ranked first for the Estonian-English and Finnish-English language pairs (constrained) according to BLEU-cased.
no code implementations • WS 2018 • Rui Wang, Benjamin Marie, Masao Utiyama, Eiichiro Sumita
Using the clean data of the WMT18 shared news translation task, we designed several features and trained a classifier to score each sentence pair in the noisy data.
no code implementations • ACL 2017 • Benjamin Marie, Atsushi Fujita
We propose a new method for extracting pseudo-parallel sentences from a pair of large monolingual corpora, without relying on any document-level information.
no code implementations • TACL 2017 • Benjamin Marie, Atsushi Fujita
We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain.