no code implementations • WMT (EMNLP) 2021 • Raphael Rubino, Atsushi Fujita, Benjamin Marie
This paper presents the NICT Kyoto submission for the WMT’21 Quality Estimation (QE) Critical Error Detection shared task (Task 3).
no code implementations • WMT (EMNLP) 2020 • Raj Dabre, Atsushi Fujita
This paper investigates a combination of sequence distillation (SD) and transfer learning (TL) for training efficient NMT models in extremely low-resource (ELR) settings, where we utilize TL with helping corpora twice: once for distilling the ELR corpora and then during compact model training.
Tasks: Low-Resource Neural Machine Translation (+4 more)
no code implementations • WMT (EMNLP) 2020 • Benjamin Marie, Raphael Rubino, Atsushi Fujita
This paper presents neural machine translation systems and their combination built for the WMT20 English-Polish and Japanese→English translation tasks.
no code implementations • MTSummit 2021 • Raj Dabre, Atsushi Fujita
In low-resource scenarios, NMT models tend to perform poorly because training quickly converges to a point where the softmax distribution computed from the logits approaches the gold label distribution.
1 code implementation • MTSummit 2021 • Atsushi Fujita
Existing approaches for machine translation (MT) mostly translate given text in the source language into the target language without explicitly referring to information indispensable for producing a proper translation.
no code implementations • 9 Nov 2023 • Yuto Kuroda, Atsushi Fujita, Tomoyuki Kajiwara, Takashi Ninomiya
In this paper, we extensively investigate the usefulness of synthetic TQE data and pre-trained multilingual encoders in unsupervised sentence-level TQE, both of which have been proven effective in supervised training scenarios.
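As a rough sketch of the supervised recipe such experiments build on: synthetic TQE triples can be generated by machine-translating the source side of a parallel corpus and using an automatic metric score against the reference as a pseudo quality label, then fitting a regressor on multilingual sentence embeddings. The `mt_translate`, `metric_score`, and `encode` callables below are placeholders for illustration, not the paper's actual tooling.

```python
import numpy as np
from sklearn.linear_model import Ridge

def make_synthetic_tqe_data(parallel_pairs, mt_translate, metric_score):
    """Build synthetic TQE triples from a parallel corpus: translate
    the source side with an MT system and use an automatic metric
    score against the reference as a pseudo quality label."""
    data = []
    for src, ref in parallel_pairs:
        hyp = mt_translate(src)
        data.append((src, hyp, metric_score(hyp, ref)))
    return data

def train_tqe_regressor(data, encode):
    """Fit a regressor from multilingual sentence embeddings of the
    (source, hypothesis) pair to the pseudo quality label."""
    X = np.array([np.concatenate([encode(s), encode(h)]) for s, h, _ in data])
    y = np.array([score for _, _, score in data])
    return Ridge().fit(X, y)
```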
1 code implementation • 7 Nov 2023 • Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, Sadao Kurohashi
To create the parallel corpora, we propose a dynamic-programming-based sentence alignment algorithm which leverages the cosine similarity of machine-translated sentences.
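A minimal sketch of such an aligner, assuming monotone one-to-one alignment with a fixed skip penalty; the paper's actual algorithm and scoring may differ. Sentence embeddings of the machine-translated source sentences and of the target sentences are assumed to be given.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def align(src_vecs, tgt_vecs, skip_penalty=-0.5):
    """Monotone 1-to-1 sentence alignment by dynamic programming.

    src_vecs: embeddings of machine-translated source sentences
    tgt_vecs: embeddings of target sentences
    Returns a list of (src_index, tgt_index) pairs.
    """
    n, m = len(src_vecs), len(tgt_vecs)
    score = np.full((n + 1, m + 1), -np.inf)
    back = {}
    score[0, 0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if score[i, j] == -np.inf:
                continue
            if i < n and j < m:  # match source i with target j
                s = score[i, j] + cosine(src_vecs[i], tgt_vecs[j])
                if s > score[i + 1, j + 1]:
                    score[i + 1, j + 1] = s
                    back[(i + 1, j + 1)] = (i, j, True)
            if i < n:  # skip a source sentence
                s = score[i, j] + skip_penalty
                if s > score[i + 1, j]:
                    score[i + 1, j] = s
                    back[(i + 1, j)] = (i, j, False)
            if j < m:  # skip a target sentence
                s = score[i, j] + skip_penalty
                if s > score[i, j + 1]:
                    score[i, j + 1] = s
                    back[(i, j + 1)] = (i, j, False)
    pairs, cell = [], (n, m)
    while cell in back:  # backtrack from the final cell
        i, j, matched = back[cell]
        if matched:
            pairs.append((i, j))
        cell = (i, j)
    return pairs[::-1]
```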
2 code implementations • ACL 2021 • Benjamin Marie, Atsushi Fujita, Raphael Rubino
MT evaluations in recent papers tend to copy and compare automatic metric scores from previous work to claim the superiority of a method or an algorithm, without confirming that exactly the same training, validation, and test data were used or that the metric scores are comparable.
no code implementations • 18 Jun 2021 • Raj Dabre, Atsushi Fujita
Finally, we analyze the effects of recurrently stacked layers by visualizing the attention of models with and without them.
no code implementations • EACL 2021 • Rei Miyata, Atsushi Fujita
Pre-editing is the process of modifying the source text (ST) so that it can be translated by machine translation (MT) with higher quality.
no code implementations • 29 Jan 2021 • Benjamin Marie, Atsushi Fujita
Nonetheless, large monolingual data in the target domains or languages are not always available to generate large synthetic parallel data.
no code implementations • 20 Sep 2020 • Raj Dabre, Atsushi Fujita
Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss where the softmax distribution is compared against smoothed gold labels.
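For reference, a minimal NumPy sketch of this standard loss for a single target token; `eps` is the label-smoothing mass. This illustrates the baseline setup the paper starts from, not its proposed method.

```python
import numpy as np

def smoothed_cross_entropy(logits, gold, eps=0.1):
    """Cross-entropy of one target token against a label-smoothed
    gold distribution: the reference token gets 1 - eps, and eps is
    spread uniformly over the remaining vocabulary entries."""
    vocab = logits.shape[0]
    log_probs = logits - np.logaddexp.reduce(logits)  # stable log-softmax
    target = np.full(vocab, eps / (vocab - 1))
    target[gold] = 1.0 - eps
    return float(-(target * log_probs).sum())
```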
no code implementations • ACL 2020 • Benjamin Marie, Raphael Rubino, Atsushi Fujita
In this paper, we show that neural machine translation (NMT) systems trained on large back-translated data overfit some of the characteristics of machine-translated texts.
no code implementations • WS 2020 • Raj Dabre, Raphael Rubino, Atsushi Fujita
We propose and evaluate a novel procedure for training multiple Transformers with tied parameters, which compresses multiple models into one and enables a dynamic choice of the number of encoder and decoder layers during decoding.
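A toy illustration of the underlying idea, with one shared parameter set reused across layers so that depth becomes a decode-time choice; this is a deliberately simplified stand-in for tied Transformer layers, not the paper's architecture.

```python
import numpy as np

class SharedLayerEncoder:
    """Toy encoder whose single parameter set is reused for a variable
    number of layers, so depth can be chosen at decoding time without
    retraining (a simplified stand-in for tied Transformer layers)."""

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((dim, dim)) / np.sqrt(dim)

    def layer(self, x):
        return x + np.tanh(x @ self.w)  # residual + toy transformation

    def encode(self, x, depth):
        for _ in range(depth):  # same weights applied `depth` times
            x = self.layer(x)
        return x
```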
1 code implementation • LREC 2020 • Haiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi
To address this, we examine a language-independent framework for parallel corpus mining, a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera.
no code implementations • WS 2019 • Benjamin Marie, Hour Kaing, Aye Myat Mon, Chenchen Ding, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
This paper presents NICT's supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks.
no code implementations • IJCNLP 2019 • Raj Dabre, Atsushi Fujita, Chenhui Chu
This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting.
Tasks: Low-Resource Neural Machine Translation (+3 more)
no code implementations • 27 Aug 2019 • Raj Dabre, Atsushi Fujita
This paper proposes a novel procedure for training an encoder-decoder based deep neural network which compresses N×M models into a single model, enabling us to dynamically choose the number of encoder and decoder layers for decoding.
no code implementations • WS 2019 • Benjamin Marie, Raj Dabre, Atsushi Fujita
Our primary submission to the task is the result of a simple combination of our SMT and NMT systems.
no code implementations • WS 2019 • Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions.
no code implementations • WS 2019 • Benjamin Marie, Haipeng Sun, Rui Wang, Kehai Chen, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
This paper presents NICT's participation in the WMT19 unsupervised news translation task.
1 code implementation • WS 2019 • Aizhan Imankulova, Raj Dabre, Atsushi Fujita, Kenji Imamura
This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese-Russian pair for benchmarking.
no code implementations • ACL 2019 • Benjamin Marie, Atsushi Fujita
State-of-the-art methods for unsupervised bilingual word embeddings (BWE) train a mapping function that maps pre-trained monolingual word embeddings into a bilingual space.
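For context, the mapping in such work is commonly refined with an orthogonal Procrustes solution; below is a minimal sketch of that standard step given row-aligned seed pairs (the unsupervised methods the paper studies obtain such pairs without a dictionary).

```python
import numpy as np

def procrustes_map(src_emb, tgt_emb):
    """Solve min_W ||src_emb @ W - tgt_emb||_F over orthogonal W
    (orthogonal Procrustes), given row-aligned embedding pairs."""
    u, _, vt = np.linalg.svd(src_emb.T @ tgt_emb)
    return u @ vt

# usage sketch: W = procrustes_map(X_seed, Y_seed); mapped = X_all @ W
```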
no code implementations • NAACL 2019 • Benjamin Marie, Atsushi Fujita
We propose a new algorithm for extracting from monolingual data what we call partial translations: pairs of source and target sentences that contain sequences of tokens that are translations of each other.
no code implementations • 30 Oct 2018 • Benjamin Marie, Atsushi Fujita
In this work, we propose to define unsupervised NMT (UNMT) as NMT trained with the supervision of synthetic bilingual data.
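A high-level sketch of this view, with iterative back-translation producing the synthetic bilingual supervision; `init_translate` and `train` are placeholder callables (e.g., a word-by-word translator built from an unsupervised lexicon, and any NMT training routine), not the paper's exact pipeline.

```python
def unsupervised_nmt(mono_src, mono_tgt, init_translate, train, rounds=3):
    """Iterative back-translation: every round generates synthetic
    bilingual data with the current models, then retrains on it."""
    s2t = t2s = init_translate
    for _ in range(rounds):
        # pair each genuine monolingual sentence with a synthetic translation
        synth_for_s2t = [(t2s(y), y) for y in mono_tgt]
        synth_for_t2s = [(s2t(x), x) for x in mono_src]
        s2t = train(synth_for_s2t)
        t2s = train(synth_for_t2s)
    return s2t, t2s
```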
no code implementations • WS 2018 • Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
Our systems are ranked first for the Estonian-English and Finnish-English language pairs (constrained) according to BLEU-cased.
no code implementations • 14 Jul 2018 • Raj Dabre, Atsushi Fujita
In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder.
no code implementations • WS 2018 • Kenji Imamura, Atsushi Fujita, Eiichiro Sumita
A large-scale parallel corpus is required to train encoder-decoder neural machine translation models.
no code implementations • IJCNLP 2017 • Tomoyuki Kajiwara, Atsushi Fujita
This paper examines the usefulness of semantic features based on word alignments for estimating the quality of text simplification.
no code implementations • WS 2017 • Atsushi Fujita, Eiichiro Sumita
Aiming at facilitating the research on quality estimation (QE) and automatic post-editing (APE) of machine translation (MT) outputs, especially for those among Asian languages, we have created new datasets for Japanese to English, Chinese, and Korean translations.
no code implementations • ACL 2017 • Benjamin Marie, Atsushi Fujita
We propose a new method for extracting pseudo-parallel sentences from a pair of large monolingual corpora, without relying on any document-level information.
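One widely used way to score candidate pairs in this setting is margin-based similarity over sentence embeddings; the sketch below shows that scheme purely for illustration and is not necessarily the scoring used in the paper.

```python
import numpy as np

def margin_scores(src_vecs, tgt_vecs, k=4):
    """Ratio-margin scoring over unit-normalized sentence embeddings:
    a pair scores high when its similarity is large relative to the
    average similarity of each sentence's k nearest neighbours."""
    sim = src_vecs @ tgt_vecs.T
    fwd = np.sort(sim, axis=1)[:, -k:].mean(axis=1)   # per source row
    bwd = np.sort(sim, axis=0)[-k:, :].mean(axis=0)   # per target column
    return sim / ((fwd[:, None] + bwd[None, :]) / 2)
```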
no code implementations • WS 2017 • Atsushi Fujita, Kikuko Tanabe, Chiho Toyoshima, Mayuka Yamamoto, Kyo Kageura, Anthony Hartley
This paper also describes an application of our scheme to an English-to-Japanese translation exercise course for undergraduate students at a university in Japan.
no code implementations • TACL 2017 • Benjamin Marie, Atsushi Fujita
We present a new framework to induce an in-domain phrase table from in-domain monolingual data that can be used to adapt a general-domain statistical machine translation system to the targeted domain.