no code implementations • AACL (WAT) 2020 • Raj Dabre, Abhisek Chakrabarty
In this paper we describe our team's (NICT-5) Neural Machine Translation (NMT) models whose translations were submitted to the shared tasks of the 7th Workshop on Asian Translation.
no code implementations • ACL (WAT) 2021 • Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi
This paper presents the results of the shared tasks from the 8th workshop on Asian translation (WAT2021).
no code implementations • WAT 2022 • Raj Dabre
However, to our surprise, we find that existing multilingual NMT systems are able to handle the translation of text annotated with XML tags without any explicit training on data containing said tags.
no code implementations • WAT 2022 • Toshiaki Nakazawa, Hideya Mino, Isao Goto, Raj Dabre, Shohei Higashiyama, Shantipriya Parida, Anoop Kunchukuttan, Makoto Morishita, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi
This paper presents the results of the shared tasks from the 9th workshop on Asian translation (WAT2022).
no code implementations • COLING 2022 • Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Eiichiro Sumita
In this paper we present FeatureBART, a linguistically motivated sequence-to-sequence monolingual pre-training strategy in which syntactic features such as lemma, part-of-speech and dependency labels are incorporated into the span prediction based pre-training framework (BART).
no code implementations • ACL (WAT) 2021 • Raj Dabre, Abhisek Chakrabarty
The objective of the task was to explore the utility of multilingual approaches using a variety of in-domain and out-of-domain parallel and monolingual corpora.
no code implementations • MTSummit 2021 • Raj Dabre, Atsushi Fujita
In low-resource scenarios, NMT models tend to perform poorly because model training quickly converges to a point where the softmax distribution computed from the logits approaches the gold label distribution.
no code implementations • MTSummit 2021 • Raj Dabre, Aizhan Imankulova, Masahiro Kaneko
To this end, in this paper we propose wait-k simultaneous document-level NMT, where we keep the context encoder as it is and replace the source-sentence encoder and target-language decoder with their wait-k equivalents.
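The generic wait-k policy underlying this line of work can be illustrated with a small sketch: the decoder waits until k source tokens are available, then alternates between emitting one target token and reading one more source token. The function name below is hypothetical and the sketch covers only the read/write schedule, not the paper's document-level model.

```python
# A minimal sketch of the wait-k read/write schedule used in simultaneous NMT.
# The name wait_k_schedule is hypothetical; a real system would call an
# incremental NMT decoder at each WRITE step.

def wait_k_schedule(source_tokens, k):
    """Yield (action, visible_source) pairs for a wait-k policy:
    read k source tokens first, then alternate WRITE/READ until the
    source is exhausted, after which only WRITE actions remain."""
    visible = []
    src = iter(source_tokens)
    # Initial reads: wait for k source tokens before emitting anything.
    for _ in range(k):
        tok = next(src, None)
        if tok is None:
            break
        visible.append(tok)
        yield ("READ", list(visible))
    # Alternate: write one target token, then read one more source token.
    for tok in src:
        yield ("WRITE", list(visible))
        visible.append(tok)
        yield ("READ", list(visible))
    # Source finished: in practice the decoder keeps writing until EOS.
    yield ("WRITE", list(visible))

for action, ctx in wait_k_schedule("we propose wait-k NMT".split(), k=2):
    print(action, ctx)
```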
no code implementations • IWSLT 2017 • Raj Dabre, Fabien Cromieres, Sadao Kurohashi
We describe here our Machine Translation (MT) model and the results we obtained for the IWSLT 2017 Multilingual Shared Task.
no code implementations • GWC 2016 • Diptesh Kanojia, Raj Dabre, Pushpak Bhattacharyya
India is a country with 22 officially recognized languages, and 17 of these have WordNets, a crucial resource.
no code implementations • AACL (WAT) 2020 • Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi
This paper presents the results of the shared tasks from the 7th workshop on Asian translation (WAT2020).
no code implementations • WMT (EMNLP) 2020 • Raj Dabre, Atsushi Fujita
This paper investigates a combination of sequence distillation (SD) and transfer learning (TL) for training efficient NMT models for extremely low-resource (ELR) settings, where we utilize TL with helping corpora twice: once for distilling the ELR corpora and then during compact model training.
no code implementations • 20 Dec 2022 • Ananya B. Sai, Vignesh Nagarajan, Tanay Dixit, Raj Dabre, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics.
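As a rough illustration of how such metric-to-human correlations are typically computed at the system level (the scores below are placeholders, not values from the paper):

```python
# A minimal sketch of computing correlations between human (MQM-derived) scores
# and automatic metric scores; the numbers are illustrative placeholders.
from scipy.stats import pearsonr, kendalltau

human_scores  = [0.71, 0.64, 0.58, 0.80, 0.45, 0.69, 0.52]  # one per MT system
metric_scores = [0.68, 0.60, 0.61, 0.77, 0.50, 0.66, 0.55]  # e.g. an automatic metric

pearson, _ = pearsonr(human_scores, metric_scores)
kendall, _ = kendalltau(human_scores, metric_scores)
print(f"Pearson r = {pearson:.3f}, Kendall tau = {kendall:.3f}")
```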
no code implementations • 16 Nov 2022 • Dominik Macháček, Ondřej Bojar, Raj Dabre
Our studies reveal that the offline MT metrics correlate with Continuous Rating (CR) and can be reliably used for evaluating machine translation in the simultaneous mode, with some limitations on the test set size.
no code implementations • 6 Jun 2022 • Raj Dabre, Aneerav Sukhoo
In this paper, we describe MorisienMT, a dataset for benchmarking machine translation quality of Mauritian Creole.
no code implementations • Findings (NAACL) 2022 • Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, Sadao Kurohashi
Meanwhile, the contrastive objective can implicitly utilize automatically learned word alignment, which has not been explored in many-to-many NMT.
no code implementations • 11 Apr 2022 • Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao
This challenge aims to predict MOS scores of synthetic speech on two tracks: the main track and a more challenging out-of-domain (OOD) sub-track.
no code implementations • 10 Mar 2022 • Aman Kumar, Himani Shrotriya, Prachi Sahu, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Amogh Mishra, Mitesh M. Khapra, Pratyush Kumar
Natural Language Generation (NLG) for non-English languages is hampered by the scarcity of datasets in these languages.
1 code implementation • COLING 2020 • Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni
We, then, evaluate the impact of our cognate detection mechanism on neural machine translation (NMT), as a downstream task.
1 code implementation • Findings (ACL) 2022 • Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra, Pratyush Kumar
We present IndicBART, a multilingual, sequence-to-sequence pre-trained model focusing on 11 Indic languages and English.
no code implementations • 25 Aug 2021 • Raj Dabre, Eiichiro Sumita
In this paper we present our open-source neural machine translation (NMT) toolkit called "Yet Another Neural Machine Translation Toolkit", abbreviated as YANMTT, which is built on top of the Transformers library.
no code implementations • 18 Jun 2021 • Raj Dabre, Atsushi Fujita
Finally, we analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not.
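For readers unfamiliar with the idea, a recurrently stacked layer reuses one layer's parameters at every depth instead of allocating separate parameters per layer. A minimal, illustrative PyTorch sketch (not the paper's code) of the encoder side:

```python
# A minimal sketch of recurrently stacked layers: a single encoder layer's
# parameters are reused N times, shrinking the model while keeping its depth.
# Dimensions and repeat count are illustrative.
import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    def __init__(self, d_model=64, n_repeats=6):
        super().__init__()
        self.shared_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.n_repeats = n_repeats

    def forward(self, x):
        # The same layer (same weights) is applied repeatedly.
        for _ in range(self.n_repeats):
            x = self.shared_layer(x)
        return x

enc = RecurrentlyStackedEncoder()
out = enc(torch.randn(2, 10, 64))
print(out.shape)  # parameters of one layer, effective depth of six
```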
no code implementations • 15 Apr 2021 • Raj Dabre, Aizhan Imankulova, Masahiro Kaneko, Abhisek Chakrabarty
Parallel corpora are indispensable for training neural machine translation (NMT) models, but parallel corpora for most language pairs do not exist or are scarce.
no code implementations • COLING 2020 • Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
The advent of neural machine translation (NMT) has opened up exciting research in building multilingual translation systems, i.e., translation models that can handle more than one language pair.
no code implementations • COLING 2020 • Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita
In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data.
no code implementations • 20 Sep 2020 • Raj Dabre, Atsushi Fujita
Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss where the softmax distribution is compared against smoothed gold labels.
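A minimal PyTorch sketch of this standard label-smoothed cross-entropy setup, with hyperparameters chosen for illustration rather than taken from the paper:

```python
# A minimal sketch of label-smoothed cross-entropy: the softmax distribution
# over the vocabulary is compared against gold labels smoothed with a small
# epsilon mass spread over all other tokens. Sizes are illustrative.
import torch
import torch.nn.functional as F

def label_smoothed_nll(logits, gold, epsilon=0.1):
    """logits: (batch, vocab); gold: (batch,) integer target ids."""
    log_probs = F.log_softmax(logits, dim=-1)
    vocab = logits.size(-1)
    # Smoothed gold distribution: 1 - eps on the gold label, eps spread elsewhere.
    smooth = torch.full_like(log_probs, epsilon / (vocab - 1))
    smooth.scatter_(1, gold.unsqueeze(1), 1.0 - epsilon)
    return -(smooth * log_probs).sum(dim=-1).mean()

loss = label_smoothed_nll(torch.randn(4, 32000), torch.randint(0, 32000, (4,)))
print(loss.item())
```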
no code implementations • ACL 2020 • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita
Sequence-to-sequence (S2S) pre-training using large monolingual data is known to improve performance for various S2S NLP tasks.
1 code implementation • LREC 2020 • Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song, Sadao Kurohashi
Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora.
no code implementations • WS 2020 • Raj Dabre, Raphael Rubino, Atsushi Fujita
We propose and evaluate a novel procedure for training multiple Transformers with tied parameters which compresses multiple models into one enabling the dynamic choice of the number of encoder and decoder layers during decoding.
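A rough sketch of one way parameter tying can enable choosing the depth at decoding time: every prefix of a shared layer stack is kept usable, so a shallow or deep forward pass reuses the same weights. The class and the training note are illustrative, not the authors' implementation, and only the encoder side is shown.

```python
# A minimal, assumed sketch of a flexible-depth encoder: any prefix of the
# layer stack forms a usable sub-model, so the number of layers can be chosen
# dynamically at decoding time. Dimensions are illustrative.
import torch
import torch.nn as nn

class FlexibleDepthEncoder(nn.Module):
    def __init__(self, d_model=64, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(n_layers)
        )

    def forward(self, x, n_used=None):
        # Apply only the first `n_used` layers; all depths share parameters.
        for layer in self.layers[: n_used or len(self.layers)]:
            x = layer(x)
        return x

enc = FlexibleDepthEncoder()
x = torch.randn(2, 10, 64)
# Training would combine losses over several depths so every prefix stays
# usable; here we only show that shallow and deep passes reuse the same weights.
shallow, deep = enc(x, n_used=2), enc(x, n_used=6)
print(shallow.shape, deep.shape)
```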
no code implementations • 23 Jan 2020 • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita
To this end, we propose to exploit monolingual corpora of other languages to compensate for the scarcity of monolingual corpora for the LOI (language of interest).
no code implementations • 4 Jan 2020 • Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years.
1 code implementation • LREC 2020 • Haiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi
To address this, we examine a language-independent framework for parallel corpus mining, which is a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera.
no code implementations • WS 2019 • Raj Dabre, Eiichiro Sumita
In this paper we describe our submissions to WAT 2019 for the following tasks: English–Tamil translation and Russian–Japanese translation.
no code implementations • WS 2019 • Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Yusuke Oda, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi
This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task.
no code implementations • IJCNLP 2019 • Raj Dabre, Atsushi Fujita, Chenhui Chu
This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting.
no code implementations • 27 Aug 2019 • Raj Dabre, Atsushi Fujita
This paper proposes a novel procedure for training an encoder-decoder based deep neural network which compresses N×M models into a single model, enabling us to dynamically choose the number of encoder and decoder layers for decoding.
no code implementations • WS 2019 • Raj Dabre, Eiichiro Sumita
al., 2017) to improve translation quality for Japanese↔English.
no code implementations • WS 2019 • Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions.
no code implementations • WS 2019 • Benjamin Marie, Raj Dabre, Atsushi Fujita
Our primary submission to the task is the result of a simple combination of our SMT and NMT systems.
1 code implementation • WS 2019 • Aizhan Imankulova, Raj Dabre, Atsushi Fujita, Kenji Imamura
This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese–Russian pair for benchmarking.
no code implementations • 19 Jun 2019 • Chenhui Chu, Raj Dabre
In this paper, we propose two novel methods for domain adaptation for the attention-only neural machine translation (NMT) model, i.e., the Transformer.
no code implementations • 14 May 2019 • Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years.
no code implementations • 14 Jul 2018 • Raj Dabre, Atsushi Fujita
In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder.
no code implementations • IJCNLP 2017 • Fabien Cromieres, Toshiaki Nakazawa, Raj Dabre
Machine Translation (MT) is a sub-field of NLP which has experienced a number of paradigm shifts since its inception.
1 code implementation • WS 2017 • Fabien Cromieres, Raj Dabre, Toshiaki Nakazawa, Sadao Kurohashi
We describe here our approaches and results on the WAT 2017 shared translation tasks.
1 code implementation • 3 Oct 2017 • Raj Dabre, Sadao Kurohashi
Multilinguality is gradually becoming ubiquitous in the sense that more and more researchers have successfully shown that using additional languages helps improve the results in many Natural Language Processing tasks.
no code implementations • ACL 2017 • Chenhui Chu, Raj Dabre, Sadao Kurohashi
In this paper, we propose a novel domain adaptation method named "mixed fine tuning" for neural machine translation (NMT).
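A minimal sketch of the corpus-mixing step behind this kind of fine-tuning: after pre-training on out-of-domain data, the model is fine-tuned on in-domain data mixed with out-of-domain data, with the smaller in-domain corpus oversampled. The helper below is illustrative, not the authors' code, and the corpus contents are placeholders.

```python
# A minimal sketch of preparing a "mixed fine tuning" corpus: oversample the
# small in-domain corpus and mix it with out-of-domain data for the
# fine-tuning stage. Corpus contents are placeholders.
import random

def mixed_finetuning_corpus(in_domain, out_of_domain, oversample=None):
    """Oversample the in-domain corpus to roughly match the out-of-domain size,
    then shuffle the union for the fine-tuning stage."""
    oversample = oversample or max(1, len(out_of_domain) // max(1, len(in_domain)))
    mixed = in_domain * oversample + out_of_domain
    random.shuffle(mixed)
    return mixed

in_domain = [("source sent %d" % i, "target sent %d" % i) for i in range(3)]
out_of_domain = [("generic src %d" % i, "generic tgt %d" % i) for i in range(12)]
print(len(mixed_finetuning_corpus(in_domain, out_of_domain)))  # 12 + 3*4 = 24
```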
no code implementations • MTSummit 2017 • Raj Dabre, Fabien Cromieres, Sadao Kurohashi
In this paper, we explore a simple solution to "Multi-Source Neural Machine Translation" (MSNMT) which only relies on preprocessing an N-way multilingual corpus without modifying the Neural Machine Translation (NMT) architecture or training procedure.
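As a sketch of what preprocessing an N-way corpus can amount to in the simplest case, the source sentences in several languages are concatenated into one input so that a standard NMT model can be used unchanged. The separator token and the example sentences below are illustrative.

```python
# A minimal sketch of the preprocessing view of multi-source NMT: each training
# example's source side is the concatenation of the same sentence in several
# source languages. The <SEP> token and sentences are placeholders.
def make_multisource_example(sources, target, sep=" <SEP> "):
    """sources: dict mapping language code -> source sentence (N-way aligned)."""
    concatenated = sep.join(sources[lang] for lang in sorted(sources))
    return concatenated, target

src, tgt = make_multisource_example(
    {"fr": "le chat dort", "de": "die Katze schläft"},
    "the cat sleeps",
)
print(src, "=>", tgt)
```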
no code implementations • 12 Jan 2017 • Chenhui Chu, Raj Dabre, Sadao Kurohashi
In this paper, we propose a novel domain adaptation method named "mixed fine tuning" for neural machine translation (NMT).
no code implementations • LREC 2016 • Chenhui Chu, Raj Dabre, Sadao Kurohashi
Parallel corpora are crucial for machine translation (MT); however, they are quite scarce for most language pairs and domains.