no code implementations • LREC 2016 • Chenhui Chu, Raj Dabre, Sadao Kurohashi
Parallel corpora are crucial for machine translation (MT); however, they are quite scarce for most language pairs and domains.
no code implementations • 12 Jan 2017 • Chenhui Chu, Raj Dabre, Sadao Kurohashi
In this paper, we propose a novel domain adaptation method named "mixed fine tuning" for neural machine translation (NMT).
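As a rough sketch of the "mixed fine tuning" recipe named above, under the assumption that a model first trained on out-of-domain data is then fine-tuned on a mix of in-domain and oversampled out-of-domain data (the toy corpora and oversampling ratio below are hypothetical, not values from the paper):

import random

def build_mixed_corpus(in_domain, out_of_domain, oversample=3):
    # Combine in-domain and out-of-domain pairs, oversampling the smaller
    # in-domain set so it is not drowned out during fine-tuning.
    mixed = list(out_of_domain) + list(in_domain) * oversample
    random.shuffle(mixed)
    return mixed

# Hypothetical toy corpora of (source, target) sentence pairs.
out_of_domain = [(f"general src {i}", f"general tgt {i}") for i in range(1000)]
in_domain = [(f"medical src {i}", f"medical tgt {i}") for i in range(50)]

# Stage 1: train an NMT model on out_of_domain until convergence.
# Stage 2: resume training (fine-tune) on the mixed corpus below.
mixed = build_mixed_corpus(in_domain, out_of_domain)
print(len(mixed))  # 1000 + 50 * 3 = 1150 examples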
no code implementations • MTSummit 2017 • Raj Dabre, Fabien Cromieres, Sadao Kurohashi
In this paper, we explore a simple solution to "Multi-Source Neural Machine Translation" (MSNMT) which only relies on preprocessing an N-way multilingual corpus without modifying the Neural Machine Translation (NMT) architecture or training procedure.
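A minimal sketch of the kind of preprocessing such a solution could rely on, assuming the N-way corpus is combined by concatenating the source sentences of each language into a single input (the <SEP> separator token is a hypothetical addition to the vocabulary, not necessarily the paper's exact scheme):

def concat_sources(sources, sep=" <SEP> "):
    # Join the source sentences from each language into one input line.
    return sep.join(sources)

# One N-way aligned example: French and German sources, English target.
fr = "Le chat est assis sur le tapis ."
de = "Die Katze sitzt auf der Matte ."
en = "The cat sits on the mat ."
print(concat_sources([fr, de]), "->", en)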
no code implementations • ACL 2017 • Chenhui Chu, Raj Dabre, Sadao Kurohashi
In this paper, we propose a novel domain adaptation method named "mixed fine tuning" for neural machine translation (NMT).
1 code implementation • 3 Oct 2017 • Raj Dabre, Sadao Kurohashi
Multilinguality is gradually becoming ubiquitous in the sense that more and more researchers have successfully shown that using additional languages helps improve results in many Natural Language Processing tasks.
1 code implementation • WS 2017 • Fabien Cromieres, Raj Dabre, Toshiaki Nakazawa, Sadao Kurohashi
We describe here our approaches and results on the WAT 2017 shared translation tasks.
no code implementations • IJCNLP 2017 • Fabien Cromieres, Toshiaki Nakazawa, Raj Dabre
Machine Translation (MT) is a sub-field of NLP which has experienced a number of paradigm shifts since its inception.
no code implementations • 14 Jul 2018 • Raj Dabre, Atsushi Fujita
In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder.
no code implementations • 14 May 2019 • Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years.
no code implementations • 19 Jun 2019 • Chenhui Chu, Raj Dabre
In this paper, we propose two novel methods for domain adaptation for the attention-only neural machine translation (NMT) model, i.e., the Transformer.
1 code implementation • WS 2019 • Aizhan Imankulova, Raj Dabre, Atsushi Fujita, Kenji Imamura
This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese--Russian pair for benchmarking.
no code implementations • WS 2019 • Benjamin Marie, Raj Dabre, Atsushi Fujita
Our primary submission to the task is the result of a simple combination of our SMT and NMT systems.
no code implementations • WS 2019 • Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita
In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions.
no code implementations • WS 2019 • Raj Dabre, Eiichiro Sumita
al., 2017) to improve translation quality for Japanese↔English.
no code implementations • 27 Aug 2019 • Raj Dabre, Atsushi Fujita
This paper proposes a novel procedure for training an encoder-decoder based deep neural network that compresses N×M models into a single model, enabling us to dynamically choose the number of encoder and decoder layers for decoding.
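One way to realize such a model, sketched here as an assumption rather than the paper's exact procedure, is to attach a shared softmax to every decoder layer and average the per-layer losses, so that decoding can later stop at any depth (PyTorch; all sizes are illustrative):

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiExitDecoder(nn.Module):
    # Toy decoder: every layer feeds the same output projection, so each
    # depth m <= M is trained as a usable sub-model.
    def __init__(self, vocab=1000, d_model=64, layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
            for _ in range(layers))
        self.out = nn.Linear(d_model, vocab)  # shared across all exits

    def forward(self, tgt_ids, memory, gold_ids):
        x = self.embed(tgt_ids)
        losses = []
        for layer in self.layers:
            x = layer(x, memory)
            logits = self.out(x)
            losses.append(F.cross_entropy(
                logits.view(-1, logits.size(-1)), gold_ids.view(-1)))
        # Averaging the per-layer losses trains every depth simultaneously.
        return torch.stack(losses).mean()

model = MultiExitDecoder()
memory = torch.randn(2, 7, 64)        # stand-in encoder states
tgt = torch.randint(0, 1000, (2, 5))  # shifted target ids
gold = torch.randint(0, 1000, (2, 5))
model(tgt, memory, gold).backward()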
no code implementations • WS 2019 • Raj Dabre, Eiichiro Sumita
In this paper we describe our submissions to WAT 2019 for the following tasks: English–Tamil translation and Russian–Japanese translation.
no code implementations • WS 2019 • Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Yusuke Oda, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi
This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task.
no code implementations • IJCNLP 2019 • Raj Dabre, Atsushi Fujita, Chenhui Chu
This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting.
1 code implementation • LREC 2020 • Haiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi
To address this, we examine a language-independent framework for parallel corpus mining, which is a quick and effective way to mine a parallel corpus from publicly available lectures on Coursera.
no code implementations • 4 Jan 2020 • Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in recent years.
no code implementations • 23 Jan 2020 • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita
To this end, we propose to exploit monolingual corpora of other languages to compensate for the scarcity of monolingual corpora for the language of interest (LOI).
no code implementations • WS 2020 • Raj Dabre, Raphael Rubino, Atsushi Fujita
We propose and evaluate a novel procedure for training multiple Transformers with tied parameters, which compresses multiple models into one and enables dynamically choosing the number of encoder and decoder layers during decoding.
1 code implementation • LREC 2020 • Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song, Sadao Kurohashi
Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora.
no code implementations • ACL 2020 • Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita
Sequence-to-sequence (S2S) pre-training using large monolingual data is known to improve performance for various S2S NLP tasks.
no code implementations • 20 Sep 2020 • Raj Dabre, Atsushi Fujita
Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss where the softmax distribution is compared against smoothed gold labels.
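The loss in question is standard label-smoothed cross-entropy; a self-contained PyTorch version for reference (the smoothing weight eps=0.1 is the common default, not a value taken from the paper):

import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits, gold, eps=0.1):
    # Gold token gets probability 1 - eps; the remaining eps is spread
    # uniformly over the vocabulary.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, gold.unsqueeze(-1)).squeeze(-1)
    uniform = -log_probs.mean(dim=-1)
    return ((1.0 - eps) * nll + eps * uniform).mean()

logits = torch.randn(8, 32000)  # 8 token positions, 32k vocabulary
gold = torch.randint(0, 32000, (8,))
print(float(smoothed_cross_entropy(logits, gold)))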
no code implementations • COLING 2020 • Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita
In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data.
no code implementations • COLING 2020 • Raj Dabre, Chenhui Chu, Anoop Kunchukuttan
The advent of neural machine translation (NMT) has opened up exciting research in building multilingual translation systems, i.e., translation models that can handle more than one language pair.
no code implementations • 15 Apr 2021 • Raj Dabre, Aizhan Imankulova, Masahiro Kaneko, Abhisek Chakrabarty
Parallel corpora are indispensable for training neural machine translation (NMT) models, yet for most language pairs they are scarce or nonexistent.
no code implementations • 18 Jun 2021 • Raj Dabre, Atsushi Fujita
Finally, we analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not.
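Recurrent stacking itself reduces to reusing one layer's parameters at every depth; a minimal PyTorch sketch (sizes illustrative):

import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    # A single Transformer encoder layer applied N times, instead of N
    # independently parameterized layers.
    def __init__(self, d_model=64, n_stacks=6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=4,
                                                batch_first=True)
        self.n_stacks = n_stacks

    def forward(self, x):
        for _ in range(self.n_stacks):  # same weights at every depth
            x = self.layer(x)
        return x

enc = RecurrentlyStackedEncoder()
print(enc(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])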
no code implementations • 25 Aug 2021 • Raj Dabre, Eiichiro Sumita
In this paper, we present our open-source neural machine translation (NMT) toolkit called "Yet Another Neural Machine Translation Toolkit", abbreviated as YANMTT, which is built on top of the Transformers library.
1 code implementation • Findings (ACL) 2022 • Raj Dabre, Himani Shrotriya, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra, Pratyush Kumar
We present IndicBART, a multilingual, sequence-to-sequence pre-trained model focusing on 11 Indic languages and English.
1 code implementation • COLING 2020 • Diptesh Kanojia, Raj Dabre, Shubham Dewangan, Pushpak Bhattacharyya, Gholamreza Haffari, Malhar Kulkarni
We, then, evaluate the impact of our cognate detection mechanism on neural machine translation (NMT), as a downstream task.
no code implementations • 10 Mar 2022 • Aman Kumar, Himani Shrotriya, Prachi Sahu, Raj Dabre, Ratish Puduppully, Anoop Kunchukuttan, Amogh Mishra, Mitesh M. Khapra, Pratyush Kumar
Natural Language Generation (NLG) for non-English languages is hampered by the scarcity of datasets in these languages.
no code implementations • 11 Apr 2022 • Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao
This challenge aims to predict MOS scores of synthetic speech on two tracks, the main track and a more challenging sub-track: out-of-domain (OOD).
no code implementations • Findings (NAACL) 2022 • Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, Sadao Kurohashi
Meanwhile, the contrastive objective can implicitly utilize automatically learned word alignment, which has not been explored in many-to-many NMT.
no code implementations • 6 Jun 2022 • Raj Dabre, Aneerav Sukhoo
In this paper, we describe MorisienMT, a dataset for benchmarking machine translation quality of Mauritian Creole.
1 code implementation • 16 Nov 2022 • Dominik Macháček, Ondřej Bojar, Raj Dabre
There have been several meta-evaluation studies on the correlation between human ratings and offline machine translation (MT) evaluation metrics such as BLEU, chrF2, BERTScore and COMET.
1 code implementation • 20 Dec 2022 • Ananya B. Sai, Vignesh Nagarajan, Tanay Dixit, Raj Dabre, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics.
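Such correlations are typically computed with rank and linear statistics; a short sketch with hypothetical system-level scores (not data from the paper):

from scipy.stats import kendalltau, pearsonr

human = [72.1, 65.3, 80.4, 58.9, 69.7]   # hypothetical annotator scores
metric = [0.71, 0.62, 0.78, 0.55, 0.70]  # hypothetical metric scores

tau, _ = kendalltau(human, metric)
r, _ = pearsonr(human, metric)
print(f"Kendall tau = {tau:.3f}, Pearson r = {r:.3f}")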
no code implementations • 19 Apr 2023 • Varun Gumma, Raj Dabre, Pratyush Kumar
Knowledge distillation (KD) is a well-known method for compressing neural models.
2 code implementations • 12 May 2023 • Nandini Mundra, Sumanth Doddapaneni, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra
However, adapters have not been sufficiently analyzed to understand if PEFT translates to benefits in training/deployment efficiency and maintainability/extensibility.
no code implementations • 16 May 2023 • Zhuoyuan Mao, Raj Dabre, Qianying Liu, Haiyue Song, Chenhui Chu, Sadao Kurohashi
This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation (ZST).
no code implementations • 17 May 2023 • Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi
The language independence of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability on zero-shot translation.
1 code implementation • 22 May 2023 • Ratish Puduppully, Anoop Kunchukuttan, Raj Dabre, Ai Ti Aw, Nancy F. Chen
This study investigates machine translation between related languages, i.e., languages within the same family that share linguistic characteristics such as word order and lexical similarity.
1 code implementation • 23 May 2023 • Aswanth Kumar, Ratish Puduppully, Raj Dabre, Anoop Kunchukuttan
We learn a regression model, CTQ Scorer (Contextual Translation Quality), that selects examples based on multiple features in order to maximize the translation quality.
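A hedged sketch of what such a feature-based regression selector could look like; the features, ridge regressor, and data below are illustrative assumptions, not the paper's exact setup:

import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical features per candidate in-context example (e.g. source
# similarity, target LM score, length ratio) and the translation quality
# observed when that example was used in the prompt.
rng = np.random.default_rng(0)
X_train = rng.random((200, 3))
y_train = X_train @ np.array([0.6, 0.3, 0.1]) + 0.05 * rng.standard_normal(200)

scorer = Ridge(alpha=1.0).fit(X_train, y_train)  # the regression scorer

# At selection time, rank candidates by predicted quality and keep the top k.
candidates = rng.random((50, 3))
best = np.argsort(scorer.predict(candidates))[::-1][:4]
print("selected example indices:", best)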
2 code implementations • 25 May 2023 • Jay Gala, Pranjal A. Chitale, Raghavan AK, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M. Khapra, Raj Dabre, Anoop Kunchukuttan
Prior to this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models that support all 22 scheduled languages of India.
no code implementations • 26 May 2023 • Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre
Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling.
1 code implementation • 6 Jun 2023 • Zhishen Yang, Raj Dabre, Hideki Tanaka, Naoaki Okazaki
Automating figure caption generation helps move models' understanding of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings.
1 code implementation • 27 Jul 2023 • Dominik Macháček, Raj Dabre, Ondřej Bojar
Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models; however, it is not designed for real-time transcription.
no code implementations • 31 Jul 2023 • Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi, Eiichiro Sumita
Sub-word segmentation is an essential pre-processing step for Neural Machine Translation (NMT).
1 code implementation • 30 Oct 2023 • Heather Lent, Kushal Tatariya, Raj Dabre, Yiyi Chen, Marcell Fekete, Esther Ploeger, Li Zhou, Hans Erik Heje, Diptesh Kanojia, Paul Belony, Marcel Bollmann, Loïc Grobol, Miryam de Lhoneux, Daniel Hershcovich, Michel DeGraff, Anders Søgaard, Johannes Bjerva
Creoles represent an under-explored and marginalized group of languages, with few available resources for NLP research.
1 code implementation • 7 Nov 2023 • Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, Sadao Kurohashi
To create the parallel corpora, we propose a dynamic programming based sentence alignment algorithm which leverages the cosine similarity of machine-translated sentences.
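A compact version of such a dynamic program, assuming a precomputed cosine similarity matrix between machine-translated source sentences and target sentences (monotonic 1-to-1 alignment with skips; this recurrence is a simplification, not the paper's exact algorithm):

import numpy as np

def align(sim):
    # sim[i][j]: cosine similarity of MT'd source sentence i vs. target j.
    n, m = sim.shape
    dp = np.zeros((n + 1, m + 1))
    back = np.zeros((n + 1, m + 1), dtype=int)  # 0 match, 1 skip i, 2 skip j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            choices = (dp[i-1][j-1] + sim[i-1][j-1], dp[i-1][j], dp[i][j-1])
            back[i][j] = int(np.argmax(choices))
            dp[i][j] = max(choices)
    pairs, i, j = [], n, m
    while i > 0 and j > 0:  # trace back the best monotonic path
        if back[i][j] == 0:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif back[i][j] == 1:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

sim = np.array([[0.9, 0.1, 0.0],
                [0.2, 0.8, 0.1],
                [0.0, 0.2, 0.7]])
print(align(sim))  # [(0, 0), (1, 1), (2, 2)]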
no code implementations • 11 Jan 2024 • Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold
Motivated by the performance degradation of NLP models on dialectal datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.
no code implementations • 13 Jan 2024 • Settaluri Lakshmi Sravanthi, Meet Doshi, Tankala Pavan Kalyan, Rudra Murthy, Pushpak Bhattacharyya, Raj Dabre
To demonstrate this fact, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks in four pragmatics phenomena, namely, Implicature, Presupposition, Reference, and Deixis.
no code implementations • 22 Jan 2024 • Pranjal A. Chitale, Jay Gala, Raj Dabre
While we establish the significance of the quality of the target distribution over the source distribution of demonstrations, we further observe that perturbations sometimes act as regularizers, resulting in performance improvements.
no code implementations • 24 Jan 2024 • Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara
We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion.
no code implementations • 25 Jan 2024 • Jaavid Aktar Husain, Raj Dabre, Aswanth Kumar, Jay Gala, Thanmay Jayakumar, Ratish Puduppully, Anoop Kunchukuttan
This study addresses the challenge of extending Large Language Models (LLMs) to non-English languages using non-Roman scripts.
1 code implementation • 26 Jan 2024 • Jay Gala, Thanmay Jayakumar, Jaavid Aktar Husain, Aswanth Kumar M, Mohammed Safi Ur Rahman Khan, Diptesh Kanojia, Ratish Puduppully, Mitesh M. Khapra, Raj Dabre, Rudra Murthy, Anoop Kunchukuttan
We announce the initial release of "Airavata," an instruction-tuned LLM for Hindi.
1 code implementation • 11 Mar 2024 • Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra
We hope that the datasets, tools, and resources released as a part of this work will not only propel the research and development of Indic LLMs but also establish an open-source blueprint for extending such efforts to other languages.
no code implementations • 20 Mar 2024 • Meet Doshi, Raj Dabre, Pushpak Bhattacharyya
In this paper, we explore the utility of Translationese as synthetic data created using machine translation for pre-training language models (LMs).
no code implementations • 6 Apr 2024 • Poulami Ghosh, Shikhar Vashishth, Raj Dabre, Pushpak Bhattacharyya
How does the importance of positional encoding in pre-trained language models (PLMs) vary across languages with different morphological complexity?
no code implementations • GWC 2016 • Diptesh Kanojia, Raj Dabre, Pushpak Bhattacharyya
India is a country with 22 officially recognized languages, and 17 of these have WordNets, a crucial resource.
no code implementations • ACL (WAT) 2021 • Raj Dabre, Abhisek Chakrabarty
The objective of the task was to explore the utility of multilingual approaches using a variety of in-domain and out-of-domain parallel and monolingual corpora.
no code implementations • ACL (WAT) 2021 • Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi
This paper presents the results of the shared tasks from the 8th workshop on Asian translation (WAT2021).
no code implementations • WAT 2022 • Raj Dabre
However, to our surprise, we find that existing multilingual NMT systems are able to handle the translation of text annotated with XML tags without any explicit training on data containing said tags.
no code implementations • WMT (EMNLP) 2020 • Raj Dabre, Atsushi Fujita
This paper investigates a combination of sequence distillation (SD) and transfer learning (TL) for training efficient NMT models for extremely low-resource (ELR) settings, where we utilize TL with helping corpora twice: once for distilling the ELR corpora and then during compact model training.
no code implementations • IWSLT 2017 • Raj Dabre, Fabien Cromieres, Sadao Kurohashi
We describe here our Machine Translation (MT) model and the results we obtained for the IWSLT 2017 Multilingual Shared Task.
no code implementations • WAT 2022 • Toshiaki Nakazawa, Hideya Mino, Isao Goto, Raj Dabre, Shohei Higashiyama, Shantipriya Parida, Anoop Kunchukuttan, Makoto Morishita, Ondřej Bojar, Chenhui Chu, Akiko Eriguchi, Kaori Abe, Yusuke Oda, Sadao Kurohashi
This paper presents the results of the shared tasks from the 9th workshop on Asian translation (WAT2022).
no code implementations • COLING 2022 • Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Eiichiro Sumita
In this paper we present FeatureBART, a linguistically motivated sequence-to-sequence monolingual pre-training strategy in which syntactic features such as lemma, part-of-speech and dependency labels are incorporated into the span prediction based pre-training framework (BART).
no code implementations • AACL (WAT) 2020 • Raj Dabre, Abhisek Chakrabarty
In this paper we describe our team's (NICT-5) Neural Machine Translation (NMT) models whose translations were submitted to shared tasks of the 7th Workshop on Asian Translation.
no code implementations • AACL (WAT) 2020 • Toshiaki Nakazawa, Hideki Nakayama, Chenchen Ding, Raj Dabre, Shohei Higashiyama, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi
This paper presents the results of the shared tasks from the 7th workshop on Asian translation (WAT2020).
no code implementations • MTSummit 2021 • Raj Dabre, Aizhan Imankulova, Masahiro Kaneko
To this end, in this paper, we propose wait-k simultaneous document-level NMT, where we keep the context encoder as it is and replace the source sentence encoder and target language decoder with their wait-k equivalents.
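Wait-k itself is a simple read/write schedule; a toy sketch with a stand-in model (translate_prefix is a hypothetical interface, not the paper's):

def wait_k_decode(source_stream, translate_prefix, k=3):
    # Read k source tokens before emitting the first target token,
    # then alternate one READ with one WRITE.
    src, tgt = [], []
    for token in source_stream:
        src.append(token)  # READ
        if len(src) >= k:
            tgt.append(translate_prefix(src, tgt))  # WRITE
    while len(tgt) < len(src):  # flush once the source has ended
        tgt.append(translate_prefix(src, tgt))
    return tgt

# Stand-in "model": copy the next source token in upper case.
mock = lambda src, tgt: src[len(tgt)].upper()
print(wait_k_decode("le chat dort profondement aujourdhui".split(), mock))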
no code implementations • MTSummit 2021 • Raj Dabre, Atsushi Fujita
In low-resource scenarios, NMT models tend to perform poorly because the model training quickly converges to a point where the softmax distribution computed using logits approaches the gold label distribution.