Search Results for author: Raj Dabre

Found 85 papers, 19 papers with code

Parallel Sentence Extraction from Comparable Corpora with Neural Network Features

no code implementations LREC 2016 Chenhui Chu, Raj Dabre, Sadao Kurohashi

Parallel corpora are crucial for machine translation (MT); however, they are quite scarce for most language pairs and domains.

Machine Translation Sentence +1

An Empirical Comparison of Simple Domain Adaptation Methods for Neural Machine Translation

no code implementations 12 Jan 2017 Chenhui Chu, Raj Dabre, Sadao Kurohashi

In this paper, we propose a novel domain adaptation method named "mixed fine tuning" for neural machine translation (NMT).

Domain Adaptation Machine Translation +2

Enabling Multi-Source Neural Machine Translation By Concatenating Source Sentences In Multiple Languages

no code implementations MTSummit 2017 Raj Dabre, Fabien Cromieres, Sadao Kurohashi

In this paper, we explore a simple solution to "Multi-Source Neural Machine Translation" (MSNMT) which only relies on preprocessing a N-way multilingual corpus without modifying the Neural Machine Translation (NMT) architecture or training procedure.

Machine Translation NMT +2

MMCR4NLP: Multilingual Multiway Corpora Repository for Natural Language Processing

1 code implementation 3 Oct 2017 Raj Dabre, Sadao Kurohashi

Multilinguality is gradually becoming ubiquitous in the sense that more and more researchers have successfully shown that using additional languages helps improve the results in many Natural Language Processing tasks.

Machine Translation Multilingual NLP +3

Recurrent Stacking of Layers for Compact Neural Machine Translation Models

no code implementations 14 Jul 2018 Raj Dabre, Atsushi Fujita

In neural machine translation (NMT), the most common practice is to stack a number of recurrent or feed-forward layers in the encoder and the decoder.

Machine Translation NMT +1

A Brief Survey of Multilingual Neural Machine Translation

no code implementations 14 May 2019 Raj Dabre, Chenhui Chu, Anoop Kunchukuttan

We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in the recent years.

Machine Translation Transfer Learning +1

Multilingual Multi-Domain Adaptation Approaches for Neural Machine Translation

no code implementations 19 Jun 2019 Chenhui Chu, Raj Dabre

In this paper, we propose two novel methods for domain adaptation for the attention-only neural machine translation (NMT) model, i.e., the Transformer.

Domain Adaptation Machine Translation +2

Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation

1 code implementation WS 2019 Aizhan Imankulova, Raj Dabre, Atsushi Fujita, Kenji Imamura

This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese–Russian pair for benchmarking.

Benchmarking Domain Adaptation +4

NICT's Supervised Neural Machine Translation Systems for the WMT19 News Translation Task

no code implementations WS 2019 Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions.

Machine Translation NMT +2

Multi-Layer Softmaxing during Training Neural Machine Translation for Flexible Decoding with Fewer Layers

no code implementations 27 Aug 2019 Raj Dabre, Atsushi Fujita

This paper proposes a novel procedure for training an encoder-decoder based deep neural network which compresses NxM models into a single model enabling us to dynamically choose the number of encoder and decoder layers for decoding.

Machine Translation Translation

NICT's participation to WAT 2019: Multilingualism and Multi-step Fine-Tuning for Low Resource NMT

no code implementations WS 2019 Raj Dabre, Eiichiro Sumita

In this paper we describe our submissions to WAT 2019 for the following tasks: English–Tamil translation and Russian–Japanese translation.

Domain Adaptation NMT +1

Overview of the 6th Workshop on Asian Translation

no code implementations WS 2019 Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Yusuke Oda, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi

This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task.

Translation

Exploiting Multilingualism through Multistage Fine-Tuning for Low-Resource Neural Machine Translation

no code implementations IJCNLP 2019 Raj Dabre, Atsushi Fujita, Chenhui Chu

This paper highlights the impressive utility of multi-parallel corpora for transfer learning in a one-to-many low-resource neural machine translation (NMT) setting.

Low-Resource Neural Machine Translation NMT +2

Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation

1 code implementation LREC 2020 Haiyue Song, Raj Dabre, Atsushi Fujita, Sadao Kurohashi

To address this, we examine a language independent framework for parallel corpus mining which is a quick and effective way to mine a parallel corpus from publicly available lectures at Coursera.

Benchmarking Domain Adaptation +4

A Comprehensive Survey of Multilingual Neural Machine Translation

no code implementations 4 Jan 2020 Raj Dabre, Chenhui Chu, Anoop Kunchukuttan

We present a survey on multilingual neural machine translation (MNMT), which has gained a lot of traction in the recent years.

Machine Translation NMT +2

Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation

no code implementations 23 Jan 2020 Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita

To this end, we propose to exploit monolingual corpora of other languages to complement the scarcity of monolingual corpora for the LOI.

Machine Translation NMT +1

Balancing Cost and Benefit with Tied-Multi Transformers

no code implementations WS 2020 Raj Dabre, Raphael Rubino, Atsushi Fujita

We propose and evaluate a novel procedure for training multiple Transformers with tied parameters which compresses multiple models into one enabling the dynamic choice of the number of encoder and decoder layers during decoding.

Knowledge Distillation Machine Translation +2

JASS: Japanese-specific Sequence to Sequence Pre-training for Neural Machine Translation

1 code implementation LREC 2020 Zhuoyuan Mao, Fabien Cromieres, Raj Dabre, Haiyue Song, Sadao Kurohashi

Monolingual pre-training approaches such as MASS (MAsked Sequence to Sequence) are extremely effective in boosting NMT quality for languages with small parallel corpora.

Machine Translation NMT +2

Softmax Tempering for Training Neural Machine Translation Models

no code implementations 20 Sep 2020 Raj Dabre, Atsushi Fujita

Neural machine translation (NMT) models are typically trained using a softmax cross-entropy loss where the softmax distribution is compared against smoothed gold labels.

Machine Translation NMT +1

Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation

no code implementations COLING 2020 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita

In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data.

Machine Translation NMT +1

Multilingual Neural Machine Translation

no code implementations COLING 2020 Raj Dabre, Chenhui Chu, Anoop Kunchukuttan

The advent of neural machine translation (NMT) has opened up exciting research in building multilingual translation systems, i.e., translation models that can handle more than one language pair.

Machine Translation NMT +2

Simultaneous Multi-Pivot Neural Machine Translation

no code implementations 15 Apr 2021 Raj Dabre, Aizhan Imankulova, Masahiro Kaneko, Abhisek Chakrabarty

Parallel corpora are indispensable for training neural machine translation (NMT) models, and parallel corpora for most language pairs do not exist or are scarce.

Machine Translation NMT +1

Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation

no code implementations 18 Jun 2021 Raj Dabre, Atsushi Fujita

Finally, we analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not.

Knowledge Distillation Machine Translation +3

YANMTT: Yet Another Neural Machine Translation Toolkit

no code implementations 25 Aug 2021 Raj Dabre, Eiichiro Sumita

In this paper we present our open-source neural machine translation (NMT) toolkit called "Yet Another Neural Machine Translation Toolkit" abbreviated as YANMTT which is built on top of the Transformers library.

Machine Translation Model Compression +3

Fusion of Self-supervised Learned Models for MOS Prediction

no code implementations 11 Apr 2022 Zhengdong Yang, Wangjin Zhou, Chenhui Chu, Sheng Li, Raj Dabre, Raphael Rubino, Yi Zhao

This challenge aims to predict MOS scores of synthetic speech on two tracks, the main track and a more challenging sub-track: out-of-domain (OOD).

When do Contrastive Word Alignments Improve Many-to-many Neural Machine Translation?

no code implementations Findings (NAACL) 2022 Zhuoyuan Mao, Chenhui Chu, Raj Dabre, Haiyue Song, Zhen Wan, Sadao Kurohashi

Meanwhile, the contrastive objective can implicitly utilize automatically learned word alignment, which has not been explored in many-to-many NMT.

Machine Translation NMT +4

MorisienMT: A Dataset for Mauritian Creole Machine Translation

no code implementations 6 Jun 2022 Raj Dabre, Aneerav Sukhoo

In this paper, we describe MorisienMT, a dataset for benchmarking machine translation quality of Mauritian Creole.

Benchmarking Machine Translation +2

MT Metrics Correlate with Human Ratings of Simultaneous Speech Translation

1 code implementation 16 Nov 2022 Dominik Macháček, Ondřej Bojar, Raj Dabre

There have been several meta-evaluation studies on the correlation between human ratings and offline machine translation (MT) evaluation metrics such as BLEU, chrF2, BertScore and COMET.

Machine Translation Translation

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

1 code implementation 20 Dec 2022 Ananya B. Sai, Vignesh Nagarajan, Tanay Dixit, Raj Dabre, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra

In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics.

Machine Translation

A Comprehensive Analysis of Adapter Efficiency

2 code implementations 12 May 2023 Nandini Mundra, Sumanth Doddapaneni, Raj Dabre, Anoop Kunchukuttan, Ratish Puduppully, Mitesh M. Khapra

However, adapters have not been sufficiently analyzed to understand if PEFT translates to benefits in training/deployment efficiency and maintainability/extensibility.

Natural Language Understanding

Variable-length Neural Interlingua Representations for Zero-shot Neural Machine Translation

no code implementations 17 May 2023 Zhuoyuan Mao, Haiyue Song, Raj Dabre, Chenhui Chu, Sadao Kurohashi

The language-independency of encoded representations within multilingual neural machine translation (MNMT) models is crucial for their generalization ability on zero-shot translation.

Machine Translation Translation

Decomposed Prompting for Machine Translation Between Related Languages using Large Language Models

1 code implementation 22 May 2023 Ratish Puduppully, Anoop Kunchukuttan, Raj Dabre, Ai Ti Aw, Nancy F. Chen

This study investigates machine translation between related languages, i.e., languages within the same family that share linguistic characteristics such as word order and lexical similarity.

Machine Translation Translation

CTQScorer: Combining Multiple Features for In-context Example Selection for Machine Translation

1 code implementation 23 May 2023 Aswanth Kumar, Ratish Puduppully, Raj Dabre, Anoop Kunchukuttan

We learn a regression model, CTQ Scorer (Contextual Translation Quality), that selects examples based on multiple features in order to maximize the translation quality.

In-Context Learning Machine Translation +2

IndicTrans2: Towards High-Quality and Accessible Machine Translation Models for all 22 Scheduled Indian Languages

2 code implementations 25 May 2023 Jay Gala, Pranjal A. Chitale, Raghavan AK, Varun Gumma, Sumanth Doddapaneni, Aswanth Kumar, Janki Nawale, Anupama Sujatha, Ratish Puduppully, Vivek Raghavan, Pratyush Kumar, Mitesh M. Khapra, Raj Dabre, Anoop Kunchukuttan

Prior to this work, there was (i) no parallel training data spanning all 22 languages, (ii) no robust benchmarks covering all these languages and containing content relevant to India, and (iii) no existing translation models which support all the 22 scheduled languages of India.

Machine Translation Sentence +1

Robustness of Multi-Source MT to Transcription Errors

no code implementations 26 May 2023 Dominik Macháček, Peter Polák, Ondřej Bojar, Raj Dabre

Automatic speech translation is sensitive to speech recognition errors, but in a multilingual scenario, the same content may be available in various languages via simultaneous interpreting, dubbing or subtitling.

Machine Translation speech-recognition +2

SciCap+: A Knowledge Augmented Dataset to Study the Challenges of Scientific Figure Captioning

1 code implementation 6 Jun 2023 Zhishen Yang, Raj Dabre, Hideki Tanaka, Naoaki Okazaki

Automating figure caption generation helps move model understandings of scientific documents beyond text and will help authors write informative captions that facilitate communicating scientific findings.

Caption Generation Image Captioning +1

Turning Whisper into Real-Time Transcription System

1 code implementation 27 Jul 2023 Dominik Macháček, Raj Dabre, Ondřej Bojar

Whisper is one of the recent state-of-the-art multilingual speech recognition and translation models; however, it is not designed for real-time transcription.

speech-recognition Speech Recognition +1

Bilingual Corpus Mining and Multistage Fine-Tuning for Improving Machine Translation of Lecture Transcripts

1 code implementation 7 Nov 2023 Haiyue Song, Raj Dabre, Chenhui Chu, Atsushi Fujita, Sadao Kurohashi

To create the parallel corpora, we propose a dynamic programming based sentence alignment algorithm which leverages the cosine similarity of machine-translated sentences.

Benchmarking Machine Translation +3

Natural Language Processing for Dialects of a Language: A Survey

no code implementations 11 Jan 2024 Aditya Joshi, Raj Dabre, Diptesh Kanojia, Zhuang Li, Haolan Zhan, Gholamreza Haffari, Doris Dippold

Motivated by the performance degradation of NLP models for dialectic datasets and its implications for the equity of language technologies, we survey past research in NLP for dialects in terms of datasets and approaches.

Attribute Machine Translation +4

PUB: A Pragmatics Understanding Benchmark for Assessing LLMs' Pragmatics Capabilities

no code implementations 13 Jan 2024 Settaluri Lakshmi Sravanthi, Meet Doshi, Tankala Pavan Kalyan, Rudra Murthy, Pushpak Bhattacharyya, Raj Dabre

To demonstrate this fact, we release a Pragmatics Understanding Benchmark (PUB) dataset consisting of fourteen tasks in four pragmatics phenomena, namely, Implicature, Presupposition, Reference, and Deixis.

Instruction Following Multiple-choice

An Empirical Study of In-context Learning in LLMs for Machine Translation

no code implementations 22 Jan 2024 Pranjal A. Chitale, Jay Gala, Raj Dabre

While we establish the significance of the quality of the target distribution over the source distribution of demonstrations, we further observe that perturbations sometimes act as regularizers, resulting in performance improvements.

In-Context Learning Machine Translation +2

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

no code implementations 24 Jan 2024 Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara

We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion.

FAD

IndicLLMSuite: A Blueprint for Creating Pre-training and Fine-Tuning Datasets for Indian Languages

1 code implementation 11 Mar 2024 Mohammed Safi Ur Rahman Khan, Priyam Mehta, Ananth Sankar, Umashankar Kumaravelan, Sumanth Doddapaneni, Suriyaprasaad G, Varun Balan G, Sparsh Jain, Anoop Kunchukuttan, Pratyush Kumar, Raj Dabre, Mitesh M. Khapra

We hope that the datasets, tools, and resources released as a part of this work will not only propel the research and development of Indic LLMs but also establish an open-source blueprint for extending such efforts to other languages.

Do Not Worry if You Do Not Have Data: Building Pretrained Language Models Using Translationese

no code implementations 20 Mar 2024 Meet Doshi, Raj Dabre, Pushpak Bhattacharyya

In this paper, we explore the utility of Translationese as synthetic data created using machine translation for pre-training language models (LMs).

Machine Translation Natural Language Understanding

A Morphology-Based Investigation of Positional Encodings

no code implementations 6 Apr 2024 Poulami Ghosh, Shikhar Vashishth, Raj Dabre, Pushpak Bhattacharyya

How does the importance of positional encoding in pre-trained language models (PLMs) vary across languages with different morphological complexity?

Dependency Parsing named-entity-recognition +3

NICT-5’s Submission To WAT 2021: MBART Pre-training And In-Domain Fine Tuning For Indic Languages

no code implementations ACL (WAT) 2021 Raj Dabre, Abhisek Chakrabarty

The objective of the task was to explore the utility of multilingual approaches using a variety of in-domain and out-of-domain parallel and monolingual corpora.

NMT Translation

NICT’s Submission to the WAT 2022 Structured Document Translation Task

no code implementations WAT 2022 Raj Dabre

However, to our surprise, we find that existing multilingual NMT systems are able to handle the translation of text annotated with XML tags without any explicit training on data containing said tags.

Document Translation NMT +3

Combining Sequence Distillation and Transfer Learning for Efficient Low-Resource Neural Machine Translation Models

no code implementations WMT (EMNLP) 2020 Raj Dabre, Atsushi Fujita

This paper investigates a combination of SD and TL for training efficient NMT models for ELR settings, where we utilize TL with helping corpora twice: once for distilling the ELR corpora and then during compact model training.

Low-Resource Neural Machine Translation NMT +3

Kyoto University MT System Description for IWSLT 2017

no code implementations IWSLT 2017 Raj Dabre, Fabien Cromieres, Sadao Kurohashi

We describe here our Machine Translation (MT) model and the results we obtained for the IWSLT 2017 Multilingual Shared Task.

Machine Translation NMT +1

FeatureBART: Feature Based Sequence-to-Sequence Pre-Training for Low-Resource NMT

no code implementations COLING 2022 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Eiichiro Sumita

In this paper we present FeatureBART, a linguistically motivated sequence-to-sequence monolingual pre-training strategy in which syntactic features such as lemma, part-of-speech and dependency labels are incorporated into the span prediction based pre-training framework (BART).

LEMMA NMT

NICT's Submission To WAT 2020: How Effective Are Simple Many-To-Many Neural Machine Translation Models?

no code implementations AACL (WAT) 2020 Raj Dabre, Abhisek Chakrabarty

In this paper we describe our team's (NICT-5) Neural Machine Translation (NMT) models whose translations were submitted to shared tasks of the 7th Workshop on Asian Translation.

Machine Translation NMT +1

Studying The Impact Of Document-level Context On Simultaneous Neural Machine Translation

no code implementations MTSummit 2021 Raj Dabre, Aizhan Imankulova, Masahiro Kaneko

To this end, in this paper we propose wait-k simultaneous document-level NMT where we keep the context encoder as it is and replace the source sentence encoder and target language decoder with their wait-k equivalents.

Machine Translation NMT +2

Investigating Softmax Tempering for Training Neural Machine Translation Models

no code implementations MTSummit 2021 Raj Dabre, Atsushi Fujita

In low-resource scenarios, NMT models tend to perform poorly because the model training quickly converges to a point where the softmax distribution computed using logits approaches the gold label distribution.

Machine Translation NMT +1
