Search Results for author: Eiichiro Sumita

Found 135 papers, 12 papers with code

Neural Machine Translation with Universal Visual Representation

1 code implementation ICLR 2020 Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Though visual information has been introduced for enhancing neural machine translation (NMT), its effectiveness strongly relies on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations.

Machine Translation NMT +2

Khmer Word Segmentation Using Conditional Random Fields

1 code implementation 15 Oct 2015 Vichet Chea, Ye Kyaw Thu, Chenchen Ding, Masao Utiyama, Andrew Finch, Eiichiro Sumita

The trained CRF segmenter was compared empirically to a baseline approach based on maximum matching that used a dictionary extracted from the manually segmented corpus.

Segmentation Text Segmentation +1
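
The CRF segmenter in this paper is compared against a dictionary-based maximum-matching baseline; the sketch below shows that generic greedy longest-match algorithm (the dictionary contents and the match-length limit are illustrative, not taken from the paper):

```python
def max_match(text, dictionary, max_len=10):
    """Greedy longest-match segmentation against a word dictionary.

    Scans left to right; at each position it takes the longest dictionary
    entry that matches, and falls back to a single character otherwise.
    """
    tokens, i = [], 0
    while i < len(text):
        match = None
        for j in range(min(len(text), i + max_len), i, -1):  # longest candidate first
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens

# Toy usage with a made-up dictionary (Latin stand-ins for Khmer strings).
print(max_match("thisisatest", {"this", "is", "a", "test"}))
# -> ['this', 'is', 'a', 'test']
```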

Explicit Sentence Compression for Neural Machine Translation

1 code implementation 27 Dec 2019 Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao

In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT.

Machine Translation NMT +3

Smoothing Dialogue States for Open Conversational Machine Reading

1 code implementation EMNLP 2021 Zhuosheng Zhang, Siru Ouyang, Hai Zhao, Masao Utiyama, Eiichiro Sumita

In this work, we propose an effective gating strategy by smoothing the two dialogue states in only one decoder and bridge decision making and question generation to provide a richer dialogue state reference.

Decision Making Question Generation +2

Exploring Recombination for Efficient Decoding of Neural Machine Translation

1 code implementation EMNLP 2018 Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao

In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations.

Machine Translation NMT +1

Extending the Subwording Model of Multilingual Pretrained Models for New Languages

1 code implementation 29 Nov 2022 Kenji Imamura, Eiichiro Sumita

Multilingual pretrained models are effective for machine translation and cross-lingual processing because they contain multiple languages in one model.

Machine Translation Translation

Reference Language based Unsupervised Neural Machine Translation

1 code implementation Findings of the Association for Computational Linguistics 2020 Zuchao Li, Hai Zhao, Rui Wang, Masao Utiyama, Eiichiro Sumita

Further enriching the idea of pivot translation by extending the use of parallel corpora beyond the source-target paradigm, we propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source, but this corpus still indicates a signal clear enough to help the reconstruction training of UNMT through a proposed reference agreement mechanism.

Machine Translation Translation

Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation

no code implementations ACL 2018 Rui Wang, Masao Utiyama, Eiichiro Sumita

Traditional neural machine translation (NMT) involves a fixed training procedure in which each sentence is sampled once during each epoch.

Machine Translation NMT +2
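
The contrast drawn here, once-per-epoch sampling versus dynamic sampling, can be illustrated with a generic weighted sampler; the per-sentence weights below are placeholders for whatever usefulness criterion a method defines, not this paper's specific weighting:

```python
import random

def fixed_epoch(corpus):
    """Traditional epoch: every sentence pair is visited exactly once."""
    order = list(range(len(corpus)))
    random.shuffle(order)
    return [corpus[i] for i in order]

def weighted_epoch(corpus, weights, epoch_size=None):
    """Dynamic sampling: pairs are drawn in proportion to a usefulness weight,
    so some pairs may appear several times per epoch and others not at all."""
    epoch_size = epoch_size or len(corpus)
    return random.choices(corpus, weights=weights, k=epoch_size)

corpus = [("src1", "tgt1"), ("src2", "tgt2"), ("src3", "tgt3")]
weights = [0.1, 0.6, 0.3]  # placeholder scores, e.g. derived from training loss
print(weighted_epoch(corpus, weights))
```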

Syntax-Directed Attention for Neural Machine Translation

no code implementations 12 Nov 2017 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

In this paper, we extend local attention with a syntax-distance constraint so that it focuses on source words syntactically related to the predicted target word, thus learning a more effective context vector for word prediction.

Machine Translation NMT +1

Neural Machine Translation with Supervised Attention

no code implementations COLING 2016 Lemao Liu, Masao Utiyama, Andrew Finch, Eiichiro Sumita

The attention mechanism is appealing for neural machine translation, since it is able to dynamically encode a source sentence by generating an alignment between a target word and source words.

Machine Translation NMT +2
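
Supervised attention of this kind is usually realized by adding a loss that pulls the model's attention distribution toward externally obtained word alignments; the sketch below shows that generic idea in PyTorch (the loss form and its weight are illustrative, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn, gold_align, eps=1e-9):
    """Cross-entropy between predicted attention and reference alignments.

    attn:       (batch, tgt_len, src_len) attention weights, rows sum to 1
    gold_align: (batch, tgt_len, src_len) reference alignment distribution
    """
    return -(gold_align * torch.log(attn + eps)).sum(dim=-1).mean()

def total_loss(logits, targets, attn, gold_align, lam=1.0):
    """Word-prediction loss plus a weighted attention supervision term."""
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return nll + lam * attention_supervision_loss(attn, gold_align)

# Toy tensors: batch of 2, target length 4, source length 6, vocabulary 100.
attn = torch.softmax(torch.randn(2, 4, 6), dim=-1)
gold = torch.softmax(torch.randn(2, 4, 6) * 5, dim=-1)
logits, targets = torch.randn(2, 4, 100), torch.randint(0, 100, (2, 4))
print(total_loss(logits, targets, attn, gold, lam=0.5))
```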

NICT's Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task

no code implementations WS 2018 Rui Wang, Benjamin Marie, Masao Utiyama, Eiichiro Sumita

Using the clean data of the WMT18 shared news translation task, we designed several features and trained a classifier to score each sentence pair in the noisy data.

Machine Translation NMT +2
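
The recipe described in this snippet, hand-designed features per sentence pair plus a classifier trained on clean versus noisy examples, can be sketched generically with scikit-learn; the two features and the toy data below are illustrative, not the system's actual feature set:

```python
from sklearn.linear_model import LogisticRegression

def pair_features(src, tgt):
    """Toy features for one sentence pair: length ratio and token overlap."""
    src_tok, tgt_tok = src.split(), tgt.split()
    ratio = len(src_tok) / max(len(tgt_tok), 1)
    overlap = len(set(src_tok) & set(tgt_tok)) / max(len(src_tok), 1)
    return [ratio, overlap]

# Tiny stand-ins for clean (label 1) and noisy (label 0) training pairs.
clean_pairs = [("the cat sat", "le chat est assis"), ("a small dog", "un petit chien")]
noisy_pairs = [("the cat sat", "buy cheap watches now now now"), ("hello", "x y z q w e r t")]

X = [pair_features(s, t) for s, t in clean_pairs + noisy_pairs]
y = [1] * len(clean_pairs) + [0] * len(noisy_pairs)
clf = LogisticRegression().fit(X, y)

# Score unseen pairs; a higher probability of class 1 means "keep".
candidates = [("good morning", "bonjour"), ("spam spam spam", "click here to win")]
print(clf.predict_proba([pair_features(s, t) for s, t in candidates])[:, 1])
```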

Forest-Based Neural Machine Translation

no code implementations ACL 2018 Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, Eiichiro Sumita

Tree-based neural machine translation (NMT) approaches, although they achieve impressive performance, suffer from a major drawback: they use only the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors.

Machine Translation NMT +1

Simplified Abugidas

no code implementations ACL 2018 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics.

Sentence

Sentence Embedding for Neural Machine Translation Domain Adaptation

no code implementations ACL 2017 Rui Wang, Andrew Finch, Masao Utiyama, Eiichiro Sumita

Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance.

Domain Adaptation Language Modelling +6

Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing

no code implementations WS 2017 Atsushi Fujita, Eiichiro Sumita

Aiming at facilitating the research on quality estimation (QE) and automatic post-editing (APE) of machine translation (MT) outputs, especially for those among Asian languages, we have created new datasets for Japanese to English, Chinese, and Korean translations.

Automatic Post-Editing Benchmarking +2

Global Pre-ordering for Improving Sublanguage Translation

no code implementations WS 2016 Masaru Fuji, Masao Utiyama, Eiichiro Sumita, Yuji Matsumoto

When translating formal documents, capturing the sentence structure specific to the sublanguage is essential for obtaining high-quality translations.

Machine Translation Sentence +1

An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation

no code implementations WS 2016 Xiaolin Wang, Andrew Finch, Masao Utiyama, Eiichiro Sumita

Simultaneous interpretation is a very challenging application of machine translation in which the input is a stream of words from a speech recognition engine.

Automatic Speech Recognition (ASR) Machine Translation +5

Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian

no code implementations WS 2016 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank.

Machine Translation Translation +1

A Prototype Automatic Simultaneous Interpretation System

no code implementations COLING 2016 Xiaolin Wang, Andrew Finch, Masao Utiyama, Eiichiro Sumita

Simultaneous interpretation allows people to communicate spontaneously across language boundaries, but such services are prohibitively expensive for the general public.

Context-Aware Smoothing for Neural Machine Translation

no code implementations IJCNLP 2017 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

In Neural Machine Translation (NMT), each word is represented as a low-dimensional, real-valued vector that encodes its syntactic and semantic information.

Machine Translation NMT +3

Key-value Attention Mechanism for Neural Machine Translation

no code implementations IJCNLP 2017 Hideya Mino, Masao Utiyama, Eiichiro Sumita, Takenobu Tokunaga

In this paper, we propose a neural machine translation (NMT) with a key-value attention mechanism on the source-side encoder.

Machine Translation NMT +1
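
Independently of the paper's specific source-side design, the underlying key-value attention computation separates the memory used for scoring (keys) from the memory used to build the context vector (values); a minimal PyTorch sketch of that generic computation:

```python
import torch
import torch.nn.functional as F

def key_value_attention(query, keys, values):
    """Dot-product attention with separate key and value memories.

    query:  (batch, d)           current decoder state
    keys:   (batch, src_len, d)  used only to compute attention weights
    values: (batch, src_len, d)  used only to build the context vector
    """
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)     # (batch, src_len)
    scores = scores / keys.size(-1) ** 0.5                        # scale by sqrt(d)
    weights = F.softmax(scores, dim=-1)
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # (batch, d)
    return context, weights

q, k, v = torch.randn(2, 8), torch.randn(2, 5, 8), torch.randn(2, 5, 8)
ctx, w = key_value_attention(q, k, v)
print(ctx.shape, w.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```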

Introducing the Asian Language Treebank (ALT)

no code implementations LREC 2016 Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, Eiichiro Sumita

The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future.

Sentence Translation

ASPEC: Asian Scientific Paper Excerpt Corpus

no code implementations LREC 2016 Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, Hitoshi Isahara

In this paper, we describe the details of the ASPEC (Asian Scientific Paper Excerpt Corpus), which is the first large-scale parallel corpus in the scientific paper domain.

Machine Translation Translation

Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation

no code implementations ACL 2019 Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

In previous methods, UBWE is first trained using non-parallel monolingual corpora and then this pre-trained UBWE is used to initialize the word embedding in the encoder and decoder of UNMT.

Denoising Machine Translation +1

Sentence-Level Agreement for Neural Machine Translation

no code implementations ACL 2019 Mingming Yang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Min Zhang, Tiejun Zhao

The training objective of neural machine translation (NMT) is to minimize the loss between the words in the translated sentences and those in the references.

Machine Translation NMT +2
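
The word-level objective referred to here is the standard token-level cross-entropy between decoder predictions and reference words; a minimal PyTorch sketch of that baseline objective (the paper's additional sentence-level agreement term is not reproduced):

```python
import torch
import torch.nn.functional as F

def word_level_nmt_loss(logits, references, pad_id=0):
    """Standard NMT objective: cross-entropy of each predicted word against
    the reference word at the same position, ignoring padding."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * tgt_len, vocab)
        references.reshape(-1),               # (batch * tgt_len,)
        ignore_index=pad_id,
    )

logits = torch.randn(4, 7, 1000)             # toy batch: 4 sentences, 7 tokens
references = torch.randint(1, 1000, (4, 7))
print(word_level_nmt_loss(logits, references))
```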

NICT's Supervised Neural Machine Translation Systems for the WMT19 News Translation Task

no code implementations WS 2019 Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions.

Machine Translation NMT +2

Revisiting Simple Domain Adaptation Methods in Unsupervised Neural Machine Translation

no code implementations 26 Aug 2019 Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, Chenhui Chu

However, it has not been well-studied for unsupervised neural machine translation (UNMT), although UNMT has recently achieved remarkable results in several domain-specific language pairs.

Domain Adaptation Machine Translation +1

Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation

no code implementations WS 2019 Junya Ono, Masao Utiyama, Eiichiro Sumita

We apply a model parallel approach to the RNN encoder-decoder part of the Seq2Seq model and a data parallel approach to the attention-softmax part of the model.

Machine Translation Translation

Document-level Neural Machine Translation with Associated Memory Network

no code implementations 31 Oct 2019 Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, Hai Zhao, Bao-liang Lu

Most existing document-level NMT approaches make use of only a superficial sense of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network.

Machine Translation NMT +2

Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English

no code implementations WS 2019 Benjamin Marie, Hour Kaing, Aye Myat Mon, Chenchen Ding, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

This paper presents NICT's supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks.

NMT Translation +1

NICT's participation to WAT 2019: Multilingualism and Multi-step Fine-Tuning for Low Resource NMT

no code implementations WS 2019 Raj Dabre, Eiichiro Sumita

In this paper we describe our submissions to WAT 2019 for the following tasks: English–Tamil translation and Russian–Japanese translation.

Domain Adaptation NMT +1

English-Myanmar Supervised and Unsupervised NMT: NICT's Machine Translation Systems at WAT-2019

no code implementations WS 2019 Rui Wang, Haipeng Sun, Kehai Chen, Chenchen Ding, Masao Utiyama, Eiichiro Sumita

This paper presents NICT's participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically the Myanmar (Burmese)–English task in both translation directions.

Language Modelling Machine Translation +2

Recycling a Pre-trained BERT Encoder for Neural Machine Translation

no code implementations WS 2019 Kenji Imamura, Eiichiro Sumita

In this paper, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model is applied to Transformer-based neural machine translation (NMT).

Machine Translation NMT +1
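
The general recipe the title points to, reusing a pre-trained BERT model as the source-side encoder in front of a Transformer decoder, can be sketched as follows; the checkpoint name, decoder configuration, and stand-in target embeddings are illustrative assumptions, not the paper's exact architecture or training procedure:

```python
import torch
from transformers import BertModel, BertTokenizer

# Pre-trained BERT provides the source-side encoder representations.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

# A randomly initialized Transformer decoder attends over BERT's outputs.
decoder_layer = torch.nn.TransformerDecoderLayer(d_model=768, nhead=8, batch_first=True)
decoder = torch.nn.TransformerDecoder(decoder_layer, num_layers=6)

src = tokenizer(["a source sentence to translate"], return_tensors="pt")
with torch.no_grad():
    memory = encoder(**src).last_hidden_state      # (1, src_len, 768)

tgt_embeddings = torch.randn(1, 5, 768)            # stand-in target embeddings
out = decoder(tgt=tgt_embeddings, memory=memory)   # (1, 5, 768)
print(out.shape)
```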

SJTU-NICT at MRP 2019: Multi-Task Learning for End-to-End Uniform Semantic Graph Parsing

no code implementations CONLL 2019 Zuchao Li, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita

This paper describes our SJTU-NICT system for participating in the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning (CoNLL).

Multi-Task Learning

MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method

no code implementations IJCNLP 2019 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

MY-AKKHARA is a method used to input Burmese texts encoded in the Unicode standard, based on commonly accepted Latin transcription.

Probing Contextualized Sentence Representations with Visual Awareness

no code implementations 7 Nov 2019 Zhuosheng Zhang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Hai Zhao

We present a universal framework to model contextualized sentence representations with visual awareness that is motivated to overcome the shortcomings of the multimodal parallel data with manual annotations.

Machine Translation Natural Language Inference +2

Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation

no code implementations 23 Jan 2020 Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita

To this end, we propose to exploit monolingual corpora of other languages to complement the scarcity of monolingual corpora for the LOI.

Machine Translation NMT +1

Modeling Future Cost for Neural Machine Translation

no code implementations 28 Feb 2020 Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, Tiejun Zhao

Existing neural machine translation (NMT) systems use sequence-to-sequence neural networks to generate the target translation word by word, and then train the generated word at each time-step to be as consistent as possible with its counterpart in the references.

Machine Translation NMT +1

Explicit Reordering for Neural Machine Translation

no code implementations 8 Apr 2020 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

Thus, we propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.

Machine Translation NMT +2

Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios

no code implementations NAACL 2021 Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

Unsupervised neural machine translation (UNMT) that relies solely on massive monolingual corpora has achieved remarkable results in several translation tasks.

Machine Translation Translation

A Myanmar (Burmese)-English Named Entity Transliteration Dictionary

no code implementations LREC 2020 Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, Eiichiro Sumita

For the Myanmar (Burmese) language, robust automatic transliteration for borrowed English words is a challenging task because of the complex Myanmar writing system and the lack of data.

Transliteration

Content Word Aware Neural Machine Translation

no code implementations ACL 2020 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

Neural machine translation (NMT) encodes the source sentence in a universal way to generate the target sentence word-by-word.

Machine Translation NMT +2

A Three-Parameter Rank-Frequency Relation in Natural Languages

no code implementations ACL 2020 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with ($\alpha$, $\beta$, $\gamma$) as parameters.

Relation
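
Up to a normalizing constant, the relation can be written as $f(r) = C\, r^{-\alpha}(r+\gamma)^{-\beta}$; a small numeric sketch that evaluates it (the parameter values are arbitrary and chosen only for illustration):

```python
import numpy as np

def rank_frequency(r, alpha, beta, gamma, C=1.0):
    """Three-parameter rank-frequency relation f = C * r^-alpha * (r + gamma)^-beta."""
    r = np.asarray(r, dtype=float)
    return C * r ** (-alpha) * (r + gamma) ** (-beta)

ranks = np.arange(1, 11)
freqs = rank_frequency(ranks, alpha=0.3, beta=1.0, gamma=5.0)  # illustrative parameters
print(np.round(freqs, 4))
```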

Data-dependent Gaussian Prior Objective for Language Generation

no code implementations ICLR 2020 Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao

However, MLE focuses on once-to-all matching between the predicted sequence and gold-standard, consequently treating all incorrect predictions as being equally incorrect.

Image Captioning L2 Regularization +4

Prior Knowledge Representation for Self-Attention Networks

no code implementations 1 Jan 2021 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

Self-attention networks (SANs) have shown promising empirical results in various natural language processing tasks.

Translation

Cross-lingual Transfer Learning for Pre-trained Contextualized Language Models

no code implementations 1 Jan 2021 Zuchao Li, Kevin Barry Parnow, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita

Though the pre-trained contextualized language model (PrLM) has made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant.

Cross-Lingual Transfer Language Modelling +3

Text Compression-aided Transformer Encoding

no code implementations 11 Feb 2021 Zuchao Li, Zhuosheng Zhang, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita

In this paper, we propose explicit and implicit text compression approaches to enhance the Transformer encoding and evaluate models using this approach on several typical downstream tasks that rely on the encoding heavily.

Text Compression

Cross-lingual Transferring of Pre-trained Contextualized Language Models

no code implementations 27 Jul 2021 Zuchao Li, Kevin Parnow, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita

Though the pre-trained contextualized language model (PrLM) has made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant.

Language Modelling Machine Translation +1

YANMTT: Yet Another Neural Machine Translation Toolkit

no code implementations 25 Aug 2021 Raj Dabre, Eiichiro Sumita

In this paper we present our open-source neural machine translation (NMT) toolkit called "Yet Another Neural Machine Translation Toolkit" abbreviated as YANMTT which is built on top of the Transformers library.

Machine Translation Model Compression +3

Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation

no code implementations COLING 2020 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita

In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data.

Machine Translation NMT +1

Bilingual Subword Segmentation for Neural Machine Translation

no code implementations COLING 2020 Hiroyuki Deguchi, Masao Utiyama, Akihiro Tamura, Takashi Ninomiya, Eiichiro Sumita

This paper proposes a new subword segmentation method for neural machine translation, "Bilingual Subword Segmentation," which tokenizes sentences to minimize the difference between the number of subword units in a sentence and that of its translation.

Machine Translation Segmentation +2

Intermediate Self-supervised Learning for Machine Translation Quality Estimation

no code implementations COLING 2020 Raphael Rubino, Eiichiro Sumita

The proposed method does not rely on annotated data and is complementary to QE methods involving pre-trained sentence encoders and domain adaptation.

Domain Adaptation Language Modelling +4

NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs

no code implementations ACL (WAT) 2021 Kenji Imamura, Eiichiro Sumita

In this paper, we present the NICT system (NICT-2) submitted to the NICT-SAP shared task at the 8th Workshop on Asian Translation (WAT-2021).

Translation

What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation

no code implementations Findings (ACL) 2022 Zuchao Li, Yiran Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao, Taro Watanabe

Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT.

Language Modelling Machine Translation +2

Synchronous Refinement for Neural Machine Translation

no code implementations Findings (ACL) 2022 Kehai Chen, Masao Utiyama, Eiichiro Sumita, Rui Wang, Min Zhang

Machine translation typically adopts an encoder-to-decoder framework, in which the decoder generates the target sentence word-by-word in an auto-regressive manner.

Machine Translation Sentence +1

Restricted or Not: A General Training Framework for Neural Machine Translation

no code implementations ACL 2022 Zuchao Li, Masao Utiyama, Eiichiro Sumita, Hai Zhao

Although this can satisfy the requirements overall, it usually requires a larger beam size and far longer decoding time than unrestricted translation, which limits the concurrent processing ability of the translation model in deployment, and thus its practicality.

Machine Translation Translation

A Multimodal Simultaneous Interpretation Prototype: Who Said What

no code implementations AMTA 2022 Xiaolin Wang, Masao Utiyama, Eiichiro Sumita

“Who said what” is essential for users to understand video streams that have more than one speaker, but conventional simultaneous interpretation systems merely present “what was said” in the form of subtitles.

Sentence TAG +1

FeatureBART: Feature Based Sequence-to-Sequence Pre-Training for Low-Resource NMT

no code implementations COLING 2022 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Eiichiro Sumita

In this paper we present FeatureBART, a linguistically motivated sequence-to-sequence monolingual pre-training strategy in which syntactic features such as lemma, part-of-speech and dependency labels are incorporated into the span prediction based pre-training framework (BART).

LEMMA NMT

Multi-Source Cross-Lingual Constituency Parsing

no code implementations ICON 2021 Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, Satoshi Nakamura

Pretrained multilingual language models have become a key part of cross-lingual transfer for many natural language processing tasks, even those without bilingual information.

Constituency Parsing Cross-Lingual Transfer +1

Language Model Pre-training on True Negatives

no code implementations 1 Dec 2022 Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita

Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.

Language Modelling
