Search Results for author: Eiichiro Sumita

Found 135 papers, 12 papers with code

Neural Machine Translation with Universal Visual Representation

1 code implementation ICLR 2020 Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Zuchao Li, Hai Zhao

Though visual information has been introduced for enhancing neural machine translation (NMT), its effectiveness strongly relies on the availability of large amounts of bilingual parallel sentence pairs with manual image annotations.

Machine Translation NMT +2

Khmer Word Segmentation Using Conditional Random Fields

1 code implementation 15 Oct 2015 Vichet Chea, Ye Kyaw Thu, Chenchen Ding, Masao Utiyama, Andrew Finch, Eiichiro Sumita

The trained CRF segmenter was compared empirically to a baseline approach based on maximum matching that used a dictionary extracted from the manually segmented corpus.

Segmentation Text Segmentation +1
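
The CRF segmenter in this paper is compared against a dictionary-based maximum-matching baseline; the sketch below shows that generic greedy longest-match algorithm (the dictionary contents and the match-length limit are illustrative, not taken from the paper):

```python
def max_match(text, dictionary, max_len=10):
    """Greedy longest-match segmentation against a word dictionary.

    Scans left to right; at each position it takes the longest dictionary
    entry that matches, and falls back to a single character otherwise.
    """
    tokens, i = [], 0
    while i < len(text):
        match = None
        for j in range(min(len(text), i + max_len), i, -1):  # longest candidate first
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens

# Toy usage with a made-up dictionary (Latin stand-ins for Khmer strings).
print(max_match("thisisatest", {"this", "is", "a", "test"}))
# -> ['this', 'is', 'a', 'test']
```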

Explicit Sentence Compression for Neural Machine Translation

1 code implementation 27 Dec 2019 Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao

In this paper, we propose an explicit sentence compression method to enhance the source sentence representation for NMT.

Machine Translation NMT +3

Smoothing Dialogue States for Open Conversational Machine Reading

1 code implementation EMNLP 2021 Zhuosheng Zhang, Siru Ouyang, Hai Zhao, Masao Utiyama, Eiichiro Sumita

In this work, we propose an effective gating strategy by smoothing the two dialogue states in only one decoder and bridge decision making and question generation to provide a richer dialogue state reference.

Decision Making Question Generation +2

Exploring Recombination for Efficient Decoding of Neural Machine Translation

1 code implementation EMNLP 2018 Zhisong Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao

In Neural Machine Translation (NMT), the decoder can capture the features of the entire prediction history with neural connections and representations.

Machine Translation NMT +1

Extending the Subwording Model of Multilingual Pretrained Models for New Languages

1 code implementation 29 Nov 2022 Kenji Imamura, Eiichiro Sumita

Multilingual pretrained models are effective for machine translation and cross-lingual processing because they contain multiple languages in one model.

Machine Translation Translation

Reference Language based Unsupervised Neural Machine Translation

1 code implementation Findings of the Association for Computational Linguistics 2020 Zuchao Li, Hai Zhao, Rui Wang, Masao Utiyama, Eiichiro Sumita

Further enriching the idea of pivot translation by extending the use of parallel corpora beyond the source-target paradigm, we propose a new reference language-based framework for UNMT, RUNMT, in which the reference language only shares a parallel corpus with the source, but this corpus still indicates a signal clear enough to help the reconstruction training of UNMT through a proposed reference agreement mechanism.

Machine Translation Translation

Dynamic Sentence Sampling for Efficient Training of Neural Machine Translation

no code implementations ACL 2018 Rui Wang, Masao Utiyama, Eiichiro Sumita

Traditional neural machine translation (NMT) involves a fixed training procedure in which each sentence is sampled once during each epoch.

Machine Translation NMT +2
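
The contrast drawn here, once-per-epoch sampling versus dynamic sampling, can be illustrated with a generic weighted sampler; the per-sentence weights below are placeholders for whatever usefulness criterion a method defines, not this paper's specific weighting:

```python
import random

def fixed_epoch(corpus):
    """Traditional epoch: every sentence pair is visited exactly once."""
    order = list(range(len(corpus)))
    random.shuffle(order)
    return [corpus[i] for i in order]

def weighted_epoch(corpus, weights, epoch_size=None):
    """Dynamic sampling: pairs are drawn in proportion to a usefulness weight,
    so some pairs may appear several times per epoch and others not at all."""
    epoch_size = epoch_size or len(corpus)
    return random.choices(corpus, weights=weights, k=epoch_size)

corpus = [("src1", "tgt1"), ("src2", "tgt2"), ("src3", "tgt3")]
weights = [0.1, 0.6, 0.3]  # placeholder scores, e.g. derived from training loss
print(weighted_epoch(corpus, weights))
```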

Syntax-Directed Attention for Neural Machine Translation

no code implementations 12 Nov 2017 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

In this paper, we extend local attention with a syntax-distance constraint so that it focuses on source words syntactically related to the predicted target word, thus learning a more effective context vector for word prediction.

Machine Translation NMT +1

Neural Machine Translation with Supervised Attention

no code implementations COLING 2016 Lemao Liu, Masao Utiyama, Andrew Finch, Eiichiro Sumita

The attention mechanism is appealing for neural machine translation, since it is able to dynamically encode a source sentence by generating an alignment between a target word and source words.

Machine Translation NMT +2
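
Supervised attention of this kind is usually realized by adding a loss that pulls the model's attention distribution toward externally obtained word alignments; the sketch below shows that generic idea in PyTorch (the loss form and its weight are illustrative, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def attention_supervision_loss(attn, gold_align, eps=1e-9):
    """Cross-entropy between predicted attention and reference alignments.

    attn:       (batch, tgt_len, src_len) attention weights, rows sum to 1
    gold_align: (batch, tgt_len, src_len) reference alignment distribution
    """
    return -(gold_align * torch.log(attn + eps)).sum(dim=-1).mean()

def total_loss(logits, targets, attn, gold_align, lam=1.0):
    """Word-prediction loss plus a weighted attention supervision term."""
    nll = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return nll + lam * attention_supervision_loss(attn, gold_align)

# Toy tensors: batch of 2, target length 4, source length 6, vocabulary 100.
attn = torch.softmax(torch.randn(2, 4, 6), dim=-1)
gold = torch.softmax(torch.randn(2, 4, 6) * 5, dim=-1)
logits, targets = torch.randn(2, 4, 100), torch.randint(0, 100, (2, 4))
print(total_loss(logits, targets, attn, gold, lam=0.5))
```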

NICT's Corpus Filtering Systems for the WMT18 Parallel Corpus Filtering Task

no code implementations WS 2018 Rui Wang, Benjamin Marie, Masao Utiyama, Eiichiro Sumita

Using the clean data of the WMT18 shared news translation task, we designed several features and trained a classifier to score each sentence pair in the noisy data.

Machine Translation NMT +2
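
The recipe described in this snippet, hand-designed features per sentence pair plus a classifier trained on clean versus noisy examples, can be sketched generically with scikit-learn; the two features and the toy data below are illustrative, not the system's actual feature set:

```python
from sklearn.linear_model import LogisticRegression

def pair_features(src, tgt):
    """Toy features for one sentence pair: length ratio and token overlap."""
    src_tok, tgt_tok = src.split(), tgt.split()
    ratio = len(src_tok) / max(len(tgt_tok), 1)
    overlap = len(set(src_tok) & set(tgt_tok)) / max(len(src_tok), 1)
    return [ratio, overlap]

# Tiny stand-ins for clean (label 1) and noisy (label 0) training pairs.
clean_pairs = [("the cat sat", "le chat est assis"), ("a small dog", "un petit chien")]
noisy_pairs = [("the cat sat", "buy cheap watches now now now"), ("hello", "x y z q w e r t")]

X = [pair_features(s, t) for s, t in clean_pairs + noisy_pairs]
y = [1] * len(clean_pairs) + [0] * len(noisy_pairs)
clf = LogisticRegression().fit(X, y)

# Score unseen pairs; a higher probability of class 1 means "keep".
candidates = [("good morning", "bonjour"), ("spam spam spam", "click here to win")]
print(clf.predict_proba([pair_features(s, t) for s, t in candidates])[:, 1])
```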

Forest-Based Neural Machine Translation

no code implementations ACL 2018 Chunpeng Ma, Akihiro Tamura, Masao Utiyama, Tiejun Zhao, Eiichiro Sumita

Tree-based neural machine translation (NMT) approaches, although they achieve impressive performance, suffer from a major drawback: they use only the 1-best parse tree to direct the translation, which potentially introduces translation mistakes due to parsing errors.

Machine Translation NMT +1

Simplified Abugidas

no code implementations ACL 2018 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics.

Sentence

Sentence Embedding for Neural Machine Translation Domain Adaptation

no code implementations ACL 2017 Rui Wang, Andrew Finch, Masao Utiyama, Eiichiro Sumita

Although new corpora are becoming increasingly available for machine translation, only those that belong to the same or similar domains are typically able to improve translation performance.

Domain Adaptation Language Modelling +6

Japanese to English/Chinese/Korean Datasets for Translation Quality Estimation and Automatic Post-Editing

no code implementations WS 2017 Atsushi Fujita, Eiichiro Sumita

Aiming at facilitating the research on quality estimation (QE) and automatic post-editing (APE) of machine translation (MT) outputs, especially for those among Asian languages, we have created new datasets for Japanese to English, Chinese, and Korean translations.

Automatic Post-Editing Benchmarking +2

Global Pre-ordering for Improving Sublanguage Translation

no code implementations WS 2016 Masaru Fuji, Masao Utiyama, Eiichiro Sumita, Yuji Matsumoto

When translating formal documents, capturing the sentence structure specific to the sublanguage is essential for obtaining high-quality translations.

Machine Translation Sentence +1

An Efficient and Effective Online Sentence Segmenter for Simultaneous Interpretation

no code implementations WS 2016 Xiaolin Wang, Andrew Finch, Masao Utiyama, Eiichiro Sumita

Simultaneous interpretation is a very challenging application of machine translation in which the input is a stream of words from a speech recognition engine.

Automatic Speech Recognition (ASR) Machine Translation +5

Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian

no code implementations WS 2016 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank.

Machine Translation Translation +1

A Prototype Automatic Simultaneous Interpretation System

no code implementations COLING 2016 Xiaolin Wang, Andrew Finch, Masao Utiyama, Eiichiro Sumita

Simultaneous interpretation allows people to communicate spontaneously across language boundaries, but such services are prohibitively expensive for the general public.

Context-Aware Smoothing for Neural Machine Translation

no code implementations IJCNLP 2017 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

In Neural Machine Translation (NMT), each word is represented as a low-dimensional, real-valued vector that encodes its syntactic and semantic information.

Machine Translation NMT +3

Key-value Attention Mechanism for Neural Machine Translation

no code implementations IJCNLP 2017 Hideya Mino, Masao Utiyama, Eiichiro Sumita, Takenobu Tokunaga

In this paper, we propose a neural machine translation (NMT) with a key-value attention mechanism on the source-side encoder.

Machine Translation NMT +1
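
Independently of the paper's specific source-side design, the underlying key-value attention computation separates the memory used for scoring (keys) from the memory used to build the context vector (values); a minimal PyTorch sketch of that generic computation:

```python
import torch
import torch.nn.functional as F

def key_value_attention(query, keys, values):
    """Dot-product attention with separate key and value memories.

    query:  (batch, d)           current decoder state
    keys:   (batch, src_len, d)  used only to compute attention weights
    values: (batch, src_len, d)  used only to build the context vector
    """
    scores = torch.bmm(keys, query.unsqueeze(-1)).squeeze(-1)     # (batch, src_len)
    scores = scores / keys.size(-1) ** 0.5                        # scale by sqrt(d)
    weights = F.softmax(scores, dim=-1)
    context = torch.bmm(weights.unsqueeze(1), values).squeeze(1)  # (batch, d)
    return context, weights

q, k, v = torch.randn(2, 8), torch.randn(2, 5, 8), torch.randn(2, 5, 8)
ctx, w = key_value_attention(q, k, v)
print(ctx.shape, w.shape)  # torch.Size([2, 8]) torch.Size([2, 5])
```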

Introducing the Asian Language Treebank (ALT)

no code implementations LREC 2016 Ye Kyaw Thu, Win Pa Pa, Masao Utiyama, Andrew Finch, Eiichiro Sumita

The project has so far created a corpus for Myanmar and will extend in scope to include other languages in the near future.

Sentence Translation

ASPEC: Asian Scientific Paper Excerpt Corpus

no code implementations LREC 2016 Toshiaki Nakazawa, Manabu Yaguchi, Kiyotaka Uchimoto, Masao Utiyama, Eiichiro Sumita, Sadao Kurohashi, Hitoshi Isahara

In this paper, we describe the details of the ASPEC (Asian Scientific Paper Excerpt Corpus), which is the first large-scale parallel corpus in the scientific paper domain.

Machine Translation Translation

Unsupervised Bilingual Word Embedding Agreement for Unsupervised Neural Machine Translation

no code implementations ACL 2019 Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

In previous methods, UBWE is first trained using non-parallel monolingual corpora and then this pre-trained UBWE is used to initialize the word embedding in the encoder and decoder of UNMT.

Denoising Machine Translation +1

Sentence-Level Agreement for Neural Machine Translation

no code implementations ACL 2019 Mingming Yang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Min Zhang, Tiejun Zhao

The training objective of neural machine translation (NMT) is to minimize the loss between the words in the translated sentences and those in the references.

Machine Translation NMT +2
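
The word-level objective referred to here is the standard token-level cross-entropy between decoder predictions and reference words; a minimal PyTorch sketch of that baseline objective (the paper's additional sentence-level agreement term is not reproduced):

```python
import torch
import torch.nn.functional as F

def word_level_nmt_loss(logits, references, pad_id=0):
    """Standard NMT objective: cross-entropy of each predicted word against
    the reference word at the same position, ignoring padding."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # (batch * tgt_len, vocab)
        references.reshape(-1),               # (batch * tgt_len,)
        ignore_index=pad_id,
    )

logits = torch.randn(4, 7, 1000)             # toy batch: 4 sentences, 7 tokens
references = torch.randint(1, 1000, (4, 7))
print(word_level_nmt_loss(logits, references))
```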

NICT's Supervised Neural Machine Translation Systems for the WMT19 News Translation Task

no code implementations WS 2019 Raj Dabre, Kehai Chen, Benjamin Marie, Rui Wang, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

In this paper, we describe our supervised neural machine translation (NMT) systems that we developed for the news translation task for Kazakh↔English, Gujarati↔English, Chinese↔English, and English→Finnish translation directions.

Machine Translation NMT +2

Revisiting Simple Domain Adaptation Methods in Unsupervised Neural Machine Translation

no code implementations 26 Aug 2019 Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao, Chenhui Chu

However, it has not been well-studied for unsupervised neural machine translation (UNMT), although UNMT has recently achieved remarkable results in several domain-specific language pairs.

Domain Adaptation Machine Translation +1

Hybrid Data-Model Parallel Training for Sequence-to-Sequence Recurrent Neural Network Machine Translation

no code implementations WS 2019 Junya Ono, Masao Utiyama, Eiichiro Sumita

We apply a model parallel approach to the RNN encoder-decoder part of the Seq2Seq model and a data parallel approach to the attention-softmax part of the model.

Machine Translation Translation

Document-level Neural Machine Translation with Associated Memory Network

no code implementations 31 Oct 2019 Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, Hai Zhao, Bao-liang Lu

Most existing document-level NMT approaches make use of only a superficial sense of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network.

Machine Translation NMT +2

Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English

no code implementations WS 2019 Benjamin Marie, Hour Kaing, Aye Myat Mon, Chenchen Ding, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

This paper presents NICT's supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks.

NMT Translation +1

NICT's participation to WAT 2019: Multilingualism and Multi-step Fine-Tuning for Low Resource NMT

no code implementations WS 2019 Raj Dabre, Eiichiro Sumita

In this paper we describe our submissions to WAT 2019 for the following tasks: English–Tamil translation and Russian–Japanese translation.

Domain Adaptation NMT +1

English-Myanmar Supervised and Unsupervised NMT: NICT's Machine Translation Systems at WAT-2019

no code implementations WS 2019 Rui Wang, Haipeng Sun, Kehai Chen, Chenchen Ding, Masao Utiyama, Eiichiro Sumita

This paper presents NICT's participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically the Myanmar (Burmese)–English task in both translation directions.

Language Modelling Machine Translation +2

Recycling a Pre-trained BERT Encoder for Neural Machine Translation

no code implementations WS 2019 Kenji Imamura, Eiichiro Sumita

In this paper, a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model is applied to Transformer-based neural machine translation (NMT).

Machine Translation NMT +1
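
The general recipe the title points to, reusing a pre-trained BERT model as the source-side encoder in front of a Transformer decoder, can be sketched as follows; the checkpoint name, decoder configuration, and stand-in target embeddings are illustrative assumptions, not the paper's exact architecture or training procedure:

```python
import torch
from transformers import BertModel, BertTokenizer

# Pre-trained BERT provides the source-side encoder representations.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

# A randomly initialized Transformer decoder attends over BERT's outputs.
decoder_layer = torch.nn.TransformerDecoderLayer(d_model=768, nhead=8, batch_first=True)
decoder = torch.nn.TransformerDecoder(decoder_layer, num_layers=6)

src = tokenizer(["a source sentence to translate"], return_tensors="pt")
with torch.no_grad():
    memory = encoder(**src).last_hidden_state      # (1, src_len, 768)

tgt_embeddings = torch.randn(1, 5, 768)            # stand-in target embeddings
out = decoder(tgt=tgt_embeddings, memory=memory)   # (1, 5, 768)
print(out.shape)
```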

SJTU-NICT at MRP 2019: Multi-Task Learning for End-to-End Uniform Semantic Graph Parsing

no code implementations CONLL 2019 Zuchao Li, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita

This paper describes our SJTU-NICT system for participating in the shared task on Cross-Framework Meaning Representation Parsing (MRP) at the 2019 Conference on Computational Natural Language Learning (CoNLL).

Multi-Task Learning

MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method

no code implementations IJCNLP 2019 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

MY-AKKHARA is a method used to input Burmese texts encoded in the Unicode standard, based on commonly accepted Latin transcription.

Probing Contextualized Sentence Representations with Visual Awareness

no code implementations 7 Nov 2019 Zhuosheng Zhang, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Hai Zhao

We present a universal framework to model contextualized sentence representations with visual awareness that is motivated to overcome the shortcomings of the multimodal parallel data with manual annotations.

Machine Translation Natural Language Inference +2

Pre-training via Leveraging Assisting Languages and Data Selection for Neural Machine Translation

no code implementations 23 Jan 2020 Haiyue Song, Raj Dabre, Zhuoyuan Mao, Fei Cheng, Sadao Kurohashi, Eiichiro Sumita

To this end, we propose to exploit monolingual corpora of other languages to complement the scarcity of monolingual corpora for the LOI.

Machine Translation NMT +1

Modeling Future Cost for Neural Machine Translation

no code implementations 28 Feb 2020 Chaoqun Duan, Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Conghui Zhu, Tiejun Zhao

Existing neural machine translation (NMT) systems use sequence-to-sequence neural networks to generate the target translation word by word, and then train the generated word at each time-step to be as consistent as possible with its counterpart in the references.

Machine Translation NMT +1

Explicit Reordering for Neural Machine Translation

no code implementations 8 Apr 2020 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

Thus, we propose a novel reordering method to explicitly model this reordering information for the Transformer-based NMT.

Machine Translation NMT +2

Self-Training for Unsupervised Neural Machine Translation in Unbalanced Training Data Scenarios

no code implementations NAACL 2021 Haipeng Sun, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao

Unsupervised neural machine translation (UNMT) that relies solely on massive monolingual corpora has achieved remarkable results in several translation tasks.

Machine Translation Translation

A Myanmar (Burmese)-English Named Entity Transliteration Dictionary

no code implementations LREC 2020 Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, Eiichiro Sumita

For the Myanmar (Burmese) language, robust automatic transliteration for borrowed English words is a challenging task because of the complex Myanmar writing system and the lack of data.

Transliteration

Content Word Aware Neural Machine Translation

no code implementations ACL 2020 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

Neural machine translation (NMT) encodes the source sentence in a universal way to generate the target sentence word-by-word.

Machine Translation NMT +2

A Three-Parameter Rank-Frequency Relation in Natural Languages

no code implementations ACL 2020 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with ($\alpha$, $\beta$, $\gamma$) as parameters.

Relation
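
Up to a normalizing constant, the relation can be written as $f(r) = C\, r^{-\alpha}(r+\gamma)^{-\beta}$; a small numeric sketch that evaluates it (the parameter values are arbitrary and chosen only for illustration):

```python
import numpy as np

def rank_frequency(r, alpha, beta, gamma, C=1.0):
    """Three-parameter rank-frequency relation f = C * r^-alpha * (r + gamma)^-beta."""
    r = np.asarray(r, dtype=float)
    return C * r ** (-alpha) * (r + gamma) ** (-beta)

ranks = np.arange(1, 11)
freqs = rank_frequency(ranks, alpha=0.3, beta=1.0, gamma=5.0)  # illustrative parameters
print(np.round(freqs, 4))
```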

Data-dependent Gaussian Prior Objective for Language Generation

no code implementations ICLR 2020 Zuchao Li, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita, Zhuosheng Zhang, Hai Zhao

However, MLE focuses on once-to-all matching between the predicted sequence and gold-standard, consequently treating all incorrect predictions as being equally incorrect.

Image Captioning L2 Regularization +4

Prior Knowledge Representation for Self-Attention Networks

no code implementations 1 Jan 2021 Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita

Self-attention networks (SANs) have shown promising empirical results in various natural language processing tasks.

Translation

Cross-lingual Transfer Learning for Pre-trained Contextualized Language Models

no code implementations 1 Jan 2021 Zuchao Li, Kevin Barry Parnow, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita

Though the pre-trained contextualized language model (PrLM) has made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant.

Cross-Lingual Transfer Language Modelling +3

Text Compression-aided Transformer Encoding

no code implementations 11 Feb 2021 Zuchao Li, Zhuosheng Zhang, Hai Zhao, Rui Wang, Kehai Chen, Masao Utiyama, Eiichiro Sumita

In this paper, we propose explicit and implicit text compression approaches to enhance the Transformer encoding and evaluate models using this approach on several typical downstream tasks that rely on the encoding heavily.

Text Compression

Cross-lingual Transferring of Pre-trained Contextualized Language Models

no code implementations 27 Jul 2021 Zuchao Li, Kevin Parnow, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita

Though the pre-trained contextualized language model (PrLM) has made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant.

Language Modelling Machine Translation +1

YANMTT: Yet Another Neural Machine Translation Toolkit

no code implementations 25 Aug 2021 Raj Dabre, Eiichiro Sumita

In this paper we present our open-source neural machine translation (NMT) toolkit called "Yet Another Neural Machine Translation Toolkit" abbreviated as YANMTT which is built on top of the Transformers library.

Machine Translation Model Compression +3

Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation

no code implementations COLING 2020 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita

In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data.

Machine Translation NMT +1

Bilingual Subword Segmentation for Neural Machine Translation

no code implementations COLING 2020 Hiroyuki Deguchi, Masao Utiyama, Akihiro Tamura, Takashi Ninomiya, Eiichiro Sumita

This paper proposes a new subword segmentation method for neural machine translation, "Bilingual Subword Segmentation," which tokenizes sentences to minimize the difference between the number of subword units in a sentence and that of its translation.

Machine Translation Segmentation +2

Intermediate Self-supervised Learning for Machine Translation Quality Estimation

no code implementations COLING 2020 Raphael Rubino, Eiichiro Sumita

The proposed method does not rely on annotated data and is complementary to QE methods involving pre-trained sentence encoders and domain adaptation.

Domain Adaptation Language Modelling +4

NICT-2 Translation System at WAT-2021: Applying a Pretrained Multilingual Encoder-Decoder Model to Low-resource Language Pairs

no code implementations ACL (WAT) 2021 Kenji Imamura, Eiichiro Sumita

In this paper, we present the NICT system (NICT-2) submitted to the NICT-SAP shared task at the 8th Workshop on Asian Translation (WAT-2021).

Translation

What Works and Doesn’t Work, A Deep Decoder for Neural Machine Translation

no code implementations Findings (ACL) 2022 Zuchao Li, Yiran Wang, Masao Utiyama, Eiichiro Sumita, Hai Zhao, Taro Watanabe

Inspired by this discovery, we then propose approaches to improving it, with respect to model structure and model training, to make the deep decoder practical in NMT.

Language Modelling Machine Translation +2

Synchronous Refinement for Neural Machine Translation

no code implementations Findings (ACL) 2022 Kehai Chen, Masao Utiyama, Eiichiro Sumita, Rui Wang, Min Zhang

Machine translation typically adopts an encoder-to-decoder framework, in which the decoder generates the target sentence word-by-word in an auto-regressive manner.

Machine Translation Sentence +1

Restricted or Not: A General Training Framework for Neural Machine Translation

no code implementations ACL 2022 Zuchao Li, Masao Utiyama, Eiichiro Sumita, Hai Zhao

Although this can satisfy the requirements overall, it usually requires a larger beam size and far longer decoding time than unrestricted translation, which limits the concurrent processing ability of the translation model in deployment, and thus its practicality.

Machine Translation Translation

A Multimodal Simultaneous Interpretation Prototype: Who Said What

no code implementations AMTA 2022 Xiaolin Wang, Masao Utiyama, Eiichiro Sumita

“Who said what” is essential for users to understand video streams that have more than one speaker, but conventional simultaneous interpretation systems merely present “what was said” in the form of subtitles.

Sentence TAG +1

FeatureBART: Feature Based Sequence-to-Sequence Pre-Training for Low-Resource NMT

no code implementations COLING 2022 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Eiichiro Sumita

In this paper we present FeatureBART, a linguistically motivated sequence-to-sequence monolingual pre-training strategy in which syntactic features such as lemma, part-of-speech and dependency labels are incorporated into the span prediction based pre-training framework (BART).

LEMMA NMT

Multi-Source Cross-Lingual Constituency Parsing

no code implementations ICON 2021 Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, Satoshi Nakamura

Pretrained multilingual language models have become a key part of cross-lingual transfer for many natural language processing tasks, even those without bilingual information.

Constituency Parsing Cross-Lingual Transfer +1

Language Model Pre-training on True Negatives

no code implementations 1 Dec 2022 Zhuosheng Zhang, Hai Zhao, Masao Utiyama, Eiichiro Sumita

Discriminative pre-trained language models (PLMs) learn to predict original texts from intentionally corrupted ones.

Language Modelling
