Search Results for author: Chenchen Ding

Found 29 papers, 3 papers with code

FeatureBART: Feature Based Sequence-to-Sequence Pre-Training for Low-Resource NMT

no code implementations COLING 2022 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Eiichiro Sumita

In this paper we present FeatureBART, a linguistically motivated sequence-to-sequence monolingual pre-training strategy in which syntactic features such as lemma, part-of-speech and dependency labels are incorporated into the span prediction based pre-training framework (BART).

LEMMA Low Resource NMT +1

Multi-Source Cross-Lingual Constituency Parsing

no code implementations ICON 2021 Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, Satoshi Nakamura

Pretrained multilingual language models have become a key part of cross-lingual transfer for many natural language processing tasks, even those without bilingual information.

Constituency Parsing Cross-Lingual Transfer +1

Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation

1 code implementation 12 Jun 2024 Zhi Qu, Chenchen Ding, Taro Watanabe

Understanding representation transfer in multilingual neural machine translation can reveal the representational issue causing the zero-shot translation deficiency.

Contrastive Learning Decoder +3

Outlier-Aware Training for Low-Bit Quantization of Structural Re-Parameterized Networks

no code implementations 11 Feb 2024 Muqun Niu, Yuan Ren, Boyu Li, Chenchen Ding

Lightweight design of Convolutional Neural Networks (CNNs) requires co-design efforts in the model architectures and compression techniques.

Quantization

A Crucial Parameter for Rank-Frequency Relation in Natural Languages

no code implementations 1 Feb 2024 Chenchen Ding

$f \propto r^{-\alpha} \cdot (r+\gamma)^{-\beta}$ has been empirically shown to be more precise than a naïve power law $f\propto r^{-\alpha}$ for modeling the rank-frequency ($r$-$f$) relation of words in natural languages.

Relation

A Two Parameters Equation for Word Rank-Frequency Relation

no code implementations 2 May 2022 Chenchen Ding

Let $f (\cdot)$ be the absolute frequency of words and $r$ be the rank of words in decreasing order of frequency, then the following function can fit the rank-frequency relation \[ f (r;s, t) = \left(\frac{r_{\tt max}}{r}\right)^{1-s} \left(\frac{r_{\tt max}+t \cdot r_{\tt exp}}{r+t \cdot r_{\tt exp}}\right)^{1+(1+t)s} \] where $r_{\tt max}$ and $r_{\tt exp}$ are the maximum and the expectation of the rank, respectively; $s>0$ and $t>0$ are parameters estimated from data.

Relation Vocal Bursts Valence Prediction
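
The two-parameter function in the abstract above can be evaluated directly. The following is a minimal Python sketch, not the author's code: the function name `rank_frequency` and the sample values are assumptions chosen for illustration, and `r_max` and `r_exp` are passed in as given rather than estimated from a corpus.

```python
def rank_frequency(r, s, t, r_max, r_exp):
    """Evaluate f(r; s, t) as written in the abstract above.

    r      : rank of the word (1 = most frequent)
    s, t   : positive parameters, estimated from data in the paper
    r_max  : maximum rank
    r_exp  : expectation of the rank
    """
    if r <= 0 or s <= 0 or t <= 0:
        raise ValueError("r, s and t must be positive")
    shift = t * r_exp
    # First factor: (r_max / r)^(1 - s)
    head = (r_max / r) ** (1.0 - s)
    # Second factor: ((r_max + t*r_exp) / (r + t*r_exp))^(1 + (1 + t)*s)
    tail = ((r_max + shift) / (r + shift)) ** (1.0 + (1.0 + t) * s)
    return head * tail


# Hypothetical sample values, for illustration only.
sample = dict(s=0.5, t=0.5, r_max=1000, r_exp=100)
top = rank_frequency(1, **sample)
last = rank_frequency(1000, **sample)  # both factors equal 1 at r = r_max
```

Both ratios equal 1 when `r == r_max`, so under this form the lowest-ranked word gets a frequency of exactly 1.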

Transliteration of Foreign Words in Burmese

no code implementations 7 Oct 2021 Chenchen Ding

This manuscript provides a general description of the transliteration of foreign words in the Burmese language.

Transliteration

Improving Low-Resource NMT through Relevance Based Linguistic Features Incorporation

no code implementations COLING 2020 Abhisek Chakrabarty, Raj Dabre, Chenchen Ding, Masao Utiyama, Eiichiro Sumita

In this study, linguistic knowledge at different levels is incorporated into the neural machine translation (NMT) framework to improve translation quality for language pairs with extremely limited data.

Low Resource NMT NMT +1

A Three-Parameter Rank-Frequency Relation in Natural Languages

no code implementations ACL 2020 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

We show that the rank-frequency relation in textual data follows $f \propto r^{-\alpha}(r+\gamma)^{-\beta}$, where $f$ is the token frequency and $r$ is the rank by frequency, with ($\alpha$, $\beta$, $\gamma$) as parameters.

Relation
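
As a rough illustration of the three-parameter relation in the abstract above, the sketch below evaluates the unnormalized form $r^{-\alpha}(r+\gamma)^{-\beta}$ in Python. The function name `three_parameter_frequency` and the `scale` argument (standing in for the unspecified proportionality constant) are assumptions made here, not part of the paper.

```python
def three_parameter_frequency(r, alpha, beta, gamma, scale=1.0):
    """Return scale * r^(-alpha) * (r + gamma)^(-beta).

    `scale` is a hypothetical stand-in for the proportionality
    constant, which the abstract leaves unspecified.
    """
    if r <= 0:
        raise ValueError("rank r must be positive")
    if r + gamma <= 0:
        raise ValueError("r + gamma must be positive")
    return scale * r ** (-alpha) * (r + gamma) ** (-beta)


# With gamma = 0 the relation collapses to a plain power law
# with exponent alpha + beta.
value = three_parameter_frequency(2, alpha=1.0, beta=1.0, gamma=0.0)  # 0.25
```

Setting `beta=0` also recovers the naïve power law $f \propto r^{-\alpha}$, which makes the extra factor easy to compare against.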

A Myanmar (Burmese)-English Named Entity Transliteration Dictionary

no code implementations LREC 2020 Aye Myat Mon, Chenchen Ding, Hour Kaing, Khin Mar Soe, Masao Utiyama, Eiichiro Sumita

For the Myanmar (Burmese) language, robust automatic transliteration for borrowed English words is a challenging task because of the complex Myanmar writing system and the lack of data.

Transliteration

English-Myanmar Supervised and Unsupervised NMT: NICT's Machine Translation Systems at WAT-2019

no code implementations WS 2019 Rui Wang, Haipeng Sun, Kehai Chen, Chenchen Ding, Masao Utiyama, Eiichiro Sumita

This paper presents NICT's participation (team ID: NICT) in the 6th Workshop on Asian Translation (WAT-2019) shared translation task, specifically the Myanmar (Burmese)-English task in both translation directions.

Language Modelling Machine Translation +2

Overview of the 6th Workshop on Asian Translation

no code implementations WS 2019 Toshiaki Nakazawa, Nobushige Doi, Shohei Higashiyama, Chenchen Ding, Raj Dabre, Hideya Mino, Isao Goto, Win Pa Pa, Anoop Kunchukuttan, Yusuke Oda, Shantipriya Parida, Ondřej Bojar, Sadao Kurohashi

This paper presents the results of the shared tasks from the 6th workshop on Asian translation (WAT2019) including Ja↔En, Ja↔Zh scientific paper translation subtasks, Ja↔En, Ja↔Ko, Ja↔En patent translation subtasks, Hi↔En, My↔En, Km↔En, Ta↔En mixed domain subtasks and Ru↔Ja news commentary translation task.

Translation

Supervised and Unsupervised Machine Translation for Myanmar-English and Khmer-English

no code implementations WS 2019 Benjamin Marie, Hour Kaing, Aye Myat Mon, Chenchen Ding, Atsushi Fujita, Masao Utiyama, Eiichiro Sumita

This paper presents NICT's supervised and unsupervised machine translation systems for the WAT2019 Myanmar-English and Khmer-English translation tasks.

NMT Translation +1

MY-AKKHARA: A Romanization-based Burmese (Myanmar) Input Method

no code implementations IJCNLP 2019 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

MY-AKKHARA is a method used to input Burmese texts encoded in the Unicode standard, based on commonly accepted Latin transcription.

Simplified Abugidas

no code implementations ACL 2018 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

An abugida is a writing system where the consonant letters represent syllables with a default vowel and other vowels are denoted by diacritics.

Sentence

Similar Southeast Asian Languages: Corpus-Based Case Study on Thai-Laotian and Malay-Indonesian

no code implementations WS 2016 Chenchen Ding, Masao Utiyama, Eiichiro Sumita

This paper illustrates the similarity between Thai and Laotian, and between Malay and Indonesian, based on an investigation of raw parallel data from the Asian Language Treebank.

Machine Translation Translation +1

Khmer Word Segmentation Using Conditional Random Fields

1 code implementation 15 Oct 2015 Vichet Chea, Ye Kyaw Thu, Chenchen Ding, Masao Utiyama, Andrew Finch, Eiichiro Sumita

The trained CRF segmenter was compared empirically to a baseline approach based on maximum matching that used a dictionary extracted from the manually segmented corpus.

Segmentation Text Segmentation +1
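
The abstract above names a dictionary-based maximum matching baseline without describing it. A common reading is greedy forward longest-match segmentation; the Python sketch below assumes that variant and is not taken from the paper. The function name `max_match` and the toy Latin-script dictionary are hypothetical.

```python
def max_match(text, dictionary, max_len=None):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary entry that matches;
    if nothing matches, emit a single character and move on.
    """
    if max_len is None:
        # Longest dictionary entry bounds how far ahead to look.
        max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        end = i + 1  # fallback: one unknown character
        # Try candidates from longest to shortest.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                end = j
                break
        tokens.append(text[i:end])
        i = end
    return tokens


# Toy dictionary, for illustration only.
toy_dictionary = {"ab", "abc", "d"}
segments = max_match("abcd", toy_dictionary)  # ['abc', 'd']
```

The greedy choice is what makes this a weak baseline: once the longest match is taken, the segmenter never backtracks, whereas a CRF scores boundary decisions in context.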
