Search Results for author: Weihua Luo

Found 41 papers, 14 papers with code

Towards Enhancing Faithfulness for Neural Machine Translation

no code implementations • EMNLP 2020 • Rongxiang Weng, Heng Yu, Xiangpeng Wei, Weihua Luo

Neural machine translation (NMT) has achieved great success due to the ability to generate high-quality sentences.

Machine Translation Multi-Task Learning +2

Paper
Add Code

RoBLEURT Submission for WMT2021 Metrics Task

no code implementations • WMT (EMNLP) 2021 • Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao

After investigating the recent advances of trainable metrics, we conclude several aspects of vital importance to obtain a well-performed metric model by: 1) jointly leveraging the advantages of source-included model and reference-only model, 2) continuously pre-training the model with massive synthetic data pairs, and 3) fine-tuning the model with data denoising strategy.

Denoising

Paper
Add Code

QEMind: Alibaba’s Submission to the WMT21 Quality Estimation Shared Task

no code implementations • WMT (EMNLP) 2021 • Jiayi Wang, Ke Wang, Boxing Chen, Yu Zhao, Weihua Luo, Yuqi Zhang

Quality Estimation, as a crucial step of quality control for machine translation, has been explored for years.

Machine Translation Sentence +1

Paper
Add Code

TermMind: Alibaba’s WMT21 Machine Translation Using Terminologies Task Submission

no code implementations • WMT (EMNLP) 2021 • Ke Wang, Shuqin Gu, Boxing Chen, Yu Zhao, Weihua Luo, Yuqi Zhang

We used tags to mark and add the term translations into the matched sentences.

Data Augmentation Machine Translation +1

Paper
Add Code

RoBLEURT Submission for the WMT2021 Metrics Task

no code implementations • 28 Apr 2022 • Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao

Denoising

Paper
Add Code

Learning to Generalize to More: Continuous Semantic Augmentation for Neural Machine Translation

2 code implementations • ACL 2022 • Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Weihua Luo, Jun Xie, Rong Jin

Although data augmentation is widely used to enrich the training data, conventional methods with discrete manipulations fail to generate diverse and faithful training samples.

Data Augmentation Machine Translation +3

6,020

Paper
Code

QEMind: Alibaba's Submission to the WMT21 Quality Estimation Shared Task

no code implementations • 30 Dec 2021 • Jiayi Wang, Ke Wang, Boxing Chen, Yu Zhao, Weihua Luo, Yuqi Zhang

Quality Estimation, as a crucial step of quality control for machine translation, has been explored for years.

Machine Translation Sentence +1

Paper
Add Code

KGR^4: Retrieval, Retrospect, Refine and Rethink for Commonsense Generation

1 code implementation • 15 Dec 2021 • Xin Liu, Dayiheng Liu, Baosong Yang, Haibo Zhang, Junwei Ding, Wenqing Yao, Weihua Luo, Haiying Zhang, Jinsong Su

Generative commonsense reasoning requires machines to generate sentences describing an everyday scenario given several concepts, which has attracted much attention recently.

Retrieval Sentence

Paper
Code

Non-Parametric Unsupervised Domain Adaptation for Neural Machine Translation

1 code implementation • Findings (EMNLP) 2021 • Xin Zheng, Zhirui Zhang, ShuJian Huang, Boxing Chen, Jun Xie, Weihua Luo, Jiajun Chen

Recently, $k$NN-MT has shown the promising capability of directly incorporating the pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor ($k$NN) retrieval to achieve domain adaptation without retraining.

Machine Translation NMT +3

Paper
Code

Rethinking Zero-shot Neural Machine Translation: From a Perspective of Latent Variables

1 code implementation • Findings (EMNLP) 2021 • Weizhi Wang, Zhirui Zhang, Yichao Du, Boxing Chen, Jun Xie, Weihua Luo

However, it usually suffers from capturing spurious correlations between the output language and language invariant semantics due to the maximum likelihood training objective, leading to poor transfer performance on zero-shot translation.

Denoising Machine Translation +2

Paper
Code

Task-Oriented Dialogue System as Natural Language Generation

1 code implementation • 31 Aug 2021 • Weizhi Wang, Zhirui Zhang, Junliang Guo, Yinpei Dai, Boxing Chen, Weihua Luo

In this paper, we propose to formulate the task-oriented dialogue system as the purely natural language generation task, so as to fully leverage the large-scale pre-trained models like GPT-2 and simplify complicated delexicalization prepossessing.

Text Generation Transfer Learning

Paper
Code

Breaking the Corpus Bottleneck for Context-Aware Neural Machine Translation with Cross-Task Pre-training

no code implementations • ACL 2021 • Linqing Chen, Junhui Li, ZhengXian Gong, Boxing Chen, Weihua Luo, Min Zhang, Guodong Zhou

To this end, we propose two pre-training tasks.

Machine Translation NMT +2

Paper
Add Code

Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation

no code implementations • ACL 2021 • Xin Liu, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Min Zhang, Haiying Zhang, Jinsong Su

A well-known limitation in pretrain-finetune paradigm lies in its inflexibility caused by the one-size-fits-all vocabulary.

Text Generation

Paper
Add Code

Towards User-Driven Neural Machine Translation

1 code implementation • ACL 2021 • Huan Lin, Liang Yao, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Degen Huang, Jinsong Su

Furthermore, we contribute the first Chinese-English parallel corpus annotated with user behavior called UDT-Corpus.

Contrastive Learning Machine Translation +3

Paper
Code

Combining Static Word Embeddings and Contextual Representations for Bilingual Lexicon Induction

1 code implementation • Findings (ACL) 2021 • Jinpeng Zhang, Baijun Ji, Nini Xiao, Xiangyu Duan, Min Zhang, Yangbin Shi, Weihua Luo

Bilingual Lexicon Induction (BLI) aims to map words in one language to their translations in another, and is typically through learning linear projections to align monolingual word representation spaces.

Bilingual Lexicon Induction Word Embeddings

Paper
Code

Context-Interactive Pre-Training for Document Machine Translation

no code implementations • NAACL 2021 • Pengcheng Yang, Pei Zhang, Boxing Chen, Jun Xie, Weihua Luo

Document machine translation aims to translate the source sentence into the target language in the presence of additional contextual information.

Machine Translation Sentence +1

Paper
Add Code

G-Transformer for Document-level Machine Translation

1 code implementation • ACL 2021 • Guangsheng Bao, Yue Zhang, Zhiyang Teng, Boxing Chen, Weihua Luo

However, study shows that when we further enlarge the translation unit to a whole document, supervised training of Transformer can fail.

Document Level Machine Translation Inductive Bias +3

Paper
Code

Adaptive Nearest Neighbor Machine Translation

3 code implementations • ACL 2021 • Xin Zheng, Zhirui Zhang, Junliang Guo, ShuJian Huang, Boxing Chen, Weihua Luo, Jiajun Chen

On four benchmark machine translation datasets, we demonstrate that the proposed method is able to effectively filter out the noises in retrieval results and significantly outperforms the vanilla kNN-MT model.

Machine Translation NMT +2

Paper
Code

Towards Variable-Length Textual Adversarial Attacks

no code implementations • 16 Apr 2021 • Junliang Guo, Zhirui Zhang, Linlin Zhang, Linli Xu, Boxing Chen, Enhong Chen, Weihua Luo

In this way, our approach is able to more comprehensively find adversarial examples around the decision boundary and effectively conduct adversarial attacks.

Machine Translation Translation

Paper
Add Code

Translation Memory Guided Neural Machine Translation

no code implementations • 1 Jan 2021 • Shaohui Kuang, Heng Yu, Weihua Luo, Qiang Wang

Existing ways either employ extra encoder to encode information from TM or concatenate source sentence and TM sentences as encoder's input.

Language Modelling Machine Translation +4

Paper
Add Code

Domain Transfer based Data Augmentation for Neural Query Translation

no code implementations • COLING 2020 • Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo

Query translation (QT) serves as a critical factor in successful cross-lingual information retrieval (CLIR).

Cross-Lingual Information Retrieval Data Augmentation +3

Paper
Add Code

Factorized Transformer for Multi-Domain Neural Machine Translation

no code implementations • Findings of the Association for Computational Linguistics 2020 • Yongchao Deng, Hongfei Yu, Heng Yu, Xiangyu Duan, Weihua Luo

Multi-Domain Neural Machine Translation (NMT) aims at building a single system that performs well on a range of target domains.

Machine Translation NMT +1

Paper
Add Code

Constraint Translation Candidates: A Bridge between Neural Query Translation and Cross-lingual Information Retrieval

no code implementations • 26 Oct 2020 • Tianchi Bi, Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen

Query translation (QT) is a key component in cross-lingual information retrieval system (CLIR).

Cross-Lingual Information Retrieval Machine Translation +3

Paper
Add Code

Exploiting Neural Query Translation into Cross Lingual Information Retrieval

no code implementations • 26 Oct 2020 • Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen

As a crucial role in cross-language information retrieval (CLIR), query translation has three main challenges: 1) the adequacy of translation; 2) the lack of in-domain parallel training data; and 3) the requisite of low latency.

Cross-Lingual Information Retrieval Data Augmentation +5

Paper
Add Code

Uncertainty-Aware Semantic Augmentation for Neural Machine Translation

no code implementations • EMNLP 2020 • Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo

As a sequence-to-sequence generation task, neural machine translation (NMT) naturally contains intrinsic uncertainty, where a single sentence in one language has multiple valid counterparts in the other.

Machine Translation NMT +3

Paper
Add Code

Iterative Domain-Repaired Back-Translation

no code implementations • EMNLP 2020 • Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo

In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.

Domain Adaptation NMT +1

Paper
Add Code

On Learning Universal Representations Across Languages

no code implementations • ICLR 2021 • Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo

Recent studies have demonstrated the overwhelming advantage of cross-lingual pre-trained models (PTMs), such as multilingual BERT and XLM, on cross-lingual NLP tasks.

Contrastive Learning Cross-Lingual Natural Language Inference +4

Paper
Add Code

Bilingual Dictionary Based Neural Machine Translation without Using Parallel Sentences

1 code implementation • ACL 2020 • Xiangyu Duan, Baijun Ji, Hao Jia, Min Tan, Min Zhang, Boxing Chen, Weihua Luo, Yue Zhang

In this paper, we propose a new task of machine translation (MT), which is based on no parallel sentences but can refer to a ground-truth bilingual dictionary.

Machine Translation Translation +1

Paper
Code

Language-aware Interlingua for Multilingual Neural Machine Translation

no code implementations • ACL 2020 • Changfeng Zhu, Heng Yu, Shanbo Cheng, Weihua Luo

However, the traditional multilingual model fails to capture the diversity and specificity of different languages, resulting in inferior performance compared with individual models that are sufficiently trained.

Machine Translation NMT +2

Paper
Add Code

Multiscale Collaborative Deep Models for Neural Machine Translation

1 code implementation • ACL 2020 • Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo

Recent evidence reveals that Neural Machine Translation (NMT) models with deeper neural networks can be more effective but are difficult to train.

Machine Translation NMT +1

Paper
Code

AR: Auto-Repair the Synthetic Data for Neural Machine Translation

no code implementations • 5 Apr 2020 • Shanbo Cheng, Shaohui Kuang, Rongxiang Weng, Heng Yu, Changfeng Zhu, Weihua Luo

Compared with only using limited authentic parallel data as training corpus, many studies have proved that incorporating synthetic parallel data, which generated by back translation (BT) or forward translation (FT, or selftraining), into the NMT training process can significantly improve translation quality.

Machine Translation NMT +2

Paper
Add Code

GRET: Global Representation Enhanced Transformer

no code implementations • 24 Feb 2020 • Rongxiang Weng, Hao-Ran Wei, Shu-Jian Huang, Heng Yu, Lidong Bing, Weihua Luo, Jia-Jun Chen

The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence.

Machine Translation Sentence +3

Paper
Add Code

Acquiring Knowledge from Pre-trained Model to Neural Machine Translation

no code implementations • 4 Dec 2019 • Rongxiang Weng, Heng Yu, Shu-Jian Huang, Shanbo Cheng, Weihua Luo

The standard paradigm of exploiting them includes two steps: first, pre-training a model, e. g. BERT, with a large scale unlabeled monolingual data.

General Knowledge Knowledge Distillation +3

Paper
Add Code

Cross-lingual Pre-training Based Transfer for Zero-shot Neural Machine Translation

no code implementations • 3 Dec 2019 • Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, Weihua Luo

However, existing transfer methods involving a common target language are far from success in the extreme scenario of zero-shot translation, due to the language space mismatch problem between transferor (the parent model) and transferee (the child model) on the source side.

Machine Translation NMT +2

Paper
Add Code

Contrastive Attention Mechanism for Abstractive Sentence Summarization

1 code implementation • IJCNLP 2019 • Xiangyu Duan, Hoongfei Yu, Mingming Yin, Min Zhang, Weihua Luo, Yue Zhang

We propose a contrastive attention mechanism to extend the sequence-to-sequence framework for abstractive sentence summarization task, which aims to generate a brief summary of a given source sentence.

Abstractive Text Summarization Sentence +1

Paper
Code

Improving Neural Machine Translation with Pre-trained Representation

no code implementations • 21 Aug 2019 • Rongxiang Weng, Heng Yu, Shu-Jian Huang, Weihua Luo, Jia-Jun Chen

Then, we design a framework for integrating both source and target sentence-level representations into NMT model to improve the translation quality.

Machine Translation NMT +3

Paper
Add Code

Zero-Shot Cross-Lingual Abstractive Sentence Summarization through Teaching Generation and Attention

1 code implementation • ACL 2019 • Xiangyu Duan, Mingming Yin, Min Zhang, Boxing Chen, Weihua Luo

But there is no cross-lingual parallel corpus, whose source sentence language is different to the summary language, to directly train a cross-lingual ASSUM system.

Sentence Sentence Summarization +1

Paper
Code

Code-Switching for Enhancing NMT with Pre-Specified Translation

1 code implementation • NAACL 2019 • Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, Min Zhang

Leveraging user-provided translation to constrain NMT has practical significance.

Data Augmentation NMT +1

Paper
Code

Improved English to Russian Translation by Neural Suffix Prediction

no code implementations • 11 Jan 2018 • Kai Song, Yue Zhang, Min Zhang, Weihua Luo

Neural machine translation (NMT) suffers a performance deficiency when a limited vocabulary fails to cover the source or target side adequately, which happens frequently when dealing with morphologically rich languages.

Machine Translation NMT +1

Paper
Add Code

Modeling Coherence for Neural Machine Translation with Dynamic and Topic Caches

no code implementations • COLING 2018 • Shaohui Kuang, Deyi Xiong, Weihua Luo, Guodong Zhou

Sentences in a well-formed text are connected to each other via various links to form the cohesive structure of the text.

Machine Translation NMT +2

Paper
Add Code

Attention Focusing for Neural Machine Translation by Bridging Source and Target Embeddings

no code implementations • ACL 2018 • Shaohui Kuang, Junhui Li, António Branco, Weihua Luo, Deyi Xiong

In neural machine translation, a source sequence of words is encoded into a vector from which a target sequence is generated in the decoding phase.

Machine Translation Sentence +2

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.