no code implementations • WMT (EMNLP) 2021 • Jiayi Wang, Ke Wang, Boxing Chen, Yu Zhao, Weihua Luo, Yuqi Zhang
Quality Estimation, as a crucial step of quality control for machine translation, has been explored for years.
no code implementations • WMT (EMNLP) 2021 • Yu Wan, Dayiheng Liu, Baosong Yang, Tianchi Bi, Haibo Zhang, Boxing Chen, Weihua Luo, Derek F. Wong, Lidia S. Chao
After investigating recent advances in trainable metrics, we identify several aspects vital to obtaining a well-performing metric model: 1) jointly leveraging the advantages of source-included and reference-only models (sketched below), 2) continually pre-training the model on massive synthetic data pairs, and 3) fine-tuning the model with a data-denoising strategy.
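As a rough illustration of point 1), here is a minimal Python sketch of blending a source-included (QE-style) scorer with a reference-only scorer; the two `score_*` callables and the `weight` parameter are hypothetical stand-ins, not the authors' implementation.

```python
# Hedged sketch: ensemble a source-included metric with a reference-only one.
# `score_with_source` and `score_with_reference` are hypothetical trained models.

def combined_metric(src, hyp, ref, score_with_source, score_with_reference,
                    weight=0.5):
    """Blend a source-included score with a reference-only score."""
    s_src = score_with_source(src, hyp)     # e.g. a QE-style regressor
    s_ref = score_with_reference(hyp, ref)  # e.g. a BERTScore-like model
    return weight * s_src + (1.0 - weight) * s_ref
```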
no code implementations • EMNLP 2020 • Rongxiang Weng, Heng Yu, Xiangpeng Wei, Weihua Luo
Neural machine translation (NMT) has achieved great success due to its ability to generate high-quality sentences.
no code implementations • WMT (EMNLP) 2021 • Ke Wang, Shuqin Gu, Boxing Chen, Yu Zhao, Weihua Luo, Yuqi Zhang
We used tags to mark the term translations and insert them into the matched sentences.
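A toy sketch of the tagging idea: wrap each matched term and splice in the required translation before feeding the sentence to the model. The `<term>`/`<trans>` tag names are illustrative assumptions, not necessarily the exact markup used in the paper.

```python
# Hypothetical inline markup for terminology-constrained NMT input.
def tag_terms(sentence, term_dict):
    """Mark each matched term and append its required translation."""
    for term, translation in term_dict.items():
        if term in sentence:
            sentence = sentence.replace(
                term, f"<term> {term} </term> <trans> {translation} </trans>")
    return sentence

print(tag_terms("the covid vaccine rollout", {"covid": "COVID-19"}))
# the <term> covid </term> <trans> COVID-19 </trans> vaccine rollout
```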
2 code implementations • ACL 2022 • Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Weihua Luo, Jun Xie, Rong Jin
Although data augmentation is widely used to enrich the training data, conventional methods with discrete manipulations fail to generate diverse and faithful training samples.
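The continuous alternative can be pictured as sampling new training points in representation space rather than editing tokens. A minimal numpy sketch, assuming sentence-level embeddings and a semantic neighbor are available; the paper's actual sampling procedure may differ.

```python
import numpy as np

def continuous_augment(vec, neighbor, rng):
    """Sample a point on the segment between an embedding and its neighbor."""
    lam = rng.uniform(0.0, 1.0)            # random interpolation coefficient
    return lam * vec + (1.0 - lam) * neighbor

rng = np.random.default_rng(0)
v, n = rng.normal(size=8), rng.normal(size=8)
augmented = continuous_augment(v, n, rng)  # a "new" training representation
```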
no code implementations • 15 Dec 2021 • Xin Liu, Dayiheng Liu, Baosong Yang, Haibo Zhang, Junwei Ding, Wenqing Yao, Weihua Luo, Haiying Zhang, Jinsong Su
Generative commonsense reasoning requires machines to generate sentences describing an everyday scenario given several concepts, which has attracted much attention recently.
1 code implementation • Findings (EMNLP) 2021 • Xin Zheng, Zhirui Zhang, ShuJian Huang, Boxing Chen, Jun Xie, Weihua Luo, Jiajun Chen
Recently, $k$NN-MT has shown the promising capability of directly incorporating the pre-trained neural machine translation (NMT) model with domain-specific token-level $k$-nearest-neighbor ($k$NN) retrieval to achieve domain adaptation without retraining.
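The core mechanism can be sketched in a few lines: retrieve the $k$ datastore entries nearest to the current decoder state, turn their distances into a distribution over target tokens, and interpolate it with the NMT model's distribution. The datastore arrays, `temperature`, and `lam` below are illustrative assumptions.

```python
import numpy as np

def knn_mt_probs(hidden, datastore_keys, datastore_vals, nmt_probs,
                 vocab_size, k=8, temperature=10.0, lam=0.5):
    """Interpolate a kNN-retrieval distribution with the NMT distribution."""
    dists = np.linalg.norm(datastore_keys - hidden, axis=1)  # L2 to all keys
    idx = np.argsort(dists)[:k]                              # k nearest entries
    weights = np.exp(-dists[idx] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    for w, tok in zip(weights, datastore_vals[idx]):
        p_knn[tok] += w                    # aggregate weight per target token
    return lam * p_knn + (1.0 - lam) * nmt_probs
```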
1 code implementation • Findings (EMNLP) 2021 • Weizhi Wang, Zhirui Zhang, Yichao Du, Boxing Chen, Jun Xie, Weihua Luo
However, it usually suffers from capturing spurious correlations between the output language and language-invariant semantics due to the maximum likelihood training objective, leading to poor transfer performance on zero-shot translation.
1 code implementation • 31 Aug 2021 • Weizhi Wang, Zhirui Zhang, Junliang Guo, Yinpei Dai, Boxing Chen, Weihua Luo
In this paper, we propose to formulate the task-oriented dialogue system as a pure natural language generation task, so as to fully leverage large-scale pre-trained models like GPT-2 and simplify complicated delexicalization preprocessing.
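Concretely, the dialogue history can be serialized into one token sequence that a GPT-2-style LM simply continues. A hedged sketch; the `<user>`/`<system>`/`<belief>` markers are illustrative assumptions, not the paper's exact schema.

```python
def build_prompt(history):
    """Serialize alternating user/system turns into a single LM prompt."""
    turns = " ".join(f"<user> {u}" if i % 2 == 0 else f"<system> {u}"
                     for i, u in enumerate(history))
    return f"{turns} <belief>"  # the LM continues with belief state + response

prompt = build_prompt(["I need a cheap hotel in the north.",
                       "Okay, any star rating preference?",
                       "Three stars, please."])
# Feed `prompt` to a pre-trained LM such as GPT-2 and decode the continuation.
```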
no code implementations • ACL 2021 • Linqing Chen, Junhui Li, ZhengXian Gong, Boxing Chen, Weihua Luo, Min Zhang, Guodong Zhou
To this end, we propose two pre-training tasks.
1 code implementation • ACL 2021 • Huan Lin, Liang Yao, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Degen Huang, Jinsong Su
Furthermore, we contribute UDT-Corpus, the first Chinese-English parallel corpus annotated with user behavior.
no code implementations • ACL 2021 • Xin Liu, Baosong Yang, Dayiheng Liu, Haibo Zhang, Weihua Luo, Min Zhang, Haiying Zhang, Jinsong Su
A well-known limitation of the pretrain-finetune paradigm lies in its inflexibility caused by the one-size-fits-all vocabulary.
1 code implementation • Findings (ACL) 2021 • Jinpeng Zhang, Baijun Ji, Nini Xiao, Xiangyu Duan, Min Zhang, Yangbin Shi, Weihua Luo
Bilingual Lexicon Induction (BLI) aims to map words in one language to their translations in another, typically by learning linear projections to align monolingual word-representation spaces.
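The linear-projection step is commonly solved in closed form with orthogonal Procrustes. A numpy sketch under that standard recipe (the paper's exact training setup may differ): given embedding matrices X and Y for a seed dictionary, the SVD of X^T Y yields the orthogonal map.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal map W minimizing ||XW - Y||_F (rows are word embeddings)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(0)
X, Y = rng.normal(size=(100, 50)), rng.normal(size=(100, 50))
W = procrustes(X, Y)
aligned = X @ W  # source embeddings projected into the target space
```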
no code implementations • NAACL 2021 • Pengcheng Yang, Pei Zhang, Boxing Chen, Jun Xie, Weihua Luo
Document machine translation aims to translate the source sentence into the target language in the presence of additional contextual information.
1 code implementation • ACL 2021 • Guangsheng Bao, Yue Zhang, Zhiyang Teng, Boxing Chen, Weihua Luo
However, studies show that when we further enlarge the translation unit to a whole document, supervised training of the Transformer can fail.
2 code implementations • ACL 2021 • Xin Zheng, Zhirui Zhang, Junliang Guo, ShuJian Huang, Boxing Chen, Weihua Luo, Jiajun Chen
On four benchmark machine translation datasets, we demonstrate that the proposed method effectively filters out noise in the retrieval results and significantly outperforms the vanilla kNN-MT model.
no code implementations • 16 Apr 2021 • Junliang Guo, Zhirui Zhang, Linlin Zhang, Linli Xu, Boxing Chen, Enhong Chen, Weihua Luo
In this way, our approach is able to more comprehensively find adversarial examples around the decision boundary and effectively conduct adversarial attacks.
no code implementations • 1 Jan 2021 • Shaohui Kuang, Heng Yu, Weihua Luo, Qiang Wang
Existing approaches either employ an extra encoder to encode information from the TM or concatenate the source sentence and TM sentences as the encoder's input.
no code implementations • COLING 2020 • Liang Yao, Baosong Yang, Haibo Zhang, Boxing Chen, Weihua Luo
Query translation (QT) serves as a critical factor in successful cross-lingual information retrieval (CLIR).
no code implementations • Findings (EMNLP) 2020 • Yongchao Deng, Hongfei Yu, Heng Yu, Xiangyu Duan, Weihua Luo
Multi-Domain Neural Machine Translation (NMT) aims at building a single system that performs well on a range of target domains.
no code implementations • 26 Oct 2020 • Tianchi Bi, Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen
Query translation (QT) is a key component in cross-lingual information retrieval (CLIR) systems.
no code implementations • 26 Oct 2020 • Liang Yao, Baosong Yang, Haibo Zhang, Weihua Luo, Boxing Chen
Playing a crucial role in cross-lingual information retrieval (CLIR), query translation faces three main challenges: 1) translation adequacy; 2) the lack of in-domain parallel training data; and 3) the requirement of low latency.
no code implementations • EMNLP 2020 • Xiangpeng Wei, Heng Yu, Yue Hu, Rongxiang Weng, Luxi Xing, Weihua Luo
As a sequence-to-sequence generation task, neural machine translation (NMT) naturally contains intrinsic uncertainty, where a single sentence in one language has multiple valid counterparts in the other.
no code implementations • EMNLP 2020 • Hao-Ran Wei, Zhirui Zhang, Boxing Chen, Weihua Luo
In this paper, we focus on the domain-specific translation with low resources, where in-domain parallel corpora are scarce or nonexistent.
no code implementations • ICLR 2021 • Xiangpeng Wei, Rongxiang Weng, Yue Hu, Luxi Xing, Heng Yu, Weihua Luo
Recent studies have demonstrated the overwhelming advantage of cross-lingual pre-trained models (PTMs), such as multilingual BERT and XLM, on cross-lingual NLP tasks.
Contrastive Learning
Cross-Lingual Natural Language Inference
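The Contrastive Learning tag hints at the flavor of the objective: pull parallel sentence pairs together and push apart in-batch negatives. A hedged InfoNCE-style numpy sketch over sentence embeddings; this is illustrative, not the paper's exact loss.

```python
import numpy as np

def info_nce(src, tgt, tau=0.1):
    """InfoNCE loss: each source row should match its own translation."""
    src = src / np.linalg.norm(src, axis=1, keepdims=True)
    tgt = tgt / np.linalg.norm(tgt, axis=1, keepdims=True)
    logits = src @ tgt.T / tau                   # cosine similarity matrix
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))   # positives on the diagonal

rng = np.random.default_rng(0)
loss = info_nce(rng.normal(size=(4, 16)), rng.normal(size=(4, 16)))
```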
1 code implementation • ACL 2020 • Xiangyu Duan, Baijun Ji, Hao Jia, Min Tan, Min Zhang, Boxing Chen, Weihua Luo, Yue Zhang
In this paper, we propose a new task of machine translation (MT), which is based on no parallel sentences but can refer to a ground-truth bilingual dictionary.
no code implementations • ACL 2020 • Changfeng Zhu, Heng Yu, Shanbo Cheng, Weihua Luo
However, the traditional multilingual model fails to capture the diversity and specificity of different languages, resulting in inferior performance compared with individual models that are sufficiently trained.
1 code implementation • ACL 2020 • Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo
Recent evidence reveals that Neural Machine Translation (NMT) models with deeper neural networks can be more effective but are difficult to train.
no code implementations • 5 Apr 2020 • Shanbo Cheng, Shaohui Kuang, Rongxiang Weng, Heng Yu, Changfeng Zhu, Weihua Luo
Compared with using only limited authentic parallel data as the training corpus, many studies have shown that incorporating synthetic parallel data, generated by back translation (BT) or forward translation (FT, or self-training), into the NMT training process can significantly improve translation quality.
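Schematically, BT and FT synthesize parallel pairs from monolingual text with existing models and mix them into the training corpus. The two `*_model` callables below are hypothetical translation functions.

```python
def back_translate(tgt_mono, tgt2src_model):
    """BT: synthesize the source side from target-language monolingual text."""
    return [(tgt2src_model(y), y) for y in tgt_mono]

def forward_translate(src_mono, src2tgt_model):
    """FT (self-training): synthesize the target side from source text."""
    return [(x, src2tgt_model(x)) for x in src_mono]

# training corpus = authentic_pairs + back_translate(...) + forward_translate(...)
```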
no code implementations • 24 Feb 2020 • Rongxiang Weng, Hao-Ran Wei, Shu-Jian Huang, Heng Yu, Lidong Bing, Weihua Luo, Jia-Jun Chen
The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence.
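That flow is easy to see in a minimal PyTorch encoder-decoder, a sketch with arbitrary dimensions and no attention, purely to show where the hidden states travel.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(1000, 32)            # shared toy vocabulary of 1000 tokens
encoder = nn.GRU(32, 64, batch_first=True)
decoder = nn.GRU(32, 64, batch_first=True)
proj = nn.Linear(64, 1000)

src = torch.randint(0, 1000, (1, 7))             # one source sentence, 7 tokens
enc_states, enc_last = encoder(emb(src))         # sequence of hidden states
tgt_in = torch.randint(0, 1000, (1, 5))          # shifted target tokens
dec_states, _ = decoder(emb(tgt_in), enc_last)   # decoder seeded by the encoder
logits = proj(dec_states)                        # per-step vocabulary scores
```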
no code implementations • 4 Dec 2019 • Rongxiang Weng, Heng Yu, Shu-Jian Huang, Shanbo Cheng, Weihua Luo
The standard paradigm of exploiting them includes two steps: first, pre-training a model, e.g. BERT, on large-scale unlabeled monolingual data.
no code implementations • 3 Dec 2019 • Baijun Ji, Zhirui Zhang, Xiangyu Duan, Min Zhang, Boxing Chen, Weihua Luo
However, existing transfer methods involving a common target language are far from success in the extreme scenario of zero-shot translation, due to the language space mismatch problem between transferor (the parent model) and transferee (the child model) on the source side.
1 code implementation • IJCNLP 2019 • Xiangyu Duan, Hongfei Yu, Mingming Yin, Min Zhang, Weihua Luo, Yue Zhang
We propose a contrastive attention mechanism to extend the sequence-to-sequence framework for abstractive sentence summarization task, which aims to generate a brief summary of a given source sentence.
no code implementations • 21 Aug 2019 • Rongxiang Weng, Heng Yu, Shu-Jian Huang, Weihua Luo, Jia-Jun Chen
Then, we design a framework for integrating both source and target sentence-level representations into the NMT model to improve translation quality.
1 code implementation • ACL 2019 • Xiangyu Duan, Mingming Yin, Min Zhang, Boxing Chen, Weihua Luo
But there is no cross-lingual parallel corpus, whose source sentence language is different from the summary language, to directly train a cross-lingual ASSUM system.
1 code implementation • NAACL 2019 • Kai Song, Yue Zhang, Heng Yu, Weihua Luo, Kun Wang, Min Zhang
Leveraging user-provided translations to constrain NMT output has practical significance.
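One common way to impose such constraints, and the code-switching idea this paper builds on, is to splice the required target phrase directly into the source sentence so the model learns to copy it through. A toy sketch; the paper's exact data construction may differ.

```python
def code_switch(src, constraints):
    """Replace source phrases with their user-specified target translations."""
    for src_phrase, tgt_phrase in constraints.items():
        src = src.replace(src_phrase, tgt_phrase)
    return src

print(code_switch("er hat das Abkommen unterzeichnet",
                  {"das Abkommen": "the agreement"}))
# er hat the agreement unterzeichnet
```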
no code implementations • 11 Jan 2018 • Kai Song, Yue Zhang, Min Zhang, Weihua Luo
Neural machine translation (NMT) suffers a performance deficiency when a limited vocabulary fails to cover the source or target side adequately, which happens frequently when dealing with morphologically rich languages.
no code implementations • COLING 2018 • Shaohui Kuang, Deyi Xiong, Weihua Luo, Guodong Zhou
Sentences in a well-formed text are connected to each other via various links to form the cohesive structure of the text.
no code implementations • ACL 2018 • Shaohui Kuang, Junhui Li, António Branco, Weihua Luo, Deyi Xiong
In neural machine translation, a source sequence of words is encoded into a vector from which a target sequence is generated in the decoding phase.