1 code implementation • ACL 2022 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
When applied to zero-shot cross-lingual abstractive summarization, it produces an average gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including the multilinguality of the auxiliary parallel data, the positional disentangled encoder, and the cross-lingual transferability of its encoder (a sketch of the positional-disentanglement idea follows the task tags below).
Abstractive Text Summarization
Cross-Lingual Abstractive Summarization
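A minimal PyTorch sketch of the positional disentangled encoder mentioned in the SixT+ entry above, assuming (based on the related SixT work) that it amounts to dropping the residual connection around self-attention in one middle encoder layer so that outputs are less tied to input positions; the authors' actual implementation may differ.

```python
import torch.nn as nn

class DisentangledEncoderLayer(nn.Module):
    """Post-LN encoder layer whose self-attention residual can be disabled.
    Dropping the residual in one middle layer weakens the one-to-one
    positional tie between a layer's inputs and outputs (assumed reading
    of the "positional disentangled encoder")."""

    def __init__(self, d_model=512, nhead=8, dim_ff=2048, keep_residual=True):
        super().__init__()
        self.keep_residual = keep_residual
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, dim_ff), nn.ReLU(),
                                nn.Linear(dim_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        # The residual around self-attention is skipped in the disentangled layer.
        x = self.norm1(attn_out + (x if self.keep_residual else 0))
        return self.norm2(x + self.ff(x))
```

In a stack, all layers would use `keep_residual=True` except the single middle layer instantiated with `keep_residual=False`.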
no code implementations • Findings (EMNLP) 2021 • Weijia Xu, Yuwei Yin, Shuming Ma, Dongdong Zhang, Haoyang Huang
Multilingual neural machine translation models typically handle one source language at a time.
1 code implementation • 24 May 2023 • Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Furu Wei
Most research on natural language generation (NLG) relies on evaluation benchmarks that provide only a limited number of references per sample, which may result in poor correlation with human judgements.
no code implementations • 23 May 2023 • Lan Jiang, Haoyang Huang, Dongdong Zhang, Rui Jiang, Furu Wei
Notably, the analysis demonstrates that our method significantly influences the initial training process, leading to more efficient convergence and superior solutions.
no code implementations • 23 May 2023 • Minwoo Lee, Hyukhun Koh, Kang-il Lee, Dongdong Zhang, Minsung Kim, Kyomin Jung
Our method is target-language-agnostic and applicable to already trained multilingual machine translation models through post-fine-tuning.
1 code implementation • 18 May 2023 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell
Several recent papers claim human parity in sentence-level machine translation (MT), especially for high-resource languages.
2 code implementations • 18 May 2023 • Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
no code implementations • 11 May 2023 • Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.
no code implementations • 11 May 2023 • Hongyuan Lu, Haoyang Huang, Dongdong Zhang, Haoran Yang, Wai Lam, Furu Wei
Large language models (LLMs) have shown surprisingly good performance in multilingual neural machine translation (MNMT) even when trained without parallel data.
1 code implementation • 6 Apr 2023 • Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang
Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across languages, levels of data adequacy, and numbers of tasks.
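The snippet above names a Double Power Law but does not give its functional form. Purely as an illustration (the functional form, data points, and initial guesses below are assumptions, not the paper's), here is how one might fit a generic two-term power law to hypothetical (sampling-weight, BLEU) points on a bilingual trade-off front with SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit

def double_power_law(p, c1, a, c2, b):
    # Assumed illustrative form: one power term per competing objective.
    return c1 * p ** a + c2 * (1.0 - p) ** b

# Hypothetical (sampling-weight, BLEU) points, invented for this demo only.
weights = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
bleu = np.array([20.1, 24.3, 26.0, 27.1, 27.8])
params, _ = curve_fit(double_power_law, weights, bleu,
                      p0=[10.0, 0.3, -5.0, 0.3], maxfev=10000)
```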
1 code implementation • 1 Mar 2023 • Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei
Although going deep has proven successful in many neural architectures, existing graph transformers remain relatively shallow.
no code implementations • 17 Jan 2023 • Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
1 code implementation • 20 Dec 2022 • Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li
Inspired by Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training that introduces an auxiliary discriminator, unifying language understanding and generation in a single model.
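A hypothetical skeleton of such GAN-style pre-training, assuming an ELECTRA-like replaced-token-detection discriminator attached to an encoder-decoder generator; the module names, sizes, and loss wiring here are assumptions rather than the paper's actual architecture.

```python
import torch
import torch.nn as nn

class GanStylePretrainer(nn.Module):
    """Hypothetical skeleton: an encoder-decoder generator plus an
    ELECTRA-like discriminator that scores each generated token as
    original vs. replaced (names and wiring are assumptions)."""

    def __init__(self, vocab_size: int, d_model: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.generator = nn.Transformer(d_model, batch_first=True)
        self.lm_head = nn.Linear(d_model, vocab_size)
        disc_layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.discriminator = nn.TransformerEncoder(disc_layer, num_layers=4)
        self.disc_head = nn.Linear(d_model, 1)  # per-token original/replaced logit

    def forward(self, src_ids, tgt_ids):
        hidden = self.generator(self.embed(src_ids), self.embed(tgt_ids))
        gen_logits = self.lm_head(hidden)       # trained with a generation loss
        # argmax is non-differentiable: as in ELECTRA, the discriminator loss
        # does not back-propagate into the generator through this sampling.
        predicted = gen_logits.argmax(dim=-1)
        disc_hidden = self.discriminator(self.embed(predicted))
        disc_logits = self.disc_head(disc_hidden).squeeze(-1)
        return gen_logits, disc_logits
```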
no code implementations • 15 Dec 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei
Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages and sentence-level bilingual corpora (in this paper, 'bilingual corpora' denotes parallel corpora of 'bilingual translation pairs' in many different language pairs, each pair consisting of two sentences/documents with the same meaning written in different languages).
Abstractive Text Summarization
Cross-Lingual Abstractive Summarization
no code implementations • 26 Oct 2022 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
no code implementations • 19 Oct 2022 • Hongcheng Guo, Jiaheng Liu, Haoyang Huang, Jian Yang, Zhoujun Li, Dongdong Zhang, Zheng Cui, Furu Wei
To this end, we first propose the Multilingual MMT task by establishing two new Multilingual MMT benchmark datasets covering seven languages.
1 code implementation • 13 Oct 2022 • Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei
Specifically, the target sequence is first translated into the source language and then tagged by a source NER model.
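A minimal sketch of this translate-then-tag projection pipeline; `translate`, `ner_tag`, and `align` are hypothetical callables standing in for the MT system, the source-language NER model, and a word aligner, and the span-projection heuristic is an assumption.

```python
def project_ner_labels(tgt_sentence, translate, ner_tag, align):
    """Translate a target-language sentence into the source language, tag it
    with a source NER model, and project entity labels back through word
    alignments. All three callables are placeholders for real components."""
    src_sentence = translate(tgt_sentence)          # target -> source
    src_entities = ner_tag(src_sentence)            # [((start, end), label), ...]
    alignment = align(src_sentence, tgt_sentence)   # [(src_idx, tgt_idx), ...]
    tgt_entities = []
    for (start, end), label in src_entities:
        # Collect target positions aligned to any source token in the span.
        positions = sorted(j for i, j in alignment if start <= i < end)
        if positions:
            tgt_entities.append(((positions[0], positions[-1] + 1), label))
    return tgt_entities
```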
no code implementations • 28 Sep 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Wai Lam
Although multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT), current methodologies in the field have two shortcomings: (i) they require parallel data between multiple language pairs, which is not always realistic, and (ii) they optimize the agreement in an ambiguous direction, which hampers translation performance.
1 code implementation • 29 Jul 2022 • Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li
The Transformer architecture, built by stacking a sequence of encoder and decoder layers, has driven significant progress in neural machine translation.
1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Zhoujun Li, Furu Wei
Nonetheless, multilingual training suffers from language interference in the shared parameters, caused by negative interactions among different translation directions, especially for high-resource languages.
1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Shuangzhi Wu, Hongcheng Guo, Zhoujun Li, Furu Wei
Most translation directions between languages are zero-resource translation problems, for which parallel corpora are unavailable.
6 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.
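This entry describes DeepNet, whose DeepNorm formulation rescales the residual branch before the post-layer-norm: out = LayerNorm(α·x + sublayer(x)). A minimal sketch follows, assuming the published encoder-only setting α = (2N)^{1/4} and omitting the paired β weight-initialization scaling.

```python
import torch.nn as nn

class DeepNormBlock(nn.Module):
    """Post-LN residual block with DeepNorm's up-scaled residual:
    out = LayerNorm(alpha * x + sublayer(x))."""

    def __init__(self, sublayer: nn.Module, d_model: int, alpha: float):
        super().__init__()
        self.sublayer = sublayer  # e.g. self-attention or feed-forward
        self.alpha = alpha        # residual up-scaling constant
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x):
        # Up-weighting the residual bounds the per-layer update magnitude,
        # which is what stabilizes very deep stacks.
        return self.norm(self.alpha * x + self.sublayer(x))

# For an N-layer encoder-only model the paper sets alpha = (2 * N) ** 0.25,
# e.g. alpha ≈ 4.47 for a 200-layer encoder.
```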
1 code implementation • 23 Feb 2022 • Lianzhe Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Houfeng Wang
To pair with the unified prompt, we propose a new initialization method for the target label word that further improves the model's transferability across languages.
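A hedged sketch of one way such a label-word initialization could look: start the target label's soft embedding at the mean of its (sub)word embeddings, so prompt tuning begins near the class-name semantics rather than at random. This is an illustrative assumption, not necessarily the paper's exact method.

```python
import torch

def init_label_embedding(embedding_matrix: torch.Tensor, label_token_ids):
    """Illustrative initializer: average the embeddings of the label word's
    (sub)word tokens to seed the target label's soft embedding."""
    with torch.no_grad():
        ids = torch.as_tensor(label_token_ids)
        return embedding_matrix[ids].mean(dim=0).clone()

# Usage (hypothetical ids): embedding_matrix.data[new_label_id] =
#     init_label_embedding(embedding_matrix, tokenizer_ids_of_label_word)
```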
no code implementations • COLING 2022 • Juncheng Wan, Jian Yang, Shuming Ma, Dongdong Zhang, Weinan Zhang, Yong Yu, Zhoujun Li
While end-to-end neural machine translation (NMT) has achieved impressive progress, noisy input usually makes models fragile and unstable.
no code implementations • 5 Jan 2022 • Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei
Existing document-level neural machine translation (NMT) models have extensively explored different context settings to provide guidance for target generation.
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
no code implementations • ACL 2021 • Jian Yang, Yuwei Yin, Shuming Ma, Haoyang Huang, Dongdong Zhang, Zhoujun Li, Furu Wei
Although multilingual neural machine translation (MNMT) enables translation among multiple languages, the training process is based on independent multilingual objectives.
1 code implementation • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
no code implementations • NAACL 2021 • Jian Yang, Shuming Ma, Dongdong Zhang, Juncheng Wan, Zhoujun Li, Ming Zhou
Most current neural machine translation models adopt a monotonic decoding order of either left-to-right or right-to-left.
no code implementations • Findings (ACL) 2021 • Weijia Xu, Shuming Ma, Dongdong Zhang, Marine Carpuat
While non-autoregressive (NAR) models are showing great promise for machine translation, their use is limited by their dependence on knowledge distillation from autoregressive models.
1 code implementation • EMNLP 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
In this paper, we focus on a zero-shot cross-lingual transfer task in NMT.
1 code implementation • NAACL 2022 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Jian Yang, Haoyang Huang, Rico Sennrich, Ryan Cotterell, Mrinmaya Sachan, Ming Zhou
Standard automatic metrics (e.g., BLEU) are not reliable for document-level MT evaluation.
no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
Multilingual machine translation enables a single model to translate between different languages.
1 code implementation • 20 Oct 2020 • Dongdong Zhang, Xiaohui Yuan, Ping Zhang
Deep learning methods have demonstrated promising results in predictive healthcare tasks.
1 code implementation • CVPR 2021 • Minheng Ni, Haoyang Huang, Lin Su, Edward Cui, Taroon Bharti, Lijuan Wang, Jianfeng Gao, Dongdong Zhang, Nan Duan
We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines multilingual pre-training and multimodal pre-training into a unified framework via multitask pre-training.