1 code implementation • ACL 2022 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including the multilinguality of the auxiliary parallel data, the positional disentangled encoder, and the cross-lingual transferability of its encoder.
Abstractive Text Summarization • Cross-Lingual Abstractive Summarization • +5
1 code implementation • Findings (EMNLP) 2021 • Weijia Xu, Yuwei Yin, Shuming Ma, Dongdong Zhang, Haoyang Huang
Multilingual neural machine translation models typically handle one source language at a time.
1 code implementation • 20 Sep 2023 • Tengchao Lv, Yupan Huang, Jingye Chen, Lei Cui, Shuming Ma, Yaoyao Chang, Shaohan Huang, Wenhui Wang, Li Dong, Weiyao Luo, Shaoxiang Wu, Guoxin Wang, Cha Zhang, Furu Wei
We present Kosmos-2.5, a multimodal literate model for machine reading of text-intensive images.
5 code implementations • 17 Jul 2023 • Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei
In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance.
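The dual form is the heart of the RetNet claim, so a toy example helps. Below is a minimal single-head sketch (an illustration under my own simplifying assumptions, not the released implementation: it omits RetNet's rotations, multi-scale heads, gating, and normalization) showing that the recurrent form, which keeps only a fixed-size state for O(1)-per-token inference, matches the parallel form used for training.

```python
import torch

def retention_recurrent(q, k, v, gamma: float):
    """Recurrent form: fixed-size state S_t = gamma*S_{t-1} + k_t^T v_t,
    output o_t = q_t S_t. Shapes: q, k: (T, d_k); v: (T, d_v)."""
    state = torch.zeros(q.shape[1], v.shape[1])
    outputs = []
    for t in range(q.shape[0]):
        state = gamma * state + torch.outer(k[t], v[t])
        outputs.append(q[t] @ state)
    return torch.stack(outputs)

def retention_parallel(q, k, v, gamma: float):
    """Parallel form for training: (Q K^T ⊙ D) V with a causal decay mask
    D[n, m] = gamma^(n-m) for n >= m, else 0."""
    T = q.shape[0]
    n, m = torch.arange(T)[:, None], torch.arange(T)[None, :]
    D = (gamma ** (n - m).clamp(min=0).float()) * (n >= m)
    return (q @ k.T * D) @ v
```

Both functions return the same outputs up to floating-point error, which is exactly the training-parallelism / low-cost-inference trade the abstract describes.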
2 code implementations • 5 Jul 2023 • Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei
Scaling sequence length has become a critical demand in the era of large language models.
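This entry is about taming the quadratic cost of attention at long lengths; a rough sketch of one dilated-attention branch shows the mechanism. The assumptions here are mine (non-causal, single head, a single segment-length/dilation pair), whereas the paper mixes several branches with different settings:

```python
import torch
import torch.nn.functional as F

def dilated_attention_branch(q, k, v, w: int, r: int):
    """One (segment length w, dilation r) branch: inside each window of
    length w, every r-th position attends densely to the same sparse
    subset, so a segment costs O((w/r)^2) instead of O(w^2).
    Shapes: q, k, v are (T, d)."""
    T, d = q.shape
    out = torch.zeros_like(v)
    for start in range(0, T, w):
        idx = torch.arange(start, min(start + w, T), r)
        attn = F.softmax(q[idx] @ k[idx].T / d ** 0.5, dim=-1)
        out[idx] = attn @ v[idx]   # positions skipped by this branch stay zero
    return out
```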
1 code implementation • 26 Jun 2023 • Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.
2 code implementations • 18 May 2023 • Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang
We conduct experiments on a multilingual machine translation benchmark in 11 languages.
1 code implementation • 18 May 2023 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell
Several recent papers claim human parity at sentence-level Machine Translation (MT), especially in high-resource languages.
1 code implementation • 6 Apr 2023 • Liang Chen, Shuming Ma, Dongdong Zhang, Furu Wei, Baobao Chang
Based on our observations, we propose the Double Power Law to predict the unique performance trade-off front in MNMT, which is robust across various languages, data adequacy, and the number of tasks.
1 code implementation • 1 Mar 2023 • Haiteng Zhao, Shuming Ma, Dongdong Zhang, Zhi-Hong Deng, Furu Wei
Although going deep has proven successful in many neural architectures, existing graph transformers remain relatively shallow.
1 code implementation • 27 Feb 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
Ranked #2 on Image Captioning on Flickr30k Captions test (CIDEr metric)
no code implementations • 17 Jan 2023 • Jian Yang, Yuwei Yin, Shuming Ma, Liqun Yang, Hongcheng Guo, Haoyang Huang, Dongdong Zhang, Yutao Zeng, Zhoujun Li, Furu Wei
Context-aware neural machine translation aims to use the document-level context to improve translation quality.
1 code implementation • 20 Dec 2022 • Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, Furu Wei
We comprehensively compare the behaviors of in-context learning and explicit finetuning on real tasks to provide empirical evidence that supports our understanding.
1 code implementation • 20 Dec 2022 • Jian Yang, Shuming Ma, Li Dong, Shaohan Huang, Haoyang Huang, Yuwei Yin, Dongdong Zhang, Liqun Yang, Furu Wei, Zhoujun Li
Inspired by the idea of Generative Adversarial Networks (GANs), we propose a GAN-style model for encoder-decoder pre-training by introducing an auxiliary discriminator, unifying the ability of language understanding and generation in a single model.
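A sketch of what a GAN-style encoder-decoder pre-training step could look like, assuming an ELECTRA-like token-level discriminator over the generator's outputs; the module interfaces and the combined loss are illustrative guesses, not the paper's exact objective:

```python
import torch.nn.functional as F

def gan_style_pretrain_step(generator, discriminator, corrupted_src, tgt):
    """Hypothetical step: the seq2seq generator reconstructs the corrupted
    input (generation ability), while the auxiliary discriminator classifies
    each generated token as kept vs. replaced (understanding ability)."""
    gen_logits = generator(corrupted_src, tgt)            # (B, T, vocab)
    gen_loss = F.cross_entropy(gen_logits.transpose(1, 2), tgt)
    sampled = gen_logits.argmax(-1)                       # generator's tokens
    replaced = (sampled != tgt).float()                   # 1 = token was replaced
    disc_logits = discriminator(sampled)                  # (B, T)
    disc_loss = F.binary_cross_entropy_with_logits(disc_logits, replaced)
    return gen_loss + disc_loss
```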
4 code implementations • 20 Dec 2022 • Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, Furu Wei
Position modeling plays a critical role in Transformers.
no code implementations • 15 Dec 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Wai Lam, Furu Wei
Despite the success of multilingual sequence-to-sequence pre-training, most existing approaches rely on document-level monolingual corpora in many different languages and sentence-level bilingual corpora (in this paper, we use 'bilingual corpora' to denote parallel corpora with 'bilingual translation pairs' in many different language pairs, each consisting of two sentences/documents with the same meaning written in different languages).
Abstractive Text Summarization • Cross-Lingual Abstractive Summarization • +3
1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Large Transformers have achieved state-of-the-art performance across many tasks.
no code implementations • 26 Oct 2022 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Mrinmaya Sachan, Ryan Cotterell
The BWB corpus consists of Chinese novels translated by experts into English, and the annotated test set is designed to probe the ability of machine translation systems to model various discourse phenomena.
1 code implementation • 13 Oct 2022 • Jian Yang, Shaohan Huang, Shuming Ma, Yuwei Yin, Li Dong, Dongdong Zhang, Hongcheng Guo, Zhoujun Li, Furu Wei
Specifically, the target sequence is first translated into the source language and then tagged by a source NER model.
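Read as a recipe, that sentence is a two-step data-synthesis pipeline. A minimal sketch, where `translate` and `tag_entities` are hypothetical stand-ins for any target-to-source MT system and source-language NER tagger (the names are mine, not the paper's):

```python
def synthesize_ner_pair(tgt_sentence: str, translate, tag_entities):
    """Translate-then-tag: project a target-language sentence into the
    source language, then label it with the existing source NER model,
    yielding a synthetic (sentence, entities) training pair."""
    src_sentence = translate(tgt_sentence)   # step 1: target -> source MT
    entities = tag_entities(src_sentence)    # step 2: source-language NER
    return src_sentence, entities
```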
4 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
A big convergence of model architectures across language, vision, speech, and multimodal is emerging.
no code implementations • 28 Sep 2022 • Hongyuan Lu, Haoyang Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Wai Lam
Although multilingual agreement (MA) has shown its importance for multilingual neural machine translation (MNMT), current methodologies in the field have two shortcomings: (i) they require parallel data between multiple language pairs, which is not always realistic, and (ii) they optimize the agreement in an ambiguous direction, which hampers translation performance.
1 code implementation • 29 Jul 2022 • Jian Yang, Yuwei Yin, Liqun Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Furu Wei, Zhoujun Li
The Transformer architecture, built from stacked encoder and decoder layers, has driven significant progress in neural machine translation.
1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Zhoujun Li, Furu Wei
Nonetheless, multilingual training is plagued by language interference in the shared parameters, caused by negative interactions among different translation directions, especially for high-resource languages.
1 code implementation • 11 Jul 2022 • Jian Yang, Yuwei Yin, Shuming Ma, Dongdong Zhang, Shuangzhi Wu, Hongcheng Guo, Zhoujun Li, Furu Wei
Most translation tasks among languages belong to the zero-resource translation problem where parallel corpora are unavailable.
1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
Ranked #2 on Image Captioning on nocaps val
2 code implementations • 20 Apr 2022 • Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
We also present a comprehensive analysis on the representation and routing behaviors of our models.
1 code implementation • ACL 2022 • Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei
We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i.e., the target expert of the same input may change along with training, but only one expert will be activated for the input during inference.
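To see why the fluctuation arises, consider a generic top-1 learning-to-route layer (a sketch of the problem setting, not the paper's remedy): the argmax below depends on trainable router weights, so the expert chosen for a given input can drift across training steps even though inference commits to a single expert.

```python
import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    """Generic top-1 mixture-of-experts layer, for illustration only."""
    def __init__(self, d_model: int, num_experts: int):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(num_experts))

    def forward(self, x):                       # x: (batch, d_model)
        scores = self.router(x)                 # trainable routing logits
        expert_idx = scores.argmax(dim=-1)      # this choice fluctuates as
        out = torch.stack([                     # the router is updated
            self.experts[int(i)](xi) for i, xi in zip(expert_idx, x)])
        gate = scores.softmax(-1).gather(-1, expert_idx[:, None])
        return out * gate
```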
6 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.
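The method centers on a rescaled post-LN residual, x_{l+1} = LN(alpha * x_l + sublayer(x_l)), paired with a down-scaled initialization of some sublayer weights. A minimal sketch, assuming a decoder-only stack of N layers with alpha = (2N)^(1/4) and the companion initialization scaling omitted:

```python
import torch.nn as nn

class DeepNormBlock(nn.Module):
    """Residual block with an up-weighted skip connection inside post-LN."""
    def __init__(self, sublayer: nn.Module, d_model: int, num_layers: int):
        super().__init__()
        self.sublayer = sublayer
        self.norm = nn.LayerNorm(d_model)
        self.alpha = (2 * num_layers) ** 0.25   # grows with depth, damping updates

    def forward(self, x):
        return self.norm(self.alpha * x + self.sublayer(x))
```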
1 code implementation • 23 Feb 2022 • Lianzhe Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Houfeng Wang
To collocate with the unified prompt, we propose a new initialization method for the target label word to further improve the model's transferability across languages.
no code implementations • 26 Jan 2022 • Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang
Synthetic data construction for Grammatical Error Correction (GEC) in non-English languages relies heavily on human-designed, language-specific rules, which yield only a limited range of error-correction patterns.
no code implementations • COLING 2022 • Juncheng Wan, Jian Yang, Shuming Ma, Dongdong Zhang, Weinan Zhang, Yong Yu, Zhoujun Li
While end-to-end neural machine translation (NMT) has achieved impressive progress, noisy input usually leads models to become fragile and unstable.
no code implementations • 5 Jan 2022 • Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei
Existing document-level neural machine translation (NMT) models have sufficiently explored different context settings to provide guidance for target generation.
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
no code implementations • ACL 2021 • Jian Yang, Yuwei Yin, Shuming Ma, Haoyang Huang, Dongdong Zhang, Zhoujun Li, Furu Wei
Although multilingual neural machine translation (MNMT) enables multiple language translations, the training process is based on independent multilingual objectives.
3 code implementations • ACL 2022 • Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training.
Ranked #1 on Zero-Shot Cross-Lingual Transfer on XTREME
2 code implementations • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
no code implementations • NAACL 2021 • Jian Yang, Shuming Ma, Dongdong Zhang, Juncheng Wan, Zhoujun Li, Ming Zhou
Most current neural machine translation models adopt a monotonic decoding order of either left-to-right or right-to-left.
no code implementations • Findings (ACL) 2021 • Weijia Xu, Shuming Ma, Dongdong Zhang, Marine Carpuat
While non-autoregressive (NAR) models are showing great promise for machine translation, their use is limited by their dependence on knowledge distillation from autoregressive models.
1 code implementation • EMNLP 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
In this paper, we focus on a zero-shot cross-lingual transfer task in NMT.
1 code implementation • EMNLP 2021 • Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks.
2 code implementations • NAACL 2022 • Yuchen Eleanor Jiang, Tianyu Liu, Shuming Ma, Dongdong Zhang, Jian Yang, Haoyang Huang, Rico Sennrich, Ryan Cotterell, Mrinmaya Sachan, Ming Zhou
Standard automatic metrics, e.g., BLEU, are not reliable for document-level MT evaluation.
no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
Multilingual machine translation enables a single model to translate between different languages.
no code implementations • ACL 2020 • Jian Yang, Shuming Ma, Dongdong Zhang, Zhoujun Li, Ming Zhou
Although neural machine translation (NMT) has achieved significant progress in recent years, most previous NMT models depend only on the source text to generate translations.
no code implementations • ACL 2020 • Shuming Ma, Dongdong Zhang, Ming Zhou
Most of the existing models for document-level machine translation adopt dual-encoder structures.
no code implementations • 7 Feb 2020 • Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao
In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities.
no code implementations • WS 2019 • Deli Chen, Shuming Ma, Keiko Harimoto, Ruihan Bao, Qi Su, Xu Sun
In this work, we propose a BERT-based Hierarchical Aggregation Model to summarize a large amount of finance news to predict forex movement.
no code implementations • 18 Sep 2019 • Wei Li, Shuheng Li, Shuming Ma, Yancheng He, Deli Chen, Xu Sun
Graph is a natural structure to describe the complicated relation between tokens.
1 code implementation • ACL 2019 • Shuming Ma, Pengcheng Yang, Tianyu Liu, Peng Li, Jie Zhou, Xu Sun
We propose a novel model to separate the generation into two stages: key fact prediction and surface realization.
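The two stages compose as a simple pipeline; in the sketch below, `predict_key_facts` and `realize_surface` are hypothetical placeholders for the two trained models, not the authors' interfaces:

```python
def two_stage_generate(source, predict_key_facts, realize_surface):
    """Stage 1 decides *what* to say (key fact prediction); stage 2 decides
    *how* to say it (surface realization conditioned on those facts)."""
    key_facts = predict_key_facts(source)   # content selection
    return realize_surface(key_facts)       # text generation
```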
1 code implementation • ACL 2019 • Pengcheng Yang, Fuli Luo, Shuming Ma, Junyang Lin, Xu Sun
In this way, we can reduce the dependence of the model on the label order, as well as capture high-order correlations between labels.
no code implementations • EMNLP 2018 • Wei Wu, Houfeng Wang, Tianyu Liu, Shuming Ma
As a result, the memory consumption can be reduced because the self-attention is performed at the phrase level instead of the sentence level.
no code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Furu Wei, Xu Sun
To fully exploit the unpaired data, we completely remove the need for parallel data and propose a novel unsupervised approach to train an automatic article commenting model, relying on nothing but unpaired articles and comments.
3 code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Damai Dai, Furu Wei, Xu Sun
We introduce the task of automatic live commenting.
no code implementations • 10 Sep 2018 • Pengcheng Yang, Shuming Ma, Yi Zhang, Junyang Lin, Qi Su, Xu Sun
However, the Seq2Seq model is fundamentally unsuited to the MLTC task.
1 code implementation • EMNLP 2018 • Junyang Lin, Qi Su, Pengcheng Yang, Shuming Ma, Xu Sun
We propose a novel model for multi-label text classification, which is based on sequence-to-sequence learning.
no code implementations • 22 Aug 2018 • Deli Chen, Shuming Ma, Pengcheng Yang, Xu Sun
In this work, we introduce a novel task: high-quality comment identification (HQCI), which aims to automatically assess the quality of online comments.
no code implementations • COLING 2018 • Hao Wang, Xiaodong Zhang, Shuming Ma, Xu Sun, Houfeng Wang, Mengxiang Wang
The system then measures the relevance between each question and the candidate table cells, and chooses the most relevant cell as the source of the answer.
1 code implementation • COLING 2018 • Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma, Wei Wu, Houfeng Wang
Further analysis of experimental results demonstrates that the proposed methods not only capture the correlations between labels, but also select the most informative words automatically when predicting different labels.
1 code implementation • COLING 2018 • Junyang Lin, Xu Sun, Xuancheng Ren, Shuming Ma, Jinsong Su, Qi Su
A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt Recurrent Neural Networks (RNNs) to generate translations word by word in sequential order.
Ranked #9 on Machine Translation on IWSLT2015 English-Vietnamese
1 code implementation • ACL 2018 • Shuming Ma, Xu Sun, Yizhong Wang, Junyang Lin
However, most existing neural machine translation models use only one of the correct translations as the target, and the other correct sentences are penalized as incorrect during training.
1 code implementation • ACL 2018 • Shuming Ma, Xu Sun, Junyang Lin, Houfeng Wang
In this work, we supervise the learning of the representation of the source content with that of the summary.
1 code implementation • ACL 2018 • Pengcheng Yang, Xu Sun, Wei Li, Shuming Ma
As more and more academic papers are submitted to conferences and journals, having professionals evaluate them all is time-consuming and can introduce inequality due to reviewers' personal factors.
4 code implementations • ACL 2018 • Junyang Lin, Xu Sun, Shuming Ma, Qi Su
To tackle the problem, we propose a global encoding framework, which controls the information flow from the encoder to the decoder based on the global information of the source context.
Ranked #28 on Text Summarization on GigaWord
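One way to picture the global encoding framework from the entry above: a convolutional module summarizes context across the source, and a sigmoid gate filters each encoder state before it reaches the decoder. The sketch below is a deliberately simplified gate (the published model is richer, e.g. it also applies self-attention):

```python
import torch
import torch.nn as nn

class GlobalEncodingGate(nn.Module):
    """Sigmoid gate over encoder states, driven by a 1-D convolution."""
    def __init__(self, d_model: int, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2)

    def forward(self, enc_states):               # (batch, T, d_model)
        g = torch.sigmoid(self.conv(enc_states.transpose(1, 2)))
        return enc_states * g.transpose(1, 2)    # filtered source states
```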
no code implementations • 3 May 2018 • Shuming Ma, Xu Sun, Junyang Lin, Xuancheng Ren
Text summarization and sentiment classification both aim to capture the main ideas of the text but at different levels.
1 code implementation • NAACL 2018 • Shuming Ma, Xu Sun, Wei Li, Sujian Li, Wenjie Li, Xuancheng Ren
Existing sequence-to-sequence models tend to memorize the words and patterns in the training dataset instead of learning the meanings of the words.
no code implementations • 6 Feb 2018 • Junyang Lin, Shuming Ma, Qi Su, Xu Sun
ACA learns to control the attention with a memory vector that keeps track of the decoding history and the current state, so that the model can take both the already-translated content and the current information into consideration.
no code implementations • 25 Nov 2017 • Xu Sun, Weiwei Sun, Shuming Ma, Xuancheng Ren, Yi Zhang, Wenjie Li, Houfeng Wang
The decoding of the complex structure model is regularized by the additionally trained simple structure model.
1 code implementation • COLING 2018 • Yi Zhang, Xu Sun, Shuming Ma, Yang Yang, Xuancheng Ren
In our work, we first design a new model called "high order LSTM" to predict multiple tags for the current token which contains not only the current tag but also the previous several tags.
3 code implementations • 17 Nov 2017 • Xu Sun, Xuancheng Ren, Shuming Ma, Bingzhen Wei, Wei Li, Jingjing Xu, Houfeng Wang, Yi Zhang
Based on the sparsified gradients, we further simplify the model by eliminating rows or columns that are seldom updated, which reduces the computational cost of both training and decoding, and can accelerate decoding in real-world applications.
1 code implementation • ICLR 2018 • Xu Sun, Bingzhen Wei, Xuancheng Ren, Shuming Ma
We propose a method, called Label Embedding Network, which can learn label representation (label embedding) during the training process of deep networks.
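The core idea can be sketched as an output head whose class rows are themselves trainable embeddings learned jointly with the network; the paper builds additional losses on top, which this illustration (my simplification) omits:

```python
import torch.nn as nn

class LabelEmbeddingHead(nn.Module):
    """Classification head where each label owns a dense, trainable vector."""
    def __init__(self, hidden_dim: int, num_labels: int, emb_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(hidden_dim, emb_dim)
        self.label_emb = nn.Embedding(num_labels, emb_dim)

    def forward(self, h):                        # h: (batch, hidden_dim)
        z = self.proj(h)                         # map into label space
        return z @ self.label_emb.weight.T       # similarity scores as logits
```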
1 code implementation • 6 Oct 2017 • Shuming Ma, Xu Sun
In this work, our goal is to improve semantic relevance between source texts and simplified texts for text summarization and text simplification.
2 code implementations • ICML 2017 • Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang
In back propagation, only a small subset of the full gradient is computed to update the model parameters.
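For a single linear layer, this means sparsifying the output gradient to its top-k entries before forming the weight gradient. A minimal numpy sketch (the shapes, `k`, and the single-example batch are illustrative):

```python
import numpy as np

def linear_backward_topk(x, dy, k: int):
    """Backward pass of y = x @ W keeping only the k largest-magnitude
    components of dL/dy, so dW touches only k columns.
    x: (1, d_in), dy: (1, d_out)."""
    dy_sparse = np.zeros_like(dy)
    top = np.argsort(np.abs(dy[0]))[-k:]   # indices of the top-k gradients
    dy_sparse[0, top] = dy[0, top]
    dW = x.T @ dy_sparse                   # sparse weight gradient
    return dW
```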
1 code implementation • ACL 2017 • Shuming Ma, Xu Sun, Jingjing Xu, Houfeng Wang, Wenjie Li, Qi Su
In this work, our goal is to improve semantic relevance between source texts and summaries for Chinese social media summarization.
no code implementations • 2 Mar 2017 • Shuming Ma, Xu Sun
To speed up the training process, many existing systems use parallel technology for online learning algorithms.
no code implementations • 2 Mar 2017 • Xu Sun, Shuming Ma
To deal with this problem, we propose a parallel algorithm called parallel perceptron.
no code implementations • 14 Nov 2016 • Shuming Ma, Xu Sun
Conditional Random Field (CRF) and recurrent neural models have achieved success in structured prediction.
4 code implementations • 29 Mar 2015 • Xu Sun, Shuming Ma, Yi Zhang, Xuancheng Ren
We show that this method, which is easy to implement, trains quickly with a theoretical guarantee of convergence, supports search-based optimization, and achieves top accuracy.