no code implementations • EACL (AdaptNLP) 2021 • Tianyu Chen, Shaohan Huang, Furu Wei, JianXin Li
In unsupervised domain adaptation, we aim to train a model that works well on a target domain when provided with labeled source samples and unlabeled target samples.
no code implementations • ACL 2022 • Junlong Li, Yiheng Xu, Lei Cui, Furu Wei
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.
1 code implementation • ACL 2022 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.
Abstractive Text Summarization • Cross-Lingual Abstractive Summarization • +3
no code implementations • Findings (ACL) 2022 • Tianyu Chen, Hangbo Bao, Shaohan Huang, Li Dong, Binxing Jiao, Daxin Jiang, Haoyi Zhou, JianXin Li, Furu Wei
As more and more pre-trained language models adopt on-cloud deployment, privacy issues grow quickly, mainly due to the exposure of plain-text user data (e.g., search history, medical records, bank accounts).
no code implementations • Findings (ACL) 2022 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
Multimodal pre-training with text, layout, and image has achieved SOTA performance for visually rich document understanding tasks recently, which demonstrates the great potential for joint learning across different modalities.
1 code implementation • ICML 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
1 code implementation • 20 May 2022 • Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending on both text context and visual knowledge in images.
no code implementations • 20 May 2022 • Zhixiong Han, Yaru Hao, Li Dong, Furu Wei
In-context learning of GPT-like models has been recognized as fragile across different hand-crafted templates and demonstration permutations.
1 code implementation • 20 May 2022 • Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei
We study lossless acceleration for seq2seq generation with a novel decoding algorithm -- Aggressive Decoding.
Abstractive Text Summarization • Grammatical Error Correction • +3
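The draft-then-verify idea behind lossless Aggressive Decoding can be sketched in a few lines of Python. This is a hedged toy sketch under assumptions: `draft_fn` (a cheap drafter, e.g. simply copying the input in GEC) and `model_logits_fn` are hypothetical stand-ins rather than the released implementation, and the loop keeps only the drafted prefix that greedy decoding of the target model would also have produced, so the output matches plain greedy decoding.

```python
import torch

def aggressive_decode(model_logits_fn, draft_fn, prefix, max_len=64, eos_id=2):
    """Toy draft-and-verify decoding loop (lossless w.r.t. greedy decoding).

    model_logits_fn(ids) -> tensor [len(ids), vocab] of target-model logits.
    draft_fn(ids)        -> non-empty list of guessed continuation ids.
    Both callables are hypothetical stand-ins, not a specific released model.
    prefix must contain at least one token (e.g. BOS).
    """
    tokens = list(prefix)
    while len(tokens) < max_len and tokens[-1] != eos_id:
        draft = draft_fn(tokens)                       # propose many tokens at once
        logits = model_logits_fn(tokens + draft)       # single parallel verification pass
        # what the target model would emit greedily at each drafted position
        greedy = logits[len(tokens) - 1:-1].argmax(-1).tolist()
        accepted = []
        for guess, gold in zip(draft, greedy):
            accepted.append(gold)                      # always keep the model's own token
            if guess != gold:                          # first disagreement ends this batch
                break
        tokens.extend(accepted)
    return tokens[:max_len]
```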
no code implementations • ACL 2022 • Ruipeng Jia, Xingxing Zhang, Yanan Cao, Shi Wang, Zheng Lin, Furu Wei
In zero-shot multilingual extractive text summarization, a model is typically trained on English summarization dataset and then applied on summarization datasets of other languages.
no code implementations • 27 Apr 2022 • Sanyuan Chen, Yu Wu, Chengyi Wang, Shujie Liu, Zhuo Chen, Peidong Wang, Gang Liu, Jinyu Li, Jian Wu, Xiangzhan Yu, Furu Wei
Recently, self-supervised learning (SSL) has demonstrated strong performance in speaker recognition, even if the pre-training objective is designed for speech recognition.
1 code implementation • 20 Apr 2022 • Zewen Chi, Li Dong, Shaohan Huang, Damai Dai, Shuming Ma, Barun Patra, Saksham Singhal, Payal Bajaj, Xia Song, Furu Wei
We also present a comprehensive analysis on the representation and routing behaviors of our models.
1 code implementation • ACL 2022 • Damai Dai, Li Dong, Shuming Ma, Bo Zheng, Zhifang Sui, Baobao Chang, Furu Wei
We point out that existing learning-to-route MoE methods suffer from the routing fluctuation issue, i.e., the target expert of the same input may change as training proceeds, but only one expert will be activated for the input during inference.
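One way to keep routing stable, in the spirit of the two-stage StableMoE recipe named in this entry, is to freeze a lightweight router after an initial learning stage so that each token's target expert no longer drifts. The toy layer below is an illustrative sketch only; the layer structure, sizes, and two-stage schedule are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class ToyMoELayer(nn.Module):
    """Toy top-1 MoE layer whose router can be frozen after stage 1."""
    def __init__(self, d_model=64, n_experts=4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def freeze_router(self):
        # stage 2: routing decisions stop changing, so each token's expert is fixed
        for p in self.router.parameters():
            p.requires_grad_(False)

    def forward(self, x):                       # x: [tokens, d_model]
        scores = self.router(x)                 # [tokens, n_experts]
        expert_idx = scores.argmax(dim=-1)      # top-1 routing decision
        gate = torch.softmax(scores, dim=-1).gather(-1, expert_idx[:, None])
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = expert_idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return gate * out                       # scale each token by its gate value

# stage 1: train normally; stage 2: call layer.freeze_router() and keep training.
```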
1 code implementation • 18 Apr 2022 • Yupan Huang, Tengchao Lv, Lei Cui, Yutong Lu, Furu Wei
In this paper, we propose LayoutLMv3 to pre-train multimodal Transformers for Document AI with unified text and image masking.
Ranked #1 on Key information extraction on CORD
1 code implementation • 31 Mar 2022 • Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, LiRong Dai, Jinyu Li, Yao Qian, Furu Wei
In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text.
2 code implementations • 30 Mar 2022 • Heming Xia, Tao Ge, Furu Wei, Zhifang Sui
Different from previous work accelerating translation at the cost of quality loss, we propose Generalized Aggressive Decoding (GAD) -- a novel decoding paradigm for lossless speedup of autoregressive translation, through the collaboration of autoregressive and non-autoregressive translation (NAT) of the Transformer.
no code implementations • ACL 2022 • Haoyu Song, Li Dong, Wei-Nan Zhang, Ting Liu, Furu Wei
We first evaluate CLIP's zero-shot performance on a typical visual question answering task and demonstrate a zero-shot cross-modality transfer capability of CLIP on the visual entailment task.
3 code implementations • 4 Mar 2022 • Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei
Image Transformer has recently achieved significant progress for natural image understanding, either using supervised (ViT, DeiT, etc.) or self-supervised (BEiT, MAE, etc.) pre-training techniques.
Ranked #2 on Document Layout Analysis on PubLayNet val
4 code implementations • 1 Mar 2022 • Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei
In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers.
no code implementations • Findings (ACL) 2022 • Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen
We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.
no code implementations • 23 Feb 2022 • Lianzhe Huang, Shuming Ma, Dongdong Zhang, Furu Wei, Houfeng Wang
To collocate with the unified prompt, we propose a new initialization method for the target label word to further improve the model's transferability across languages.
no code implementations • 17 Feb 2022 • Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao
With the increasing model capacity brought by pre-trained language models, there is a growing need for more knowledgeable natural language processing (NLP) models with advanced functionalities, including providing and making flexible use of encyclopedic and commonsense knowledge.
1 code implementation • 16 Feb 2022 • Tao Ge, Furu Wei
We propose EdgeFormer -- a parameter-efficient Transformer of the encoder-decoder architecture for on-device seq2seq generation, which is customized under strict computation and memory constraints.
no code implementations • 7 Feb 2022 • Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei
CIM is a general and flexible visual pre-training framework that is suitable for various network architectures.
no code implementations • 26 Jan 2022 • Xin Sun, Tao Ge, Shuming Ma, Jingjing Li, Furu Wei, Houfeng Wang
Synthetic data construction of Grammatical Error Correction (GEC) for non-English languages relies heavily on human-designed and language-specific rules, which produce limited error-corrected patterns.
1 code implementation • 15 Jan 2022 • Yunzhi Yao, Shaohan Huang, Ningyu Zhang, Li Dong, Furu Wei, Huajun Chen
Knowledge-enhanced models have developed a diverse set of techniques for knowledge integration on different knowledge sources.
1 code implementation • 12 Jan 2022 • Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang
To this end, we propose a prompt based sentence embeddings method which can reduce token embeddings biases and make the original BERT layers more effective.
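One common instantiation of a prompt-based sentence embedding fills the sentence into a template and uses the hidden state at the [MASK] position as the embedding. The sketch below is a hedged illustration under assumptions: the checkpoint and the exact template are illustrative choices, not necessarily those used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical choice of checkpoint for illustration only.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def prompt_embedding(sentence: str) -> torch.Tensor:
    # Illustrative template of the form: This sentence : "<s>" means [MASK] .
    text = f'This sentence : "{sentence}" means {tokenizer.mask_token} .'
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state          # [1, seq_len, hidden]
    mask_pos = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero()[0, 1]
    return hidden[0, mask_pos]                               # [MASK] state as the embedding

emb = prompt_embedding("A man is playing a guitar.")
print(emb.shape)   # torch.Size([768])
```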
no code implementations • 6 Jan 2022 • Juncheng Wan, Jian Yang, Shuming Ma, Dongdong Zhang, Weinan Zhang, Yong Yu, Furu Wei
In this paper, we propose a phrase-level adversarial example generation (PAEG) method to enhance the robustness of the model.
no code implementations • 5 Jan 2022 • Xu Zhang, Jian Yang, Haoyang Huang, Shuming Ma, Dongdong Zhang, Jinlong Li, Furu Wei
Existing document-level neural machine translation (NMT) models have sufficiently explored different context settings to provide guidance for target generation.
no code implementations • 16 Dec 2021 • Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei
We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering.
6 code implementations • 18 Nov 2021 • Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo
Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained using low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
Ranked #1 on Instance Segmentation on COCO minival
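The scaled cosine attention mentioned in point 1) replaces raw dot-product logits with cosine similarities divided by a learnable temperature, which keeps logit magnitudes bounded. The single-head module below is a hedged sketch; the exact parameterization (per-layer scalar temperature with a clamp) is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineAttention(nn.Module):
    """Single-head attention whose logits are cosine similarities / learnable tau."""
    def __init__(self, dim, tau_init=0.1, tau_min=0.01):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        self.log_tau = nn.Parameter(torch.log(torch.tensor(tau_init)))
        self.tau_min = tau_min

    def forward(self, x):                      # x: [batch, tokens, dim]
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = F.normalize(q, dim=-1)             # unit-length queries and keys, so the
        k = F.normalize(k, dim=-1)             # dot product becomes cosine similarity
        tau = self.log_tau.exp().clamp(min=self.tau_min)
        attn = torch.softmax(q @ k.transpose(-2, -1) / tau, dim=-1)
        return self.proj(attn @ v)

x = torch.randn(2, 16, 64)
print(CosineAttention(64)(x).shape)            # torch.Size([2, 16, 64])
```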
no code implementations • 16 Nov 2021 • Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei
Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.
no code implementations • 3 Nov 2021 • Wenhui Wang, Hangbo Bao, Li Dong, Furu Wei
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.
Ranked #1 on Visual Question Answering on VQA v2 test-dev
no code implementations • WMT (EMNLP) 2021 • Jian Yang, Shuming Ma, Haoyang Huang, Dongdong Zhang, Li Dong, Shaohan Huang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
This report describes Microsoft's machine translation systems for the WMT21 shared task on large-scale multilingual machine translation.
no code implementations • 27 Oct 2021 • Wangyou Zhang, Zhuo Chen, Naoyuki Kanda, Shujie Liu, Jinyu Li, Sefik Emre Eskimez, Takuya Yoshioka, Xiong Xiao, Zhong Meng, Yanmin Qian, Furu Wei
Multi-talker conversational speech processing has drawn many interests for various applications such as meeting transcription.
3 code implementations • 26 Oct 2021 • Sanyuan Chen, Chengyi Wang, Zhengyang Chen, Yu Wu, Shujie Liu, Zhuo Chen, Jinyu Li, Naoyuki Kanda, Takuya Yoshioka, Xiong Xiao, Long Zhou, Shuo Ren, Yanmin Qian, Yao Qian, Jian Wu, Michael Zeng, Xiangzhan Yu, Furu Wei
Self-supervised learning (SSL) achieves great success in speech recognition, while limited exploration has been attempted for other speech processing tasks.
1 code implementation • 26 Oct 2021 • Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei
Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, while it is not straightforward to directly apply them for natural language generation.
1 code implementation • 21 Oct 2021 • Ting Jiang, Shaohan Huang, Zihan Zhang, Deqing Wang, Fuzhen Zhuang, Furu Wei, Haizhen Huang, Liangjie Zhang, Qi Zhang
While pre-trained language models have achieved great success on various natural language understanding tasks, how to effectively leverage them into non-autoregressive generation tasks remains a challenge.
1 code implementation • 16 Oct 2021 • Junlong Li, Yiheng Xu, Lei Cui, Furu Wei
Multimodal pre-training with text, layout, and image has made significant progress for Visually Rich Document Understanding (VRDU), especially the fixed-layout documents such as scanned document images.
1 code implementation • 16 Oct 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
When applied to zero-shot cross-lingual abstractive summarization, it produces an average performance gain of 12.3 ROUGE-L over mBART-ft. We conduct detailed analyses to understand the key ingredients of SixT+, including multilinguality of the auxiliary parallel data, positional disentangled encoder, and the cross-lingual transferability of its encoder.
Abstractive Text Summarization • Cross-Lingual Abstractive Summarization • +3
1 code implementation • ACL 2022 • Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.
2 code implementations • 12 Oct 2021 • Sanyuan Chen, Yu Wu, Chengyi Wang, Zhengyang Chen, Zhuo Chen, Shujie Liu, Jian Wu, Yao Qian, Furu Wei, Jinyu Li, Xiangzhan Yu
We integrate the proposed methods into the HuBERT framework.
no code implementations • EMNLP 2021 • Jiaqi Bai, Long Zhou, Ambrosio Blanco, Shujie Liu, Furu Wei, Ming Zhou, Zhoujun Li
We propose a novel task of jointly repairing program codes and generating commit messages.
2 code implementations • 21 Sep 2021 • Minghao Li, Tengchao Lv, Lei Cui, Yijuan Lu, Dinei Florencio, Cha Zhang, Zhoujun Li, Furu Wei
Existing approaches for text recognition are usually built on CNNs for image understanding and RNNs for char-level text generation.
1 code implementation • EMNLP 2021 • Bo Zheng, Li Dong, Shaohan Huang, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
We find that many languages are under-represented in recent cross-lingual language models due to the limited vocabulary capacity.
no code implementations • 8 Sep 2021 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei
In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training.
1 code implementation • EMNLP 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei
Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation.
1 code implementation • EMNLP 2021 • Zilong Wang, Yiheng Xu, Lei Cui, Jingbo Shang, Furu Wei
Reading order detection is the cornerstone to understanding visually-rich documents (e.g., receipts and forms).
no code implementations • ACL 2021 • Jian Yang, Yuwei Yin, Shuming Ma, Haoyang Huang, Dongdong Zhang, Zhoujun Li, Furu Wei
Although multilingual neural machine translation (MNMT) enables multiple language translations, the training process is based on independent multilingual objectives.
no code implementations • ACL 2021 • Nan Yang, Furu Wei, Binxing Jiao, Daxing Jiang, Linjun Yang
Dense passage retrieval has been shown to be an effective approach for information retrieval tasks such as open domain question answering.
no code implementations • ACL 2021 • Shuo Ren, Long Zhou, Shujie Liu, Furu Wei, Ming Zhou, Shuai Ma
While pre-training techniques are working very well in natural language processing, how to pre-train a decoder and effectively use it for neural machine translation (NMT) still remains a tricky issue.
no code implementations • 12 Jul 2021 • Chengyi Wang, Yu Wu, Shujie Liu, Jinyu Li, Yao Qian, Kenichi Kumatani, Furu Wei
Recently, there has been a vast interest in self-supervised learning (SSL) where the model is pre-trained on large scale unlabeled data and then fine-tuned on a small labeled dataset.
2 code implementations • ACL 2022 • Zewen Chi, Shaohan Huang, Li Dong, Shuming Ma, Bo Zheng, Saksham Singhal, Payal Bajaj, Xia Song, Xian-Ling Mao, Heyan Huang, Furu Wei
In this paper, we introduce ELECTRA-style tasks to cross-lingual language model pre-training.
Ranked #1 on Zero-Shot Cross-Lingual Transfer on XTREME
no code implementations • Findings (ACL) 2021 • Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei
Moreover, we propose to use a focal loss for the generator in order to relieve oversampling of correct tokens as replacements.
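A focal loss in the sense used here down-weights the cross-entropy term for tokens the generator already predicts confidently, which counteracts oversampling of easy, correct tokens as replacements. The snippet below is a hedged, generic sketch; the gamma value and reduction are illustrative, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Focal loss over token predictions.

    logits:  [N, vocab] unnormalized scores; targets: [N] token ids.
    The (1 - p_t)^gamma factor shrinks the loss on confidently correct tokens.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    log_pt = log_probs.gather(1, targets[:, None]).squeeze(1)   # log p of the true token
    pt = log_pt.exp()
    return (-(1.0 - pt) ** gamma * log_pt).mean()

logits = torch.randn(8, 100)
targets = torch.randint(0, 100, (8,))
print(focal_loss(logits, targets))
```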
1 code implementation • 25 Jun 2021 • Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Alexandre Muzio, Saksham Singhal, Hany Hassan Awadalla, Xia Song, Furu Wei
While pretrained encoders have achieved success in various natural language understanding (NLU) tasks, there is a gap between these pretrained encoders and natural language generation (NLG).
no code implementations • Findings (ACL) 2021 • Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei
In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains.
1 code implementation • ACL 2021 • Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to the others.
9 code implementations • ICLR 2022 • Hangbo Bao, Li Dong, Furu Wei
We first "tokenize" the original image into visual tokens.
Ranked #3 on Document Layout Analysis on PubLayNet val
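The masked image modeling recipe described here ("tokenize" the image into discrete visual tokens, mask patches, and predict the tokens of the masked patches) can be sketched end to end. Everything below is an illustrative assumption: the dummy visual tokenizer stands in for a real dVAE, and the sizes and backbone are toy values, not the released BEiT model.

```python
import torch
import torch.nn as nn

num_patches, vocab_size, dim = 196, 8192, 768

# Stand-in for a pre-trained discrete visual tokenizer (e.g. a dVAE); random here.
visual_tokenizer = lambda patches: torch.randint(0, vocab_size, patches.shape[:2])
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True), num_layers=2)
patch_embed = nn.Linear(16 * 16 * 3, dim)
mask_embed = nn.Parameter(torch.zeros(dim))
head = nn.Linear(dim, vocab_size)

patches = torch.randn(4, num_patches, 16 * 16 * 3)   # flattened 16x16 RGB patches
targets = visual_tokenizer(patches)                   # discrete visual token per patch
mask = torch.rand(4, num_patches) < 0.4               # randomly mask ~40% of patches

x = patch_embed(patches)
x[mask] = mask_embed                                  # replace masked patches with [MASK]
logits = head(encoder(x))                             # predict each patch's visual token
loss = nn.functional.cross_entropy(logits[mask], targets[mask])
print(loss)
```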
1 code implementation • ACL 2021 • Zewen Chi, Li Dong, Bo Zheng, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
The cross-lingual language models are typically pretrained with masked language modeling on multilingual text or parallel sentences.
1 code implementation • 10 Jun 2021 • Tengchao Lv, Lei Cui, Momcilo Vasilijevic, Furu Wei
Video transcript summarization is a fundamental task for video understanding.
1 code implementation • ACL 2021 • Xin Sun, Tao Ge, Furu Wei, Houfeng Wang
In this paper, we propose Shallow Aggressive Decoding (SAD) to improve the online inference efficiency of the Transformer for instantaneous Grammatical Error Correction (GEC).
1 code implementation • ACL 2022 • Shengqiang Zhang, Xingxing Zhang, Hangbo Bao, Furu Wei
In this paper, we find simply manipulating attention temperatures in Transformers can make pseudo labels easier to learn for student models.
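The knob being manipulated here is the temperature applied inside the attention softmax: values above 1 flatten the attention distribution and values below 1 sharpen it. The function below is a minimal, hedged sketch of temperature-scaled attention, not the paper's distillation pipeline.

```python
import torch
import torch.nn.functional as F

def attention_with_temperature(q, k, v, temperature=1.0):
    """Scaled dot-product attention with an extra temperature on the softmax.

    temperature > 1 flattens the attention weights, temperature < 1 sharpens them.
    """
    d = q.size(-1)
    logits = q @ k.transpose(-2, -1) / (d ** 0.5)
    weights = F.softmax(logits / temperature, dim=-1)
    return weights @ v

q = k = v = torch.randn(2, 8, 64)
print(attention_with_temperature(q, k, v, temperature=2.0).shape)
```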
no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen
To this end, we propose a multi-split reversible network and combine it with DARTS.
3 code implementations • ACL 2022 • Damai Dai, Li Dong, Yaru Hao, Zhifang Sui, Baobao Chang, Furu Wei
In this paper, we present preliminary studies on how factual knowledge is stored in pretrained Transformers by introducing the concept of knowledge neurons.
4 code implementations • 18 Apr 2021 • Yiheng Xu, Tengchao Lv, Lei Cui, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Furu Wei
In this paper, we present LayoutXLM, a multimodal pre-trained model for multilingual document understanding, which aims to bridge the language barriers for visually-rich document understanding.
Ranked #10 on Document Image Classification on RVL-CDIP
1 code implementation • EMNLP 2021 • Zewen Chi, Li Dong, Shuming Ma, Shaohan Huang, Xian-Ling Mao, Heyan Huang, Furu Wei
Multilingual T5 (mT5) pretrains a sequence-to-sequence model on massive monolingual texts, which has shown promising results on many cross-lingual tasks.
1 code implementation • EMNLP 2021 • Guanhua Chen, Shuming Ma, Yun Chen, Li Dong, Dongdong Zhang, Jia Pan, Wenping Wang, Furu Wei
In this paper, we focus on a zero-shot cross-lingual transfer task in NMT.
1 code implementation • NAACL 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei
Cant is important for understanding advertising, comedies and dog-whistle politics.
2 code implementations • 19 Jan 2021 • Chengyi Wang, Yu Wu, Yao Qian, Kenichi Kumatani, Shujie Liu, Furu Wei, Michael Zeng, Xuedong Huang
In this paper, we propose a unified pre-training approach called UniSpeech to learn speech representations with both unlabeled and labeled data, in which supervised phonetic CTC learning and phonetically-aware contrastive self-supervised learning are conducted in a multi-task learning manner.
1 code implementation • EMNLP 2021 • Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei
In this paper, we generalize text infilling (e. g., masked language models) by proposing Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.
no code implementations • 31 Dec 2020 • Shuming Ma, Jian Yang, Haoyang Huang, Zewen Chi, Li Dong, Dongdong Zhang, Hany Hassan Awadalla, Alexandre Muzio, Akiko Eriguchi, Saksham Singhal, Xia Song, Arul Menezes, Furu Wei
Multilingual machine translation enables a single model to translate between different languages.
1 code implementation • Findings (ACL) 2021 • Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.
4 code implementations • ACL 2021 • Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, Guoxin Wang, Yijuan Lu, Dinei Florencio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou
Pre-training of text and layout has proved effective in a variety of visually-rich document understanding tasks due to its effective model architecture and the advantage of large-scale unlabeled scanned/digital-born documents.
Ranked #1 on Key information extraction on SROIE
1 code implementation • Asian Chapter of the Association for Computational Linguistics 2020 • Canwen Xu, Tao Ge, Chenliang Li, Furu Wei
Chinese and Japanese share many characters with similar surface morphology.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
The recently introduced pre-trained language model BERT advances the state-of-the-art on many NLP tasks through the fine-tuning approach, but few studies investigate how the fine-tuning process improves the model performance on downstream tasks.
no code implementations • COLING 2020 • Shaohan Huang, Furu Wei, Lei Cui, Xingxing Zhang, Ming Zhou
Fine-tuning with pre-trained language models (e.g. BERT) has achieved great success in many language understanding tasks in supervised settings (e.g. text classification).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shusheng Xu, Xingxing Zhang, Yi Wu, Furu Wei, Ming Zhou
We also find in experiments that our model is less dependent on sentence positions.
no code implementations • EMNLP 2020 • Mengyun Chen, Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou
We propose a novel language-independent approach to improve the efficiency for Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
1 code implementation • EMNLP 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Xiaoyan Zhu, Minlie Huang
Despite the success of generative pre-trained language models on a series of text generation tasks, they still suffer in cases where reasoning over underlying commonsense knowledge is required during generation.
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Haozhe Ji, Pei Ke, Shaohan Huang, Furu Wei, Minlie Huang
Commonsense explanation generation aims to empower the machine's sense-making capability by generating plausible explanations to statements against commonsense.
2 code implementations • NAACL 2021 • Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.
Ranked #12 on Zero-Shot Cross-Lingual Transfer on XTREME
1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM).
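The mechanism behind patience-based early exit is that every layer carries its own internal classifier, and inference stops as soon as a fixed number of consecutive classifiers agree on the prediction. The module below is a toy, hedged sketch of that inference rule; the backbone, sizes, and patience value are illustrative rather than a specific pretrained LM.

```python
import torch
import torch.nn as nn

class PatienceEarlyExit(nn.Module):
    """Toy patience-based early exit: stop once `patience` consecutive
    internal classifiers agree on the predicted label."""
    def __init__(self, dim=64, num_layers=12, num_labels=2, patience=3):
        super().__init__()
        self.layers = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_layers))
        self.classifiers = nn.ModuleList(nn.Linear(dim, num_labels) for _ in range(num_layers))
        self.patience = patience

    @torch.no_grad()
    def forward(self, x):                         # x: [dim], a single example
        prev_pred, streak = None, 0
        for layer, clf in zip(self.layers, self.classifiers):
            x = torch.relu(layer(x))
            pred = int(clf(x).argmax())
            streak = streak + 1 if pred == prev_pred else 1
            prev_pred = pred
            if streak >= self.patience:           # enough consecutive agreement: exit early
                break
        return prev_pred

print(PatienceEarlyExit()(torch.randn(64)))
```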
1 code implementation • COLING 2020 • Minghao Li, Yiheng Xu, Lei Cui, Shaohan Huang, Furu Wei, Zhoujun Li, Ming Zhou
DocBank is constructed in a simple yet effective way with weak supervision from the LaTeX documents available on arXiv.com.
1 code implementation • ACL 2020 • Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu
Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.
Ranked #185 on Question Answering on SQuAD1.1
1 code implementation • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of transformer, a state-of-the-art model for various NLP tasks.
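Structured dropout over attention heads means that, during training, the entire output of a randomly chosen head is zeroed for a given example (with the surviving heads rescaled), rather than dropping individual activations. The function below is a hedged sketch of that idea with standard inverted-dropout rescaling; it is not the paper's exact normalization scheme.

```python
import torch

def drop_head(attn_output, p=0.2, training=True):
    """Structured dropout over attention heads.

    attn_output: [batch, heads, tokens, head_dim]. With probability p an entire
    head's output is zeroed for a given example; the rest are rescaled.
    """
    if not training or p == 0.0:
        return attn_output
    b, h = attn_output.shape[:2]
    keep = (torch.rand(b, h, 1, 1, device=attn_output.device) >= p).to(attn_output.dtype)
    return attn_output * keep / (1.0 - p)

x = torch.randn(2, 8, 16, 64)
print(drop_head(x).shape)
```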
2 code implementations • 23 Apr 2020 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
The great success of Transformer-based models benefits from the powerful multi-head self-attention mechanism, which learns token dependencies and encodes contextual information from the input.
4 code implementations • ECCV 2020 • Xiujun Li, Xi Yin, Chunyuan Li, Pengchuan Zhang, Xiao-Wei Hu, Lei Zhang, Lijuan Wang, Houdong Hu, Li Dong, Furu Wei, Yejin Choi, Jianfeng Gao
Large-scale pre-training methods of learning cross-modal representations on image-text pairs are becoming popular for vision-language tasks.
Ranked #1 on Text-Image Retrieval on COCO (image as query)
no code implementations • COLING 2020 • Qingyu Zhou, Furu Wei, Ming Zhou
In this work, we show that unnecessity and redundancy issues exist when extracting full sentences, and extracting sub-sentential units is a promising alternative.
no code implementations • 6 Apr 2020 • Qingyu Zhou, Furu Wei, Ming Zhou
In this paper, we propose a method for automatically constructing a passage-to-summary dataset by mining the Wikipedia page revision histories.
no code implementations • EMNLP 2020 • Yanyan Zou, Xingxing Zhang, Wei Lu, Furu Wei, Ming Zhou
The main idea is that, given an input text artificially constructed from a document, a model is pre-trained to reinstate the original document.
2 code implementations • 28 Feb 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
Ranked #3 on Question Generation on SQuAD1.1 (using extra training data)
1 code implementation • NeurIPS 2020 • Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher).
Ranked #5 on Zero-shot Text Search on BEIR
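"Deeply mimicking the self-attention module" can be expressed as a distillation loss that matches the student's attention distributions to the teacher's. The function below is a hedged sketch of that attention-mimicking term only; the full method also transfers value relations and distills only selected layers, both of which are omitted here.

```python
import torch
import torch.nn.functional as F

def attention_distillation_loss(student_logits, teacher_logits):
    """KL divergence between teacher and student self-attention distributions.

    Both tensors: [batch, heads, tokens, tokens] pre-softmax attention scores.
    """
    s = F.log_softmax(student_logits, dim=-1)
    t = F.softmax(teacher_logits, dim=-1)
    return F.kl_div(s, t, reduction="batchmean")

student = torch.randn(2, 12, 16, 16)
teacher = torch.randn(2, 12, 16, 16)
print(attention_distillation_loss(student, teacher))
```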
1 code implementation • EMNLP 2020 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
Our approach first divides the original BERT into several modules and builds their compact substitutes.
no code implementations • 7 Feb 2020 • Chaoqun Duan, Lei Cui, Shuming Ma, Furu Wei, Conghui Zhu, Tiejun Zhao
In this work, we aim to improve the relevance between live comments and videos by modeling the cross-modal interactions among different modalities.
no code implementations • ICLR 2020 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples.
no code implementations • 16 Jan 2020 • Yinuo Guo, Tao Ge, Furu Wei
To overcome the challenges, we first propose the Fact-aware Sentence Encoding, which enables the model to learn facts from the long sentence and thus improves the precision of sentence split; then we introduce Permutation Invariant Training to alleviate the effects of order variance in seq2seq learning for this task.
12 code implementations • 31 Dec 2019 • Yiheng Xu, Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou
In this paper, we propose LayoutLM to jointly model interactions between text and layout information across scanned document images, which is beneficial for a great number of real-world document image understanding tasks such as information extraction from scanned documents.
Ranked #11 on Document Image Classification on RVL-CDIP
no code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Zewen Chi, Li Dong, Furu Wei, Xian-Ling Mao, He-Yan Huang
Multilingual pretrained language models (such as multilingual BERT) have achieved impressive results for cross-lingual transfer.
no code implementations • 8 Nov 2019 • Haichao Zhu, Li Dong, Furu Wei, Bing Qin, Ting Liu
The manual construction of a query-focused summarization corpus is costly and time-consuming.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou
The poor translation model resembles the ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammatical correctness, while the good translation model generally generates fluent and grammatically correct translations.
no code implementations • IJCNLP 2019 • Weike Jin, Zhou Zhao, Mao Gu, Jun Xiao, Furu Wei, Yueting Zhuang
Video dialog is a new and challenging task, which requires the agent to answer questions combining video information with dialog history.
no code implementations • WS 2019 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Lei Cui, Songhao Piao, Ming Zhou
Most machine reading comprehension (MRC) models separately handle encoding and matching with different network architectures.
1 code implementation • 23 Sep 2019 • Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, He-Yan Huang
In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages.
no code implementations • 13 Sep 2019 • Yi Zhang, Tao Ge, Furu Wei, Ming Zhou, Xu Sun
We study sequence-to-sequence (seq2seq) pre-training with data augmentation for sentence rewriting.
3 code implementations • ICLR 2020 • Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
We introduce a new pre-trainable generic representation for visual-linguistic tasks, called Visual-Linguistic BERT (VL-BERT for short).
Ranked #1 on Visual Question Answering on VCR (Q-A) dev
no code implementations • IJCNLP 2019 • Yaru Hao, Li Dong, Furu Wei, Ke Xu
Language model pre-training, such as BERT, has achieved remarkable results in many NLP tasks.
no code implementations • ACL 2019 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
Our approach first applies dropout to the target word's embedding for partially masking the word, allowing BERT to take balanced consideration of the target word's semantics and contexts for proposing substitute candidates, and then validates the candidates based on their substitution's influence on the global contextualized representation of the sentence.
no code implementations • ACL 2019 • Tao Ge, Xingxing Zhang, Furu Wei, Ming Zhou
Sequence-to-sequence (seq2seq) models have achieved tremendous success in text generation tasks.
no code implementations • ACL 2019 • Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu
We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset.
no code implementations • ACL 2019 • Xingxing Zhang, Furu Wei, Ming Zhou
Neural extractive summarization models usually employ a hierarchical encoder for document encoding and they are trained using sentence-level labels, which are created heuristically using rule-based methods.
Ranked #7 on Extractive Text Summarization on CNN / Daily Mail
8 code implementations • NeurIPS 2019 • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.
Ranked #2 on Generative Question Answering on CoQA (using extra training data)
no code implementations • 15 Mar 2019 • Ruochen Xu, Tao Ge, Furu Wei
Its challenge is the lack of large-scale sentence-aligned parallel data.
2 code implementations • LREC 2020 • Minghao Li, Lei Cui, Shaohan Huang, Furu Wei, Ming Zhou, Zhoujun Li
We present TableBank, a new image-based table detection and recognition dataset built with novel weak supervision from Word and Latex documents on the internet.
no code implementations • EMNLP 2018 • Tao Ge, Qing Dou, Heng Ji, Lei Cui, Baobao Chang, Zhifang Sui, Furu Wei, Ming Zhou
This paper proposes to study fine-grained coordinated cross-lingual text stream alignment through a novel information network decipherment paradigm.
no code implementations • 30 Sep 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou
In this paper, we introduce a novel natural language generation task, termed text morphing, which aims to generate intermediate sentences that are fluent and transition smoothly between the two input sentences.
no code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Furu Wei, Xu Sun
To fully exploit the unpaired data, we completely remove the need for parallel data and propose a novel unsupervised approach to train an automatic article commenting model, relying on nothing but unpaired articles and comments.
3 code implementations • 13 Sep 2018 • Shuming Ma, Lei Cui, Damai Dai, Furu Wei, Xu Sun
We introduce the task of automatic live commenting.
no code implementations • 12 Sep 2018 • Hangbo Bao, Shaohan Huang, Furu Wei, Lei Cui, Yu Wu, Chuanqi Tan, Songhao Piao, Ming Zhou
In this paper, we study a novel task that learns to compose music from natural language.
no code implementations • ACL 2019 • Qingfu Zhu, Lei Cui, Wei-Nan Zhang, Furu Wei, Ting Liu
Dialogue systems are usually built on either generation-based or retrieval-based approaches, yet they do not benefit from the advantages of different models.
no code implementations • EMNLP 2018 • Minghao Hu, Yuxing Peng, Furu Wei, Zhen Huang, Dongsheng Li, Nan Yang, Ming Zhou
Despite that current reading comprehension systems have achieved significant advancements, their promising performances are often obtained at the cost of making an ensemble of numerous models.
no code implementations • EMNLP 2018 • Xingxing Zhang, Mirella Lapata, Furu Wei, Ming Zhou
Extractive summarization models require sentence-level labels, which are usually created heuristically (e.g., with rule-based methods) given that most summarization datasets only have document-summary pairs.
Ranked #10 on Extractive Text Summarization on CNN / Daily Mail
no code implementations • 17 Aug 2018 • Minghao Hu, Furu Wei, Yuxing Peng, Zhen Huang, Nan Yang, Dongsheng Li
Machine reading comprehension with unanswerable questions aims to abstain from answering when no answer can be inferred.
Ranked #12 on Question Answering on SQuAD2.0 dev
1 code implementation • 6 Jul 2018 • Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou
Copying mechanism shows effectiveness in sequence-to-sequence based neural network models for text generation tasks, such as abstractive sentence summarization and question generation.
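A generic copying mechanism mixes the decoder's generation distribution with a copy distribution over source tokens, weighted by a learned generate-vs-copy probability. The function below is a hedged pointer/copy sketch, not the specific sequential copying network proposed in this entry; all names are illustrative.

```python
import torch

def copy_mechanism(gen_logits, attn_weights, src_ids, p_gen):
    """Mix a generation distribution with a copy distribution over source tokens.

    gen_logits:   [batch, vocab]    decoder vocabulary scores
    attn_weights: [batch, src_len]  attention over source positions (sums to 1)
    src_ids:      [batch, src_len]  source token ids
    p_gen:        [batch, 1]        probability of generating vs copying
    """
    gen_dist = p_gen * torch.softmax(gen_logits, dim=-1)
    copy_dist = torch.zeros_like(gen_dist)
    copy_dist.scatter_add_(1, src_ids, (1.0 - p_gen) * attn_weights)
    return gen_dist + copy_dist                        # final distribution over the vocab

batch, src_len, vocab = 2, 5, 50
out = copy_mechanism(torch.randn(batch, vocab),
                     torch.softmax(torch.randn(batch, src_len), dim=-1),
                     torch.randint(0, vocab, (batch, src_len)),
                     torch.sigmoid(torch.randn(batch, 1)))
print(out.sum(dim=-1))                                 # ~1.0 per example
```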
1 code implementation • ACL 2018 • Qingyu Zhou, Nan Yang, Furu Wei, Shaohan Huang, Ming Zhou, Tiejun Zhao
In this paper, we present a novel end-to-end neural network framework for extractive document summarization by jointly learning to score and select sentences.
Ranked #8 on Extractive Text Summarization on CNN / Daily Mail
1 code implementation • 3 Jul 2018 • Tao Ge, Furu Wei, Ming Zhou
Neural sequence-to-sequence (seq2seq) approaches have proven to be successful in grammatical error correction (GEC).
Ranked #1 on Grammatical Error Correction on Unrestricted
1 code implementation • IJCAI 2018 • Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou
Modeling sentence pairs plays a vital role in judging the relationship between two sentences, such as paraphrase identification, natural language inference, and answer sentence selection.
Ranked #11 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)
no code implementations • ACL 2018 • Tao Ge, Furu Wei, Ming Zhou
Most of the neural sequence-to-sequence (seq2seq) models for grammatical error correction (GEC) have two limitations: (1) a seq2seq model may not be well generalized with only limited error-corrected data; (2) a seq2seq model may fail to completely correct a sentence with multiple errors through normal seq2seq inference.
no code implementations • ACL 2018 • Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei
Most previous seq2seq summarization systems purely depend on the source text to generate summaries, which tends to work unstably.
Ranked #19 on Text Summarization on GigaWord
no code implementations • 21 Jun 2018 • Shaohan Huang, Yu Wu, Furu Wei, Ming Zhou
An intuitive way for a human to write paraphrase sentences is to replace words or phrases in the original sentence with their corresponding synonyms and make necessary changes to ensure the new sentences are fluent and grammatically correct.
3 code implementations • 19 Jun 2018 • Yu Wu, Furu Wei, Shaohan Huang, Yunli Wang, Zhoujun Li, Ming Zhou
Open domain response generation has achieved remarkable progress in recent years, but sometimes yields short and uninformative responses.
no code implementations • ACL 2018 • Lei Cui, Furu Wei, Ming Zhou
Conventional Open Information Extraction (Open IE) systems are usually built on hand-crafted patterns from other NLP tools such as syntactic parsing, yet they face problems of error propagation.
no code implementations • 10 May 2018 • Furu Wei
Existing research on response generation for chatbots focuses on First Response Generation, which aims to teach the chatbot to say the first response (e.g., a sentence) appropriate to the conversation context (e.g., the user's query).
no code implementations • 13 Nov 2017 • Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li
While previous abstractive summarization approaches usually focus on the improvement of informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system.
Ranked #18 on Text Summarization on GigaWord
no code implementations • ACL 2017 • Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, Ming Zhou
We first match the question and passage with gated attention-based recurrent networks to obtain the question-aware passage representation.
Ranked #36 on Question Answering on SQuAD1.1 dev
no code implementations • 15 Jun 2017 • Chuanqi Tan, Furu Wei, Nan Yang, Bowen Du, Weifeng Lv, Ming Zhou
We build the answer extraction model with state-of-the-art neural networks for single passage reading comprehension, and propose an additional task of passage ranking to help answer extraction in multiple passages.
3 code implementations • 8 May 2017 • Minghao Hu, Yuxing Peng, Zhen Huang, Xipeng Qiu, Furu Wei, Ming Zhou
In this paper, we introduce the Reinforced Mnemonic Reader for machine reading comprehension tasks, which enhances previous attentive readers in two aspects.
Ranked #17 on Question Answering on TriviaQA
2 code implementations • ACL 2017 • Qingyu Zhou, Nan Yang, Furu Wei, Ming Zhou
We propose a selective encoding model to extend the sequence-to-sequence framework for abstractive sentence summarization.
Ranked #8 on Text Summarization on DUC 2004 Task 1
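The selective encoding idea filters each encoder hidden state through a sigmoid gate computed from that state and a whole-sentence representation before the decoder attends to it. The module below is a hedged sketch of that gate; parameter names and the sentence representation (a simple mean here) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SelectiveGate(nn.Module):
    """Selective gate over encoder states (sketch of the selective-encoding idea)."""
    def __init__(self, dim):
        super().__init__()
        self.w_h = nn.Linear(dim, dim, bias=False)
        self.w_s = nn.Linear(dim, dim, bias=True)

    def forward(self, enc_states, sent_repr):      # [batch, len, dim], [batch, dim]
        gate = torch.sigmoid(self.w_h(enc_states) + self.w_s(sent_repr)[:, None, :])
        return enc_states * gate                    # gated states fed to the decoder

enc = torch.randn(2, 10, 128)
sent = enc.mean(dim=1)                              # simple whole-sentence representation
print(SelectiveGate(128)(enc, sent).shape)          # torch.Size([2, 10, 128])
```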
no code implementations • EMNLP 2017 • Chuanqi Tan, Furu Wei, Pengjie Ren, Weifeng Lv, Ming Zhou
The key idea is to search sentences similar to a query from Wikipedia articles and directly use the human-annotated entities in the similar sentences as candidate entities for the query.
4 code implementations • 6 Apr 2017 • Qingyu Zhou, Nan Yang, Furu Wei, Chuanqi Tan, Hangbo Bao, Ming Zhou
Automatic question generation aims to generate questions from a text passage where the generated questions can be answered by certain sub-spans of the given passage.
Ranked #10 on Question Generation on SQuAD1.1
no code implementations • EACL 2017 • Li Dong, Shaohan Huang, Furu Wei, Mirella Lapata, Ming Zhou, Ke Xu
This paper presents an attention-enhanced attribute-to-sequence model to generate product reviews for given attribute information, such as user, product, and rating.
no code implementations • COLING 2016 • Pengjie Ren, Furu Wei, Zhumin Chen, Jun Ma, Ming Zhou
Existing sentence regression methods for extractive summarization usually model sentence importance and redundancy in two separate processes.
no code implementations • 28 Nov 2016 • Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei
Developed so far, multi-document summarization has reached its bottleneck due to the lack of sufficient training data and diverse categories of documents.
no code implementations • 25 May 2016 • Yichun Yin, Furu Wei, Li Dong, Kaimeng Xu, Ming Zhang, Ming Zhou
In this paper, we develop a novel approach to aspect term extraction based on unsupervised learning of distributed representations of words and dependency paths.
no code implementations • COLING 2016 • Ziqiang Cao, Wenjie Li, Sujian Li, Furu Wei, Yan-ran Li
Query relevance ranking and sentence saliency ranking are the two main tasks in extractive query-focused summarization.
no code implementations • 26 Nov 2015 • Ziqiang Cao, Chengyao Chen, Wenjie Li, Sujian Li, Furu Wei, Ming Zhou
Both informativeness and readability of the collected summaries are verified by manual judgment.
no code implementations • IJCNLP 2015 • Yang Liu, Furu Wei, Sujian Li, Heng Ji, Ming Zhou, Houfeng Wang
Previous research on relation classification has verified the effectiveness of using dependency shortest paths or subtrees.
Ranked #5 on Relation Classification on SemEval 2010 Task 8
no code implementations • 8 Jul 2015 • Xiaojun Wan, Ziqiang Cao, Furu Wei, Sujian Li, Ming Zhou
However, according to our quantitative analysis, none of the existing summarization models can always produce high-quality summaries for different document sets, and even a summarization model with good overall performance may produce low-quality summaries for some document sets.