1 code implementation • ICML 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
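The defining idea of the PMLM is that pseudo tokens share position embeddings with the masked positions they stand in for, so autoencoding and partially autoregressive prediction can be trained in one forward pass. Below is a minimal sketch of how such an input could be laid out; the token names and layout are illustrative only, and the paper's actual preprocessing (e.g., block-wise factorization details) differs.

```python
def build_pmlm_input(tokens, masked_positions):
    """Illustrative pseudo-masked input: masked positions become [MASK],
    and pseudo tokens [P] are appended that reuse the original position ids."""
    input_tokens = [t if i not in masked_positions else "[MASK]"
                    for i, t in enumerate(tokens)]
    position_ids = list(range(len(tokens)))
    for pos in sorted(masked_positions):   # one pseudo token per masked position
        input_tokens.append("[P]")
        position_ids.append(pos)
    return input_tokens, position_ids

tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(build_pmlm_input(tokens, masked_positions={1, 4}))
# (['the', '[MASK]', 'sat', 'on', '[MASK]', 'mat', '[P]', '[P]'], [0, 1, 2, 3, 4, 5, 1, 4])
```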
1 code implementation • 27 Feb 2023 • Shaohan Huang, Li Dong, Wenhui Wang, Yaru Hao, Saksham Singhal, Shuming Ma, Tengchao Lv, Lei Cui, Owais Khan Mohammed, Barun Patra, Qiang Liu, Kriti Aggarwal, Zewen Chi, Johan Bjorck, Vishrav Chaudhary, Subhojit Som, Xia Song, Furu Wei
A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence.
Ranked #2 on Image Captioning on Flickr30k Captions test (CIDEr metric)
1 code implementation • 23 Nov 2022 • Shuming Ma, Hongyu Wang, Shaohan Huang, Wenhui Wang, Zewen Chi, Li Dong, Alon Benhaim, Barun Patra, Vishrav Chaudhary, Xia Song, Furu Wei
Large Transformers have achieved state-of-the-art performance across many tasks.
3 code implementations • 12 Oct 2022 • Hongyu Wang, Shuming Ma, Shaohan Huang, Li Dong, Wenhui Wang, Zhiliang Peng, Yu Wu, Payal Bajaj, Saksham Singhal, Alon Benhaim, Barun Patra, Zhun Liu, Vishrav Chaudhary, Xia Song, Furu Wei
A big convergence of model architectures across language, vision, speech, and multimodal modeling is emerging.
1 code implementation • 22 Aug 2022 • Wenhui Wang, Hangbo Bao, Li Dong, Johan Bjorck, Zhiliang Peng, Qiang Liu, Kriti Aggarwal, Owais Khan Mohammed, Saksham Singhal, Subhojit Som, Furu Wei
A big convergence of language, vision, and multimodal pretraining is emerging.
Ranked #1 on Visual Reasoning on NLVR2 Test
1 code implementation • 13 Jun 2022 • Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei
Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.
Ranked #2 on Image Captioning on nocaps val
no code implementations • 2 Jun 2022 • Hangbo Bao, Wenhui Wang, Li Dong, Furu Wei
Our minimalist solution conducts masked prediction on both monomodal and multimodal data with a shared Transformer.
no code implementations • CVPR 2022 • Weihua He, Kaichao You, Zhendong Qiao, Xu Jia, Ziyang Zhang, Wenhui Wang, Huchuan Lu, Yaoyuan Wang, Jianxing Liao
Since the event camera is a novel sensor, its potential has not been fully realized due to the lack of suitable processing algorithms.
no code implementations • 29 Jan 2022 • Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
Our framework, AutoDistil, addresses the above challenges with the following steps: (a) incorporate inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for the typical student sizes of base, small, and tiny); (b) train one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight sharing among students; (c) run a lightweight search for the optimal student without re-training. A toy sketch of steps (a) and (c) follows below.
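The sketch below illustrates the partition-then-search structure under assumed sub-space definitions; the layer/hidden/head values and the proxy scorer are hypothetical placeholders, not the paper's actual search space or selection metric.

```python
import itertools
import random

# Illustrative sub-space definitions (hypothetical values): each sub-space
# constrains depth, hidden size, and number of attention heads for one
# student size class.
SUB_SPACES = {
    "base":  {"layers": [10, 11, 12], "hidden": [640, 704, 768], "heads": [10, 12]},
    "small": {"layers": [5, 6, 7],    "hidden": [384, 448, 512], "heads": [8, 10]},
    "tiny":  {"layers": [3, 4],       "hidden": [192, 256],      "heads": [4, 6]},
}

def candidates(space):
    # enumerate every architecture in a sub-space
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

def proxy_score(arch):
    # Stand-in for scoring a sub-network extracted from the trained SuperLM
    # (e.g., by its distillation loss on held-out data); random for illustration.
    return random.random()

# Lightweight search: score every candidate in each sub-space and keep the
# best one -- no per-candidate re-training, since weights come from the
# shared SuperLM.
best = {name: max(candidates(space), key=proxy_score)
        for name, space in SUB_SPACES.items()}
print(best)
```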
2 code implementations • 16 Dec 2021 • Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei
We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering.
1 code implementation • 3 Nov 2021 • Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei
We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.
Ranked #3 on Visual Question Answering (VQA) on VQA v2 test-dev
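A minimal sketch of the mixture-of-modality-experts idea behind such a modular Transformer block, assuming shared self-attention and modality-specific feed-forward experts; layer sizes, expert names, and normalization placement are illustrative, not the released VLMo architecture.

```python
import torch
import torch.nn as nn

class MoMEBlock(nn.Module):
    """Sketch of a mixture-of-modality-experts Transformer block: attention is
    shared across modalities, while the feed-forward network is replaced by
    modality-specific experts selected per input."""

    def __init__(self, d, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.experts = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
            for name in ("vision", "language", "vl")
        })
        self.norm1 = nn.LayerNorm(d)
        self.norm2 = nn.LayerNorm(d)

    def forward(self, x, modality):
        # x: (batch, seq_len, d); modality picks the feed-forward expert
        h = self.norm1(x)
        x = x + self.attn(h, h, h, need_weights=False)[0]
        return x + self.experts[modality](self.norm2(x))

# toy usage: the same block routes image and text tokens through different
# experts while sharing the attention weights
block = MoMEBlock(d=32)
img_tokens, txt_tokens = torch.randn(2, 10, 32), torch.randn(2, 7, 32)
print(block(img_tokens, "vision").shape, block(txt_tokens, "language").shape)
```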
1 code implementation • 26 Oct 2021 • Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei
Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, but it is not straightforward to apply them directly to natural language generation.
no code implementations • Findings (ACL) 2021 • Yunzhi Yao, Shaohan Huang, Wenhui Wang, Li Dong, Furu Wei
In this paper, we present a general approach to developing small, fast and effective pre-trained models for specific domains.
1 code implementation • ACL 2021 • Bo Zheng, Li Dong, Shaohan Huang, Wenhui Wang, Zewen Chi, Saksham Singhal, Wanxiang Che, Ting Liu, Xia Song, Furu Wei
Fine-tuning pre-trained cross-lingual language models can transfer task-specific supervision from one language to other languages.
2 code implementations • Findings (ACL) 2021 • Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei
We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by using only self-attention relation distillation for task-agnostic compression of pretrained Transformers.
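A minimal sketch of self-attention relation distillation: for each of queries, keys, and values, the relation is the softmax-normalized scaled dot-product of a vector type with itself, and the student's relations are pushed toward the teacher's with a KL term. Head realignment and layer selection are simplified here.

```python
import torch
import torch.nn.functional as F

def relation_kl(a_t, a_s):
    """KL between teacher and student self-attention relations.

    a_t, a_s: (relation_heads, seq_len, head_dim) slices of queries, keys,
    or values from one layer of the teacher and the student."""
    rel = lambda a: F.softmax(a @ a.transpose(-1, -2) / a.size(-1) ** 0.5, dim=-1)
    return F.kl_div(rel(a_s).log(), rel(a_t), reduction="batchmean")

def multi_relation_loss(teacher_qkv, student_qkv):
    # sum the Q-Q, K-K, and V-V relation losses (simplified: the paper also
    # splits teacher/student heads into the same number of relation heads)
    return sum(relation_kl(t, s) for t, s in zip(teacher_qkv, student_qkv))

# toy usage: 4 relation heads, sequence length 6, head dim 16
teacher = [torch.randn(4, 6, 16) for _ in range(3)]  # Q, K, V
student = [torch.randn(4, 6, 16) for _ in range(3)]
print(multi_relation_loss(teacher, student))
```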
4 code implementations • NAACL 2021 • Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou
In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.
Ranked #16 on Zero-Shot Cross-Lingual Transfer on XTREME
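As one concrete instantiation of maximizing such a mutual information lower bound, a generic InfoNCE-style contrastive objective over parallel sentence encodings can be sketched as follows; the encoder setup, negatives, and temperature are illustrative, not the paper's exact pre-training objective.

```python
import torch
import torch.nn.functional as F

def infonce(anchor, positive, temperature=0.07):
    """Generic InfoNCE loss on sentence embeddings.

    anchor, positive: (batch, dim) encodings of parallel sentences
    (e.g., a sentence and its translation); other in-batch examples
    serve as negatives."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature   # pairwise similarities
    labels = torch.arange(anchor.size(0))          # matching pairs on the diagonal
    return F.cross_entropy(logits, labels)

# toy usage with random embeddings for 8 translation pairs
print(infonce(torch.randn(8, 32), torch.randn(8, 32)))
```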
1 code implementation • ACL 2020 • Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu
Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.
Ranked #188 on Question Answering on SQuAD1.1
1 code implementation • 2 May 2020 • Weihua He, Yujie Wu, Lei Deng, Guoqi Li, Haoyu Wang, Yang Tian, Wei Ding, Wenhui Wang, Yuan Xie
Neuromorphic data, which record frameless spike events, have attracted considerable attention for their spatiotemporal information content and event-driven processing style.
Ranked #4 on Gesture Recognition on DVS128 Gesture
3 code implementations • 28 Feb 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)
1 code implementation • NeurIPS 2020 • Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
The small model (student) is trained by deeply mimicking the self-attention module of the large model (teacher), a component that plays a vital role in Transformer networks.
Ranked #8 on Zero-shot Text Search on BEIR
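A minimal sketch of the deep self-attention mimicking described in this entry, assuming the student matches the teacher's last-layer attention distributions and value-value relations via KL divergence; shapes and reductions are simplified.

```python
import torch
import torch.nn.functional as F

def minilm_style_loss(q_t, k_t, v_t, q_s, k_s, v_s):
    """Sketch of MiniLM-style deep self-attention distillation.

    q_*, k_*, v_*: (heads, seq_len, head_dim) from the last Transformer layer
    of the teacher (t) and the student (s)."""
    def attn(q, k):
        return F.softmax(q @ k.transpose(-1, -2) / q.size(-1) ** 0.5, dim=-1)

    def val_rel(v):
        return F.softmax(v @ v.transpose(-1, -2) / v.size(-1) ** 0.5, dim=-1)

    kl = lambda p, q: F.kl_div(q.log(), p, reduction="batchmean")
    return kl(attn(q_t, k_t), attn(q_s, k_s)) + kl(val_rel(v_t), val_rel(v_s))

# toy usage: 4 heads, sequence length 6, head dim 16 (same head count assumed)
t = [torch.randn(4, 6, 16) for _ in range(3)]
s = [torch.randn(4, 6, 16) for _ in range(3)]
print(minilm_style_loss(*t, *s))
```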
no code implementations • WS 2019 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Lei Cui, Songhao Piao, Ming Zhou
Most machine reading comprehension (MRC) models separately handle encoding and matching with different network architectures.
1 code implementation • 23 Sep 2019 • Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, He-Yan Huang
In this work, we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages.
no code implementations • ACL 2019 • Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu
We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset.
9 code implementations • NeurIPS 2019 • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.
Ranked #2 on Generative Question Answering on CoQA (using extra training data)
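UniLM unifies understanding and generation by running one shared Transformer under different self-attention masks (bidirectional, left-to-right, and sequence-to-sequence). Below is an illustrative construction of those masks; the boolean convention and layout are a sketch, not the released code.

```python
import torch

def unilm_masks(src_len, tgt_len):
    """Illustrative self-attention masks (True = may attend) for the three
    language modeling objectives sharing one Transformer."""
    n = src_len + tgt_len
    bidirectional = torch.ones(n, n, dtype=torch.bool)
    left_to_right = torch.tril(torch.ones(n, n)).bool()
    # seq-to-seq: source tokens attend within the source bidirectionally;
    # target tokens attend to the full source and to earlier target tokens
    seq2seq = torch.zeros(n, n, dtype=torch.bool)
    seq2seq[:src_len, :src_len] = True
    seq2seq[src_len:, :src_len] = True
    seq2seq[src_len:, src_len:] = torch.tril(torch.ones(tgt_len, tgt_len)).bool()
    return bidirectional, left_to_right, seq2seq

for name, mask in zip(("bidirectional", "left-to-right", "seq2seq"),
                      unilm_masks(src_len=3, tgt_len=2)):
    print(name)
    print(mask.int())
```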
no code implementations • EMNLP 2018 • Wenhui Wang, Baobao Chang, Mairgup Mansur
Pre-trained word embeddings and language models have been shown to be useful in many tasks.
1 code implementation • IJCAI 2018 • Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou
Modeling sentence pairs plays a vital role in judging the relationship between two sentences, as in paraphrase identification, natural language inference, and answer sentence selection.
Ranked #11 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)
no code implementations • ACL 2017 • Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, Ming Zhou
We first match the question and passage with gated attention-based recurrent networks to obtain the question-aware passage representation.
Ranked #35 on Question Answering on SQuAD1.1 dev
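A simplified sketch of a gated attention-based recurrent layer in the spirit of this step: each passage word attends over the question, and a learned gate filters the concatenated input before a GRU produces the question-aware passage states. The conditioning and dimensions are reduced relative to the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttentionReader(nn.Module):
    """Simplified gated attention-based recurrent layer (single example,
    no batching): attend over the question per passage position, gate the
    concatenated input, and run a GRU over the passage."""

    def __init__(self, d):
        super().__init__()
        self.w_q = nn.Linear(d, d, bias=False)   # projects question states
        self.w_p = nn.Linear(d, d, bias=False)   # projects current passage word
        self.v = nn.Linear(d, 1, bias=False)     # attention scorer
        self.gate = nn.Linear(2 * d, 2 * d)      # input gate over [u_t, c_t]
        self.rnn = nn.GRUCell(2 * d, d)

    def forward(self, passage, question):
        # passage: (T_p, d), question: (T_q, d)
        h = passage.new_zeros(passage.size(-1))
        states = []
        for u_t in passage:                      # iterate over passage words
            scores = self.v(torch.tanh(self.w_q(question) + self.w_p(u_t))).squeeze(-1)
            c_t = (F.softmax(scores, dim=0).unsqueeze(-1) * question).sum(dim=0)
            x_t = torch.cat([u_t, c_t], dim=-1)
            x_t = torch.sigmoid(self.gate(x_t)) * x_t   # gated input
            h = self.rnn(x_t.unsqueeze(0), h.unsqueeze(0)).squeeze(0)
            states.append(h)
        return torch.stack(states)               # question-aware passage representation

# toy usage with random passage (5 words) and question (3 words) states
torch.manual_seed(0)
rep = GatedAttentionReader(d=8)(torch.randn(5, 8), torch.randn(3, 8))
print(rep.shape)  # torch.Size([5, 8])
```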