Search Results for author: Wenhui Wang

Found 36 papers, 22 papers with code

Pseudo-Masked Language Models for Unified Language Model Pre-Training

1 code implementation ICML 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Decoder Language Modelling +2

You Only Cache Once: Decoder-Decoder Architectures for Language Models

no code implementations 8 May 2024 Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei

We introduce a decoder-decoder architecture, YOCO, for large language models, which only caches key-value pairs once.

Decoder Retrieval

Multi-Head Mixture-of-Experts

no code implementations 23 Apr 2024 Xun Wu, Shaohan Huang, Wenhui Wang, Furu Wei

These sub-tokens are then assigned to and processed by a diverse set of experts in parallel, and seamlessly reintegrated into the original token form.

Language Modelling
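The Multi-Head Mixture-of-Experts snippet above describes splitting a token into sub-tokens, sending each to an expert, and merging the results back into the original token form. A minimal sketch of that flow follows. It assumes top-1 routing by dot product and leaves out any learned projection layers; the names `split_into_sub_tokens`, `route`, `multi_head_moe`, and the toy gate vectors and experts are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List

Vector = List[float]


def split_into_sub_tokens(token: Vector, num_heads: int) -> List[Vector]:
    """Split one token vector into num_heads equal-sized sub-tokens."""
    if len(token) % num_heads != 0:
        raise ValueError("token length must be divisible by num_heads")
    size = len(token) // num_heads
    return [token[i * size:(i + 1) * size] for i in range(num_heads)]


def route(sub_token: Vector, gate_vectors: List[Vector]) -> int:
    """Return the index of the expert with the highest gate score.

    Assumption: the gate score is a plain dot product and routing is top-1.
    """
    scores = [sum(g * x for g, x in zip(gate, sub_token)) for gate in gate_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


def multi_head_moe(
    token: Vector,
    num_heads: int,
    gate_vectors: List[Vector],
    experts: List[Callable[[Vector], Vector]],
) -> Vector:
    """Split, route each sub-token to one expert, then concatenate back."""
    sub_tokens = split_into_sub_tokens(token, num_heads)
    processed = [experts[route(s, gate_vectors)](s) for s in sub_tokens]
    # Reintegrate: concatenating the sub-tokens restores the original length.
    return [x for s in processed for x in s]


# Toy experts (hypothetical): one doubles its input, the other adds ten.
toy_experts = [
    lambda s: [2.0 * x for x in s],
    lambda s: [x + 10.0 for x in s],
]
toy_gates = [[1.0, 0.0], [0.0, 1.0]]

result = multi_head_moe([1.0, 0.0, 0.0, 2.0], 2, toy_gates, toy_experts)
print(result)  # [2.0, 0.0, 10.0, 12.0]
```

In this toy run the first sub-token goes to the doubling expert and the second to the add-ten expert, so two experts handle a single token.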

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

4 code implementations 27 Feb 2024 Shuming Ma, Hongyu Wang, Lingxiao Ma, Lei Wang, Wenhui Wang, Shaohan Huang, Li Dong, Ruiping Wang, Jilong Xue, Furu Wei

Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs).

LongNet: Scaling Transformers to 1,000,000,000 Tokens

3 code implementations 5 Jul 2023 Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, Nanning Zheng, Furu Wei

Scaling sequence length has become a critical demand in the era of large language models.

Kosmos-2: Grounding Multimodal Large Language Models to the World

2 code implementations 26 Jun 2023 Zhiliang Peng, Wenhui Wang, Li Dong, Yaru Hao, Shaohan Huang, Shuming Ma, Furu Wei

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the visual world.

Image Captioning In-Context Learning +8

Language Models are General-Purpose Interfaces

1 code implementation 13 Jun 2022 Yaru Hao, Haoyu Song, Li Dong, Shaohan Huang, Zewen Chi, Wenhui Wang, Shuming Ma, Furu Wei

Experimental results across various language-only and vision-language benchmarks show that our model outperforms or is competitive with specialized models on finetuning, zero-shot generalization, and few-shot learning.

Causal Language Modeling Few-Shot Learning +6

VL-BEiT: Generative Vision-Language Pretraining

no code implementations 2 Jun 2022 Hangbo Bao, Wenhui Wang, Li Dong, Furu Wei

Our minimalist solution conducts masked prediction on both monomodal and multimodal data with a shared Transformer.

Image Classification Language Modelling +7

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

no code implementations 29 Jan 2022 Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao

Our framework AutoDistil addresses the above challenges with the following steps: (a) incorporates inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for typical student sizes of base, small and tiny); (b) trains one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight-sharing of students; (c) performs a lightweight search for the optimal student without re-training.

Inductive Bias Knowledge Distillation +1

Distilled Dual-Encoder Model for Vision-Language Understanding

2 code implementations 16 Dec 2021 Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei

We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering.

Question Answering Visual Entailment +2

VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts

2 code implementations 3 Nov 2021 Hangbo Bao, Wenhui Wang, Li Dong, Qiang Liu, Owais Khan Mohammed, Kriti Aggarwal, Subhojit Som, Furu Wei

We present a unified Vision-Language pretrained Model (VLMo) that jointly learns a dual encoder and a fusion encoder with a modular Transformer network.

Image Retrieval Retrieval +3

s2s-ft: Fine-Tuning Pretrained Transformer Encoders for Sequence-to-Sequence Learning

1 code implementation 26 Oct 2021 Hangbo Bao, Li Dong, Wenhui Wang, Nan Yang, Furu Wei

Pretrained bidirectional Transformers, such as BERT, have achieved significant improvements in a wide variety of language understanding tasks, while it is not straightforward to directly apply them for natural language generation.

Abstractive Text Summarization Question Generation +2

MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers

2 code implementations Findings (ACL) 2021 Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei

We generalize deep self-attention distillation in MiniLM (Wang et al., 2020) by only using self-attention relation distillation for task-agnostic compression of pretrained Transformers.

Relation XLM-R

InfoXLM: An Information-Theoretic Framework for Cross-Lingual Language Model Pre-Training

4 code implementations NAACL 2021 Zewen Chi, Li Dong, Furu Wei, Nan Yang, Saksham Singhal, Wenhui Wang, Xia Song, Xian-Ling Mao, He-Yan Huang, Ming Zhou

In this work, we present an information-theoretic framework that formulates cross-lingual language model pre-training as maximizing mutual information between multilingual-multi-granularity texts.

Contrastive Learning Cross-Lingual Transfer +2

Harvesting and Refining Question-Answer Pairs for Unsupervised QA

1 code implementation ACL 2020 Zhongli Li, Wenhui Wang, Li Dong, Furu Wei, Ke Xu

Our approach outperforms previous unsupervised approaches by a large margin and is competitive with early supervised models.

Few-Shot Learning Question Answering

Comparing SNNs and RNNs on Neuromorphic Vision Datasets: Similarities and Differences

1 code implementation 2 May 2020 Weihua He, Yujie Wu, Lei Deng, Guoqi Li, Haoyu Wang, Yang Tian, Wei Ding, Wenhui Wang, Yuan Xie

Neuromorphic data, recording frameless spike events, have attracted considerable attention for the spatiotemporal information components and the event-driven processing fashion.

Fairness Gesture Recognition

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

3 code implementations 28 Feb 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)

Abstractive Text Summarization Decoder +4

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers

1 code implementation NeurIPS 2020 Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou

The small model (student) is trained by deeply mimicking the self-attention module, which plays a vital role in Transformer networks, of the large model (teacher).

Zero-shot Text Search
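The MiniLM snippet above says the student is trained to mimic the teacher's self-attention module. One way to picture this is a loss that compares the two models' attention distributions. The sketch below assumes KL divergence as the mismatch measure and takes raw attention scores of the same shape from both models; the function names and toy scores are hypothetical, and the listing itself does not specify the exact objective.

```python
import math
from typing import List

Matrix = List[List[float]]  # attention scores indexed as [query][key]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same keys."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def attention_distillation_loss(teacher_scores: Matrix, student_scores: Matrix) -> float:
    """Mean KL(teacher || student) over query positions.

    Assumption: both models attend over the same sequence, so the two
    score matrices have identical shapes.
    """
    per_query = [
        kl_divergence(softmax(t_row), softmax(s_row))
        for t_row, s_row in zip(teacher_scores, student_scores)
    ]
    return sum(per_query) / len(per_query)


# Toy scores (hypothetical): two query positions attending over three keys.
teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]
student = [[1.5, 0.7, -0.5], [0.2, 0.8, 0.1]]
print(attention_distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's attention pattern exactly and grows as the two distributions diverge, so minimizing it pushes the student's attention toward the teacher's.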

Cross-Lingual Natural Language Generation via Pre-Training

1 code implementation 23 Sep 2019 Zewen Chi, Li Dong, Furu Wei, Wenhui Wang, Xian-Ling Mao, He-Yan Huang

In this work we focus on transferring supervision signals of natural language generation (NLG) tasks between multiple languages.

Abstractive Text Summarization Decoder +6

Learning to Ask Unanswerable Questions for Machine Reading Comprehension

no code implementations ACL 2019 Haichao Zhu, Li Dong, Furu Wei, Wenhui Wang, Bing Qin, Ting Liu

We also present a way to construct training data for our question generation models by leveraging the existing reading comprehension dataset.

Data Augmentation Machine Reading Comprehension +2

Unified Language Model Pre-training for Natural Language Understanding and Generation

9 code implementations NeurIPS 2019 Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.

Ranked #2 on Generative Question Answering on CoQA (using extra training data)

Abstractive Text Summarization Document Summarization +7

Multiway Attention Networks for Modeling Sentence Pairs

1 code implementation IJCAI 2018 Chuanqi Tan, Furu Wei, Wenhui Wang, Weifeng Lv, Ming Zhou

Modeling sentence pairs plays a vital role in judging the relationship between two sentences, in tasks such as paraphrase identification, natural language inference, and answer sentence selection.

Natural Language Inference Paraphrase Identification +1
