Search Results for author: Wangchunshu Zhou

Found 36 papers, 21 papers with code

Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference

no code implementations 4 Jun 2023 Wangchunshu Zhou, Ronan Le Bras, Yejin Choi

Modular Transformers train modularized layers that have the same function as two or more consecutive layers in the original model via module replacing and knowledge distillation.

Knowledge Distillation Neural Network Compression +2
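
The Modular Transformers entry above compresses two or more consecutive layers into a single modularized layer. A minimal sketch of the distillation half of that idea, assuming a toy 2-to-1 compression with an MSE loss on hidden states (the stochastic module-replacing schedule is not shown, and all sizes, data, and hyperparameters are illustrative rather than the paper's setup):

```python
# Hypothetical sketch: distill two consecutive Transformer layers into one
# "modularized" layer by matching their output hidden states (MSE loss).
import torch
import torch.nn as nn

d_model, n_heads, seq_len, batch = 256, 4, 32, 8

# Frozen "teacher": two consecutive layers from the original model.
teacher_block = nn.Sequential(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
).eval()
for p in teacher_block.parameters():
    p.requires_grad_(False)

# Trainable modularized layer that should reproduce the teacher block.
module_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
optimizer = torch.optim.Adam(module_layer.parameters(), lr=1e-4)

for step in range(100):                      # toy training loop on random inputs
    hidden = torch.randn(batch, seq_len, d_model)
    with torch.no_grad():
        target = teacher_block(hidden)       # output of the two original layers
    pred = module_layer(hidden)              # output of the single replacement
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```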

Commonsense Knowledge Transfer for Pre-trained Language Models

no code implementations 4 Jun 2023 Wangchunshu Zhou, Ronan Le Bras, Yejin Choi

In this work, we introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.

Language Modelling Transfer Learning

SmartTrim: Adaptive Tokens and Parameters Pruning for Efficient Vision-Language Models

no code implementations 24 May 2023 Zekun Wang, Jingchang Chen, Wangchunshu Zhou, Ming Liu, Bing Qin

Experimental results demonstrate that SmartTrim reduces the computation overhead of various VLMs by 2-3 times while maintaining comparable performance (only a 1-2% degradation) on various vision-language tasks.

Data Augmentation

RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text

1 code implementation 22 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan

In addition to producing AI-generated content (AIGC), we also demonstrate the possibility of using RecurrentGPT as an interactive fiction that directly interacts with consumers.

Language Modelling

To Repeat or Not To Repeat: Insights from Scaling LLM under Token-Crisis

no code implementations 22 May 2023 Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You

First, we explore the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting, leading to multi-epoch degradation.

Interactive Natural Language Processing

no code implementations 22 May 2023 Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence.

Decision Making

Efficient Prompting via Dynamic In-Context Learning

no code implementations 18 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

To achieve this, we train a meta controller that predicts the number of in-context examples suitable for the generalist model to make a good prediction based on the performance-efficiency trade-off for a specific input.
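
The entry above describes a meta controller that decides, per input, how many in-context examples a generalist model should see. A minimal sketch of that control loop, where the candidate values of k, the sentence encoder (replaced here by a random embedding), and the prompt format are all illustrative assumptions:

```python
# Hypothetical sketch of a meta controller that picks how many in-context
# examples (k) to prepend for a given query.
import torch
import torch.nn as nn

CANDIDATE_K = [0, 2, 4, 8]          # possible numbers of in-context examples

class MetaController(nn.Module):
    def __init__(self, hidden_dim=384, num_choices=len(CANDIDATE_K)):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 128), nn.ReLU(), nn.Linear(128, num_choices)
        )

    def forward(self, query_embedding):      # (batch, hidden_dim)
        return self.mlp(query_embedding)     # logits over candidate k values

def build_prompt(query, examples, k):
    """Prepend the k selected demonstrations to the query."""
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples[:k])
    return f"{demos}\nQ: {query}\nA:" if k > 0 else f"Q: {query}\nA:"

# Usage with a random embedding standing in for a real sentence encoder.
controller = MetaController()
query_emb = torch.randn(1, 384)
k = CANDIDATE_K[controller(query_emb).argmax(dim=-1).item()]
prompt = build_prompt("What is 2 + 2?", [("What is 1 + 1?", "2")] * 8, k)
print(k, prompt[:80])
```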

Controlled Text Generation with Natural Language Instructions

1 code implementation 27 Apr 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan

Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training.

Language Modelling Text Generation

X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

1 code implementation 22 Nov 2022 Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou

Moreover, we show that the modular design of X$^2$-VLM results in high transferability, allowing X$^2$-VLM to be utilized in any language or domain.

Cross-Modal Retrieval Image Captioning +7

Efficiently Tuned Parameters are Task Embeddings

1 code implementation 21 Oct 2022 Wangchunshu Zhou, Canwen Xu, Julian McAuley

Thus, we propose to exploit these efficiently tuned parameters as off-the-shelf task embeddings for the efficient selection of source datasets for intermediate-task transfer.

Question Answering Text Classification
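
The entry above treats efficiently tuned parameters as off-the-shelf task embeddings for selecting source datasets. A minimal sketch of that selection step, assuming soft-prompt parameters as the tuned weights and cosine similarity as the ranking criterion (shapes and task names are placeholders, not the paper's setup):

```python
# Hypothetical sketch: treat each task's efficiently tuned parameters
# (e.g., a soft prompt) as a task embedding and rank source tasks by
# cosine similarity to the target task.
import torch
import torch.nn.functional as F

def task_embedding(tuned_params):
    """Flatten and concatenate a task's tuned parameters into one vector."""
    return torch.cat([p.detach().flatten() for p in tuned_params])

# Pretend these are soft prompts (20 tokens x 768 dims) tuned on each task.
tasks = {name: [torch.randn(20, 768)] for name in ["mnli", "sst2", "squad"]}
target = [torch.randn(20, 768)]          # tuned parameters of the target task

target_emb = task_embedding(target)
scores = {
    name: F.cosine_similarity(task_embedding(p), target_emb, dim=0).item()
    for name, p in tasks.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print("Source tasks ranked for intermediate-task transfer:", ranked)
```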

Write and Paint: Generative Vision-Language Models are Unified Modal Learners

1 code implementation 15 Jun 2022 Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang

In this work, we disclose the potential of symmetric generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs.

Language Modelling Text Generation +1
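
The DaVinci entry above is trained with prefix language modeling. One way to picture that objective is the attention pattern it induces: prefix tokens attend bidirectionally among themselves, while suffix tokens are predicted left-to-right. A small sketch of such a mask (the shapes are arbitrary, and this is only an illustration of the masking idea, not the paper's implementation):

```python
# Hypothetical sketch of a prefix-LM attention mask: True = allowed to attend.
import torch

def prefix_lm_mask(seq_len, prefix_len):
    causal = torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))
    causal[:, :prefix_len] = True    # every position can see the full prefix
    return causal

# Positions 0-2 form the bidirectional prefix; 3-5 are generated causally.
print(prefix_lm_mask(seq_len=6, prefix_len=3).int())
```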

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

1 code implementation 1 Jun 2022 Yan Zeng, Wangchunshu Zhou, Ao Luo, Xinsong Zhang

To this end, the cross-view language modeling framework considers both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning.

Contrastive Learning Language Modelling +3
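
For the contrastive piece of the cross-view framework above, a standard in-batch InfoNCE loss between pooled representations of the two views is one way to maximize a lower bound on their mutual information. A minimal sketch under that assumption (the encoders are replaced by random tensors, and the temperature and dimensions are illustrative):

```python
# Hypothetical sketch: align two "views" of the same object (e.g., an image
# and its caption, or a sentence and its translation) with an InfoNCE loss.
import torch
import torch.nn.functional as F

def info_nce(view_a, view_b, temperature=0.07):
    """In-batch contrastive loss; matching rows of view_a/view_b are positives."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature            # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0))
    # Symmetric loss over both retrieval directions.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Stand-ins for pooled representations of the two views.
image_or_source = torch.randn(16, 512)
caption_or_translation = torch.randn(16, 512)
print(info_nce(image_or_source, caption_or_translation))
```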

VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models

1 code implementation 30 May 2022 Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang

We release the VLUE benchmark to promote research on building vision-language models that generalize well to more diverse images and concepts unseen during pre-training, and are practical in terms of efficiency-performance trade-off.

Learning to Predict Persona Information for Dialogue Personalization without Explicit Persona Description

no code implementations 30 Nov 2021 Wangchunshu Zhou, Qifei Li, Chenle Li

Personalizing dialogue agents is important for dialogue systems to generate more specific, consistent, and engaging responses.

A Survey on Green Deep Learning

no code implementations 8 Nov 2021 Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li

In recent years, larger and deeper models are springing up and continuously pushing state-of-the-art (SOTA) results across various fields like natural language processing (NLP) and computer vision (CV).

Knowledge Distillation Model Compression

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

1 code implementation EMNLP 2021 Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei

Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation.

Knowledge Distillation Quantization

BERT Learns to Teach: Knowledge Distillation with Meta Learning

1 code implementation ACL 2022 Wangchunshu Zhou, Canwen Xu, Julian McAuley

We present Knowledge Distillation with Meta Learning (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods where the teacher model is fixed during training.

Knowledge Distillation Meta-Learning

Learning from Perturbations: Diverse and Informative Dialogue Generation with Inverse Adversarial Training

no code implementations ACL 2021 Wangchunshu Zhou, Qifei Li, Chenle Li

By giving higher rewards for responses whose output probability reduces more significantly when dialogue history is perturbed, the model is encouraged to generate more diverse and consistent responses.

Dialogue Generation Response Generation
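
The reward described in the snippet above (higher when a response's probability drops more under a perturbed history) can be sketched as a difference of sequence log-probabilities. The GPT-2 checkpoint, the turn-shuffling perturbation, and the example texts below are illustrative assumptions, not the paper's setup:

```python
# Hypothetical sketch of the reward: score how much a response's likelihood
# drops when the dialogue history is perturbed.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

@torch.no_grad()
def response_logprob(history, response):
    """Sum of token log-probabilities of `response` given `history`."""
    ctx = tokenizer(history, return_tensors="pt").input_ids
    resp = tokenizer(response, return_tensors="pt").input_ids
    ids = torch.cat([ctx, resp], dim=1)
    logprobs = torch.log_softmax(model(ids).logits[:, :-1], dim=-1)
    # Only score the response tokens (they start right after the context).
    resp_positions = range(ctx.size(1) - 1, ids.size(1) - 1)
    return sum(logprobs[0, t, ids[0, t + 1]].item() for t in resp_positions)

history = "A: How was your trip to Paris? B: It was wonderful."
perturbed = "B: It was wonderful. A: How was your trip to Paris?"  # shuffled turns
response = " A: I'm glad you enjoyed the city."

# Higher reward when the response depends more on the unperturbed history.
reward = response_logprob(history, response) - response_logprob(perturbed, response)
print(reward)
```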

Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting

1 code implementation EMNLP 2021 Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei

In this paper, we generalize text infilling (e.g., masked language models) by proposing Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.

Text Infilling
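
One way to read the SSR objective above is as a data-construction recipe: mask a span, fill it with an imperfect generator, and train the seq2seq model to rewrite the machine-filled region back into the original text. The sketch below illustrates only that pair construction; the `imperfect_fill` stand-in and the span-marking format are assumptions, not the paper's specification:

```python
# Hypothetical sketch of building one Sequence Span Rewriting training pair.
import random

def imperfect_fill(left_context, right_context):
    """Stand-in for a span-infilling model (e.g., a small seq2seq LM); here it
    just returns a noisy placeholder so the example stays self-contained."""
    return "a quick fox"

def make_ssr_pair(tokens, span_len=3, seed=0):
    random.seed(seed)
    start = random.randrange(0, len(tokens) - span_len)
    original_span = tokens[start:start + span_len]
    machine_span = imperfect_fill(tokens[:start], tokens[start + span_len:]).split()
    # Source: machine-filled text with the region to rewrite marked.
    source = (tokens[:start] + ["<span>"] + machine_span + ["</span>"]
              + tokens[start + span_len:])
    # Target: the original span the model should rewrite the region into.
    return " ".join(source), " ".join(original_span)

text = "the quick brown fox jumps over the lazy dog".split()
src, tgt = make_ssr_pair(text)
print("source:", src)
print("target:", tgt)
```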

Pre-training Text-to-Text Transformers to Write and Reason with Concepts

no code implementations ICLR 2021 Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Xiang Ren

To augment PTLMs with common sense, we propose generative and contrastive objectives as intermediate self-supervised pre-training tasks between general pre-training and downstream task-specific fine-tuning.

Common Sense Reasoning Language Modelling +2

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

1 code implementation 24 Oct 2020 Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, Xiang Ren

Pre-trained language models (PTLM) have achieved impressive results in a range of natural language understanding (NLU) and generation (NLG) tasks.

Common Sense Reasoning Knowledge Graphs +3

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

1 code implementation NeurIPS 2020 Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.

Explanation Generation Natural Language Understanding

Connecting the Dots Between Fact Verification and Fake News Detection

no code implementations COLING 2020 Qifei Li, Wangchunshu Zhou

Fact verification models have enjoyed rapid advancement in the last two years with the development of pre-trained language models like BERT and the release of large-scale datasets such as FEVER.

Fact Verification Fake News Detection +1

BERT Loses Patience: Fast and Robust Inference with Early Exit

1 code implementation NeurIPS 2020 Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei

In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM).

Language Modelling
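
The patience-based exit described above can be sketched as per-layer classifiers plus a simple agreement counter: stop as soon as the predicted label has stayed the same for a fixed number of consecutive layers. The toy encoder, layer count, and exact stopping rule below are assumptions for illustration, not the paper's implementation:

```python
# Hypothetical sketch of patience-based early exit: attach a classifier to
# every layer and stop once the prediction is stable for `patience` layers.
import torch
import torch.nn as nn

class EarlyExitEncoder(nn.Module):
    def __init__(self, num_layers=12, d_model=128, num_classes=2):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, 4, batch_first=True)
            for _ in range(num_layers)
        )
        self.classifiers = nn.ModuleList(
            nn.Linear(d_model, num_classes) for _ in range(num_layers)
        )

    @torch.no_grad()
    def forward(self, hidden, patience=3):
        prev_pred, streak = None, 0
        for i, (layer, clf) in enumerate(zip(self.layers, self.classifiers)):
            hidden = layer(hidden)
            pred = clf(hidden[:, 0]).argmax(dim=-1)   # classify the first token
            same = prev_pred is not None and torch.equal(pred, prev_pred)
            streak = streak + 1 if same else 0
            if streak >= patience:                    # prediction is stable: exit
                return pred, i + 1
            prev_pred = pred
        return prev_pred, len(self.layers)            # no early exit triggered

model = EarlyExitEncoder().eval()
pred, layers_used = model(torch.randn(1, 16, 128), patience=3)
print(pred.item(), "after", layers_used, "layers")
```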

Scheduled DropHead: A Regularization Method for Transformer Models

1 code implementation Findings of the Association for Computational Linguistics 2020 Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of the Transformer, a state-of-the-art model for various NLP tasks.

Machine Translation Text Classification +2
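
The structured dropout described above zeroes out entire attention heads rather than individual units. A minimal sketch of such a head-level mask (the drop rate, rescaling choice, and tensor layout are illustrative assumptions; the paper's drop-rate schedule is not shown):

```python
# Hypothetical sketch of DropHead-style structured dropout: zero out entire
# attention heads with probability p during training and rescale the rest.
import torch

def drop_head(attn_output, p=0.2, training=True):
    """attn_output: (batch, num_heads, seq_len, head_dim)."""
    if not training or p == 0.0:
        return attn_output
    batch, num_heads = attn_output.shape[:2]
    keep = (torch.rand(batch, num_heads, 1, 1, device=attn_output.device) > p).float()
    # Rescale so the expected contribution of the summed heads is preserved.
    scale = num_heads / keep.sum(dim=1, keepdim=True).clamp(min=1.0)
    return attn_output * keep * scale

out = drop_head(torch.randn(2, 8, 16, 64), p=0.25)
print(out.shape)
```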

Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models

no code implementations 12 Feb 2020 Wangchunshu Zhou, Ke Xu

While able to be trained in a fully self-supervised fashion, our model can be further fine-tuned with a small amount of human preference annotation to better imitate human judgment.

Natural Language Understanding Response Generation +1
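
The "learning to compare" idea above evaluates generations by comparing pairs of candidate texts rather than scoring each in isolation. A minimal sketch of such a comparative evaluator, where the BERT backbone and the three-way (first better / second better / tie) label set are assumptions for illustration:

```python
# Hypothetical sketch of a comparative evaluator: encode a pair of candidate
# texts jointly and classify which one reads better.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

class ComparativeEvaluator(nn.Module):
    def __init__(self, num_labels=3):       # first better / second better / tie
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        self.head = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, text_a, text_b):
        enc = tokenizer(text_a, text_b, return_tensors="pt",
                        padding=True, truncation=True)
        cls = self.encoder(**enc).last_hidden_state[:, 0]   # [CLS] representation
        return self.head(cls)                                # comparison logits

model = ComparativeEvaluator().eval()
with torch.no_grad():
    logits = model(["i am fine thanks ."], ["fine am thanks i ."])
print(logits.softmax(dim=-1))
```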

Pseudo-Bidirectional Decoding for Local Sequence Transduction

no code implementations Findings of the Association for Computational Linguistics 2020 Wangchunshu Zhou, Tao Ge, Ke Xu

PBD copies the corresponding representations of source tokens to the decoder as pseudo future context, enabling the decoder to attend to its bidirectional context.

Grammatical Error Correction Inductive Bias +1

Self-Adversarial Learning with Comparative Discrimination for Text Generation

no code implementations ICLR 2020 Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples.

Text Generation

Improving Grammatical Error Correction with Machine Translation Pairs

1 code implementation Findings of the Association for Computational Linguistics 2020 Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou

The poor translation model resembles the ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammatical correctness, while the good translation model generally generates fluent and grammatically correct translations.

Grammatical Error Correction Language Modelling +2
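
The snippet above pairs a poor translation model (the "ESL learner") with a good one (fluent, grammatical output) to synthesize GEC training data. A minimal sketch of that data-construction loop; the `weak_translate` / `strong_translate` functions are placeholders standing in for real MT systems, not actual APIs:

```python
# Hypothetical sketch of the data-construction idea: translate the same
# source-language sentence with a weak and a strong translation system and
# pair their outputs as (erroneous, corrected) for GEC training.
def weak_translate(sentence):
    """Stand-in for a low-quality MT model (the 'ESL learner')."""
    return "He go to school by the bus yesterday ."

def strong_translate(sentence):
    """Stand-in for a high-quality MT model producing fluent output."""
    return "He went to school by bus yesterday ."

def build_gec_pairs(foreign_sentences):
    pairs = []
    for s in foreign_sentences:
        erroneous, corrected = weak_translate(s), strong_translate(s)
        if erroneous != corrected:          # keep only informative pairs
            pairs.append((erroneous, corrected))
    return pairs

print(build_gec_pairs(["<some non-English sentence>"]))
```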

BERT-based Lexical Substitution

1 code implementation ACL 2019 Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

Our approach first applies dropout to the target word's embedding for partially masking the word, allowing BERT to take balanced consideration of the target word's semantics and contexts for proposing substitute candidates, and then validates the candidates based on their substitution's influence on the global contextualized representation of the sentence.
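
The candidate-proposal step described above (partially masking the target word by applying dropout to its input embedding, then letting BERT's MLM head suggest substitutes) can be sketched as follows. The checkpoint, dropout rate, example sentence, and top-k value are illustrative assumptions, and the candidate-validation step is omitted:

```python
# Hypothetical sketch of the candidate-proposal step for lexical substitution.
import torch
import torch.nn.functional as F
from transformers import BertForMaskedLM, BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

sentence = "The committee will consider the proposal next week ."
target_word = "consider"

enc = tokenizer(sentence, return_tensors="pt")
target_pos = enc.input_ids[0].tolist().index(
    tokenizer.convert_tokens_to_ids(target_word))

with torch.no_grad():
    embeds = model.bert.embeddings.word_embeddings(enc.input_ids)
    # Partially mask the target word instead of replacing it with [MASK].
    embeds[0, target_pos] = F.dropout(embeds[0, target_pos], p=0.3, training=True)
    logits = model(inputs_embeds=embeds,
                   attention_mask=enc.attention_mask).logits

top_ids = logits[0, target_pos].topk(10).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))
```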
