no code implementations • 4 Jun 2023 • Wangchunshu Zhou, Ronan Le Bras, Yejin Choi
Modular Transformers train modularized layers that have the same function as two or more consecutive layers in the original model via module replacing and knowledge distillation.
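A minimal sketch of the module-replacing-with-distillation idea, under toy assumptions (layer sizes, the MSE objective, and the random data are illustrative, not the authors' configuration): one compact layer is trained to reproduce the output of two frozen consecutive layers of the original model.

```python
import torch
import torch.nn as nn

d_model, n_heads = 256, 4

# Two consecutive layers standing in for one block of the frozen original model.
original = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True) for _ in range(2)
)
original.eval()
for p in original.parameters():
    p.requires_grad_(False)

# One compact module meant to take over the function of both layers.
module = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
optimizer = torch.optim.Adam(module.parameters(), lr=1e-4)

for step in range(100):
    x = torch.randn(8, 32, d_model)          # stand-in hidden states entering the block
    with torch.no_grad():
        target = x
        for layer in original:               # output of the two original layers
            target = layer(target)
    loss = nn.functional.mse_loss(module(x), target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
```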
no code implementations • 4 Jun 2023 • Wangchunshu Zhou, Ronan Le Bras, Yejin Choi
In this work, we introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.
no code implementations • 24 May 2023 • Zekun Wang, Jingchang Chen, Wangchunshu Zhou, Ming Liu, Bing Qin
Experimental results demonstrate that SmartTrim significantly reduces the computation overhead of various VLMs by 2-3 times with comparable performance (only a 1-2% degradation) on a range of vision-language tasks.
1 code implementation • 22 May 2023 • Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan
In addition to producing AI-generated content (AIGC), we also demonstrate the possibility of using RecurrentGPT as an interactive fiction that directly interacts with consumers.
no code implementations • 22 May 2023 • Fuzhao Xue, Yao Fu, Wangchunshu Zhou, Zangwei Zheng, Yang You
First, we explore the consequences of repeating pre-training data, revealing that the model is susceptible to overfitting, leading to multi-epoch degradation.
no code implementations • 22 May 2023 • Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu
Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence.
no code implementations • 18 May 2023 • Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan
To achieve this, we train a meta controller that predicts the number of in-context examples suitable for the generalist model to make a good prediction based on the performance-efficiency trade-off for a specific input.
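Purely as an illustration, such a controller could be a small network mapping an input representation to a choice of how many demonstrations to use; the feature dimension, the candidate values of k, and the classification formulation below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

K = 8  # candidate numbers of in-context examples: 0 .. K-1 (assumed range)
controller = nn.Sequential(
    nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, K)
)

def choose_num_examples(input_embedding: torch.Tensor) -> int:
    """Predict how many in-context demonstrations to prepend for this input."""
    with torch.no_grad():
        logits = controller(input_embedding)
    return int(logits.argmax(-1))

k = choose_num_examples(torch.randn(768))
print(f"use {k} in-context examples for this input")
```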
1 code implementation • 27 Apr 2023 • Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan
Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training.
1 code implementation • 21 Jan 2023 • Vilém Zouhar, Shehzaad Dhuliawala, Wangchunshu Zhou, Nico Daheim, Tom Kocmi, Yuchen Eleanor Jiang, Mrinmaya Sachan
Machine translation quality estimation (QE) predicts human judgements of a translation hypothesis without seeing the reference.
1 code implementation • 22 Nov 2022 • Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou
Moreover, we show that the modular design of X$^2$-VLM results in high transferability, allowing X$^2$-VLM to be utilized in any language or domain.
Ranked #1 on Cross-Modal Retrieval on Flickr30k
1 code implementation • 21 Oct 2022 • Wangchunshu Zhou, Canwen Xu, Julian McAuley
Thus, we propose to exploit these efficiently tuned parameters as off-the-shelf task embeddings for the efficient selection of source datasets for intermediate-task transfer.
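One way to picture this (a sketch only; the task set, the soft-prompt shapes, and the use of plain cosine similarity are illustrative assumptions) is to flatten the parameter-efficient tuning weights learned on each task and rank candidate source tasks by their similarity to the target task.

```python
import torch
import torch.nn.functional as F

def task_embedding(prompt_params: torch.Tensor) -> torch.Tensor:
    return prompt_params.flatten()

# Hypothetical soft prompts tuned independently on each task.
prompts = {name: torch.randn(20, 768) for name in ["mnli", "squad", "qqp", "target"]}
target = task_embedding(prompts["target"])

scores = {
    name: F.cosine_similarity(task_embedding(p), target, dim=0).item()
    for name, p in prompts.items() if name != "target"
}
for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {s:.3f}")  # higher score -> better intermediate-task candidate
```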
no code implementations • 14 Oct 2022 • Tiannan Wang, Wangchunshu Zhou, Yan Zeng, Xinsong Zhang
Pre-trained vision-language models (VLMs) have achieved impressive results in a range of vision-language tasks.
1 code implementation • 15 Jun 2022 • Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang
In this work, we disclose the potential of symmetric generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs.
1 code implementation • 1 Jun 2022 • Yan Zeng, Wangchunshu Zhou, Ao Luo, Xinsong Zhang
To this end, the cross-view language modeling framework considers both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning.
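The contrastive part of such an objective can be sketched with a generic InfoNCE-style loss over paired views; the encoders, temperature, and batch construction below are placeholders rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.07):
    """view_a, view_b: (batch, dim) embeddings of paired views (e.g., image/caption)."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature      # similarity of every a to every b
    targets = torch.arange(a.size(0))     # matching pairs lie on the diagonal
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(16, 256), torch.randn(16, 256))
```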
1 code implementation • 30 May 2022 • Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang
We release the VLUE benchmark to promote research on building vision-language models that generalize well to more diverse images and concepts unseen during pre-training, and are practical in terms of efficiency-performance trade-off.
1 code implementation • ACL 2022 • Zhiyi Fu, Wangchunshu Zhou, Jingjing Xu, Hao Zhou, Lei Li
How do masked language models (MLMs) such as BERT learn contextual representations?
no code implementations • 30 Nov 2021 • Wangchunshu Zhou, Qifei Li, Chenle Li
Personalizing dialogue agents is important for dialogue systems to generate more specific, consistent, and engaging responses.
no code implementations • 8 Nov 2021 • Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
In recent years, larger and deeper models are springing up and continuously pushing state-of-the-art (SOTA) results across various fields like natural language processing (NLP) and computer vision (CV).
1 code implementation • EMNLP 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei
Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation.
1 code implementation • ACL 2022 • Wangchunshu Zhou, Canwen Xu, Julian McAuley
We present Knowledge Distillation with Meta Learning (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods where the teacher model is fixed during training.
no code implementations • ACL 2021 • Wangchunshu Zhou, Qifei Li, Chenle Li
By giving higher rewards to responses whose output probability decreases more significantly when the dialogue history is perturbed, the model is encouraged to generate more diverse and consistent responses.
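A sketch of the reward idea only (the `log_prob_fn` interface and the perturbation are assumed for illustration): a response is rewarded more when perturbing the dialogue history makes it much less likely, i.e., when the response actually depends on the history.

```python
import torch

def consistency_reward(log_prob_fn, history, perturbed_history, response) -> float:
    """log_prob_fn(history, response) -> log p(response | history) under the dialogue model."""
    with torch.no_grad():
        original = log_prob_fn(history, response)
        perturbed = log_prob_fn(perturbed_history, response)
    return float(original - perturbed)  # larger drop under perturbation -> larger reward
```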
1 code implementation • NAACL 2021 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei
Cant is important for understanding advertising, comedies and dog-whistle politics.
1 code implementation • EMNLP 2021 • Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei
In this paper, we generalize text infilling (e.g., masked language models) by proposing Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.
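A rough data-construction sketch of the span-rewriting idea: a weaker model fills a masked span (usually imperfectly), and the seq2seq model is trained to rewrite the result back to the original text. Here `imperfect_infill` is a stand-in for any span-infilling model, and the single-span selection is a simplification.

```python
import random

def make_rewriting_example(text: str, imperfect_infill, span_len: int = 3):
    tokens = text.split()
    start = random.randrange(max(1, len(tokens) - span_len))
    masked = tokens[:start] + ["<mask>"] + tokens[start + span_len:]
    noisy = imperfect_infill(" ".join(masked))      # weaker model fills the span
    return {"source": noisy, "target": text}        # learn to rewrite noisy -> original

example = make_rewriting_example(
    "the quick brown fox jumps over the lazy dog",
    imperfect_infill=lambda masked: masked.replace("<mask>", "red cat quickly"),
)
print(example)
```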
no code implementations • ICLR 2021 • Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Xiang Ren
To augment PTLMs with common sense, we propose generative and contrastive objectives as intermediate self-supervised pre-training tasks between general pre-training and downstream task-specific fine-tuning.
1 code implementation • 24 Oct 2020 • Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, Xiang Ren
Pre-trained language models (PTLM) have achieved impressive results in a range of natural language understanding (NLU) and generation (NLG) tasks.
1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang
In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.
no code implementations • COLING 2020 • Qifei Li, Wangchunshu Zhou
Fact verification models have advanced rapidly in the last two years with the development of pre-trained language models like BERT and the release of large-scale datasets such as FEVER.
1 code implementation • NeurIPS 2020 • Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM).
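A minimal sketch of patience-based early exiting, assuming a toy model in which every layer has an attached classifier (the pooling, patience value, and layer setup are illustrative): run layers one at a time and stop once a fixed number of consecutive internal classifiers agree.

```python
import torch
import torch.nn as nn

def patience_based_exit(layers, classifiers, x, patience: int = 3):
    prev_pred, streak = None, 0
    hidden = x
    for layer, clf in zip(layers, classifiers):
        hidden = layer(hidden)
        pred = int(clf(hidden.mean(dim=1)).argmax(-1))   # pooled prediction at this layer
        streak = streak + 1 if pred == prev_pred else 1
        prev_pred = pred
        if streak >= patience:                           # consecutive layers agree: exit early
            return pred
    return prev_pred                                     # fell through: use the last prediction

d = 64
layers = [nn.TransformerEncoderLayer(d, 4, batch_first=True) for _ in range(6)]
classifiers = [nn.Linear(d, 2) for _ in range(6)]
print(patience_based_exit(layers, classifiers, torch.randn(1, 16, d)))
```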
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of transformer, a state-of-the-art model for various NLP tasks.
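The head-level dropout idea can be sketched as zeroing out whole attention heads at training time and rescaling the rest; the tensor layout and inverted-dropout rescaling below are illustrative assumptions rather than the paper's exact implementation.

```python
import torch

def drop_head(attn_output: torch.Tensor, p: float, training: bool) -> torch.Tensor:
    """attn_output: (batch, num_heads, seq_len, head_dim) per-head attention outputs."""
    if not training or p == 0.0:
        return attn_output
    batch, num_heads = attn_output.shape[:2]
    keep = (torch.rand(batch, num_heads, 1, 1, device=attn_output.device) > p).float()
    return attn_output * keep / (1.0 - p)   # drop entire heads, rescale the survivors
```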
no code implementations • 12 Feb 2020 • Wangchunshu Zhou, Ke Xu
While it can be trained in a fully self-supervised fashion, our model can be further fine-tuned with a small amount of human preference annotation to better imitate human judgment.
1 code implementation • EMNLP 2020 • Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
Our approach first divides the original BERT into several modules and builds their compact substitutes.
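The module-replacing idea can be sketched as follows (a sketch under stated assumptions: the wrapper and example module sizes are made up, and the curriculum that raises the replacement probability over training is omitted): during training, each original module is swapped for its compact substitute with some probability, so the substitutes learn to work inside the full model.

```python
import random
import torch.nn as nn

class ReplaceableBlock(nn.Module):
    def __init__(self, original: nn.Module, substitute: nn.Module, replace_prob: float = 0.5):
        super().__init__()
        self.original, self.substitute, self.replace_prob = original, substitute, replace_prob

    def forward(self, x):
        if self.training and random.random() < self.replace_prob:
            return self.substitute(x)   # compact module takes this block's place
        return self.original(x)         # otherwise keep the original module

block = ReplaceableBlock(
    original=nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768)),
    substitute=nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 768)),
)
```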
no code implementations • Findings of the Association for Computational Linguistics 2020 • Wangchunshu Zhou, Tao Ge, Ke Xu
PBD copies the corresponding representations of source tokens to the decoder as pseudo future context, enabling the decoder to attend to its bidirectional context.
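A rough conceptual sketch of the pseudo-future idea, not the paper's method: for decoding position t, keep real decoder states for earlier positions and copy encoder representations of the corresponding source tokens for the positions not yet generated; a 1:1 source-target alignment is assumed purely for illustration.

```python
import torch

def pseudo_bidirectional_context(decoder_states: torch.Tensor,
                                 encoder_states: torch.Tensor,
                                 t: int) -> torch.Tensor:
    """decoder_states, encoder_states: (seq_len, dim); t: current decoding position."""
    past = decoder_states[:t]           # what has actually been decoded so far
    pseudo_future = encoder_states[t:]  # copied source representations stand in for the future
    return torch.cat([past, pseudo_future], dim=0)
```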
no code implementations • ICLR 2020 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Bill Yuchen Lin, Wangchunshu Zhou, Ming Shen, Pei Zhou, Chandra Bhagavatula, Yejin Choi, Xiang Ren
In this paper, we present a constrained text generation task, CommonGen, associated with a benchmark dataset, to explicitly test machines for the ability of generative commonsense reasoning.
Ranked #1 on Text Generation on CommonGen
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou
The poor translation model resembles the ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammatical correctness, while the good translation model generally generates fluent and grammatically correct translations.
1 code implementation • ACL 2019 • Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou
Our approach first applies dropout to the target word's embedding for partially masking the word, allowing BERT to take balanced consideration of the target word's semantics and contexts for proposing substitute candidates, and then validates the candidates based on their substitution's influence on the global contextualized representation of the sentence.
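A rough sketch of the partial-masking step with Hugging Face Transformers, assuming bert-base-uncased and a dropout rate of 0.3 chosen only for illustration; the candidate-validation step described above is omitted.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

sentence, target = "the movie was surprisingly good", "good"
inputs = tokenizer(sentence, return_tensors="pt")
target_idx = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(target))

with torch.no_grad():
    embeds = model.get_input_embeddings()(inputs.input_ids)
    # Partially mask the target: apply dropout to its embedding rather than replacing it
    # with [MASK], so predictions balance the word's own semantics against its context.
    embeds[0, target_idx] = F.dropout(embeds[0, target_idx], p=0.3, training=True)
    logits = model(inputs_embeds=embeds, attention_mask=inputs.attention_mask).logits

candidates = logits[0, target_idx].topk(10).indices.tolist()
print(tokenizer.convert_ids_to_tokens(candidates))  # raw candidates; validation step omitted
```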