Search Results for author: Wangchunshu Zhou

Found 51 papers, 33 papers with code

MIMIR: A Streamlined Platform for Personalized Agent Tuning in Domain Expertise

no code implementations 3 Apr 2024 Chunyuan Deng, Xiangru Tang, Yilun Zhao, Hanming Wang, Haoran Wang, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

Recently, large language models (LLMs) have evolved into interactive agents, proficient in planning, tool use, and task execution across a wide variety of tasks.

LoraRetriever: Input-Aware LoRA Retrieval and Composition for Mixed Tasks in the Wild

no code implementations 15 Feb 2024 Ziyu Zhao, Leilei Gan, Guoyin Wang, Wangchunshu Zhou, Hongxia Yang, Kun Kuang, Fei Wu

Low-Rank Adaptation (LoRA) provides an effective yet efficient solution for fine-tuning large language models (LLM).

Retrieval

Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science

no code implementations 6 Feb 2024 Xiangru Tang, Qiao Jin, Kunlun Zhu, Tongxin Yuan, Yichi Zhang, Wangchunshu Zhou, Meng Qu, Yilun Zhao, Jian Tang, Zhuosheng Zhang, Arman Cohan, Zhiyong Lu, Mark Gerstein

Intelligent agents powered by large language models (LLMs) have demonstrated substantial promise in autonomously conducting experiments and facilitating scientific discoveries across various disciplines.

OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models

1 code implementation 29 Jan 2024 Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, Yang You

To help the open-source community have a better understanding of Mixture-of-Experts (MoE) based large language models (LLMs), we train and release OpenMoE, a series of fully open-sourced and reproducible decoder-only MoE LLMs, ranging from 650M to 34B parameters and trained on up to over 1T tokens.

AUTOACT: Automatic Agent Learning from Scratch via Self-Planning

1 code implementation 10 Jan 2024 Shuofei Qiao, Ningyu Zhang, Runnan Fang, Yujie Luo, Wangchunshu Zhou, Yuchen Eleanor Jiang, Chengfei Lv, Huajun Chen

Further analysis demonstrates the effectiveness of the division-of-labor strategy, with the trajectory quality generated by AutoAct significantly outperforming that of others.

Question Answering

How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

1 code implementation 27 Nov 2023 Haoqin Tu, Chenhang Cui, Zijun Wang, Yiyang Zhou, Bingchen Zhao, Junlin Han, Wangchunshu Zhou, Huaxiu Yao, Cihang Xie

Different from prior studies, we shift our focus from evaluating standard performance to introducing a comprehensive safety evaluation suite, covering both out-of-distribution (OOD) generalization and adversarial robustness.

Adversarial Robustness · Visual Question Answering (VQA) · +1

ML-Bench: Evaluating Large Language Models for Code Generation in Repository-Level Machine Learning Tasks

1 code implementation 16 Nov 2023 Yuliang Liu, Xiangru Tang, Zefan Cai, Junjie Lu, Yichi Zhang, Yanjun Shao, Zexuan Deng, Helan Hu, Kaikai An, Ruijun Huang, Shuzheng Si, Sheng Chen, Haozhe Zhao, Liang Chen, Yan Wang, Tianyu Liu, Zhiwei Jiang, Baobao Chang, Yujia Qin, Wangchunshu Zhou, Yilun Zhao, Arman Cohan, Mark Gerstein

While Large Language Models (LLMs) have demonstrated proficiency in code generation benchmarks, translating these results into practical development scenarios - where leveraging existing repository-level libraries is the norm - remains challenging.

Code Generation · Navigate

Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models

1 code implementation 23 Oct 2023 Yifan Hou, Jiaoda Li, Yu Fei, Alessandro Stolfo, Wangchunshu Zhou, Guangtao Zeng, Antoine Bosselut, Mrinmaya Sachan

We show that MechanisticProbe is able to detect the information of the reasoning tree from the model's attentions for most examples, suggesting that the LM indeed is going through a process of multi-step reasoning within its architecture in many cases.

Evaluating Large Language Models on Controlled Generation Tasks

1 code implementation 23 Oct 2023 Jiao Sun, Yufei Tian, Wangchunshu Zhou, Nan Xu, Qian Hu, Rahul Gupta, John Frederick Wieting, Nanyun Peng, Xuezhe Ma

While recent studies have looked into the abilities of large language models in various benchmark tasks, including question generation, reading comprehension, and multilingual tasks, there have been few studies looking into the controllability of large language models on generation tasks.

Question Generation · Question-Generation · +2

RoleLLM: Benchmarking, Eliciting, and Enhancing Role-Playing Abilities of Large Language Models

1 code implementation 1 Oct 2023 Zekun Moore Wang, Zhongyuan Peng, Haoran Que, Jiaheng Liu, Wangchunshu Zhou, Yuhan Wu, Hongcheng Guo, Ruitong Gan, Zehao Ni, Man Zhang, Zhaoxiang Zhang, Wanli Ouyang, Ke Xu, Wenhu Chen, Jie Fu, Junran Peng

The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters.

Benchmarking

Mixup Your Own Pairs

1 code implementation 28 Sep 2023 Yilei Wu, Zijian Dong, Chongyao Chen, Wangchunshu Zhou, Juan Helen Zhou

In representation learning, regression has traditionally received less attention than classification.

Contrastive Learning · regression · +3

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

1 code implementation 16 Sep 2023 Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging.

Hallucination

Agents: An Open-source Framework for Autonomous Language Agents

1 code implementation 14 Sep 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Long Li, Jialong Wu, Tiannan Wang, Shi Qiu, Jintian Zhang, Jing Chen, Ruipu Wu, Shuai Wang, Shiding Zhu, Jiyu Chen, Wentao Zhang, Xiangru Tang, Ningyu Zhang, Huajun Chen, Peng Cui, Mrinmaya Sachan

Recent advances on large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces.

Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference

no code implementations 4 Jun 2023 Wangchunshu Zhou, Ronan Le Bras, Yejin Choi

Modular Transformers train modularized layers that have the same function as two or more consecutive layers in the original model via module replacing and knowledge distillation.

Knowledge Distillation · Neural Network Compression · +2
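
The excerpt above only names the mechanism (module replacing plus knowledge distillation). Below is a minimal sketch of that idea under my own assumptions: a single compact layer is trained to reproduce the output of two consecutive frozen layers with an MSE objective. Layer types, shapes, and the training loop are illustrative stand-ins, not the paper's implementation.

```python
# Minimal sketch: train one compact module to imitate two consecutive
# transformer layers of a frozen teacher (module replacing + distillation).
# Shapes, losses, and the single-layer "module" are illustrative assumptions.
import torch
import torch.nn as nn

d_model = 64

# Frozen "teacher": two consecutive layers from the original model.
teacher_block = nn.Sequential(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True),
).eval()
for p in teacher_block.parameters():
    p.requires_grad_(False)

# Compact replacement module meant to play the same functional role.
student_module = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)

optimizer = torch.optim.Adam(student_module.parameters(), lr=1e-4)
mse = nn.MSELoss()

for step in range(100):
    hidden = torch.randn(8, 16, d_model)          # stand-in for real hidden states
    with torch.no_grad():
        target = teacher_block(hidden)            # what the two original layers produce
    loss = mse(student_module(hidden), target)    # match their function with one layer
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```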

Commonsense Knowledge Transfer for Pre-trained Language Models

no code implementations 4 Jun 2023 Wangchunshu Zhou, Ronan Le Bras, Yejin Choi

In this work, we introduce commonsense knowledge transfer, a framework to transfer the commonsense knowledge stored in a neural commonsense knowledge model to a general-purpose pre-trained language model.

Language Modelling · Transfer Learning

SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models

no code implementations 24 May 2023 Zekun Wang, Jingchang Chen, Wangchunshu Zhou, Haichao Zhu, Jiafeng Liang, Liping Shan, Ming Liu, Dongliang Xu, Qing Yang, Bing Qin

Despite achieving remarkable performance on various vision-language tasks, Transformer-based Vision-Language Models (VLMs) suffer from redundancy in inputs and parameters, significantly hampering their efficiency in real-world applications.

Data Augmentation

RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text

2 code implementations 22 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, Mrinmaya Sachan

In addition to producing AI-generated content (AIGC), we also demonstrate the possibility of using RecurrentGPT as an interactive fiction that directly interacts with consumers.

Language Modelling · Large Language Model

Interactive Natural Language Processing

no code implementations 22 May 2023 Zekun Wang, Ge Zhang, Kexin Yang, Ning Shi, Wangchunshu Zhou, Shaochun Hao, Guangzheng Xiong, Yizhi Li, Mong Yuan Sim, Xiuying Chen, Qingqing Zhu, Zhenzhu Yang, Adam Nik, Qi Liu, Chenghua Lin, Shi Wang, Ruibo Liu, Wenhu Chen, Ke Xu, Dayiheng Liu, Yike Guo, Jie Fu

Interactive Natural Language Processing (iNLP) has emerged as a novel paradigm within the field of NLP, aimed at addressing limitations in existing frameworks while aligning with the ultimate goals of artificial intelligence.

Decision Making

Efficient Prompting via Dynamic In-Context Learning

no code implementations 18 May 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ryan Cotterell, Mrinmaya Sachan

To achieve this, we train a meta controller that predicts the number of in-context examples suitable for the generalist model to make a good prediction based on the performance-efficiency trade-off for a specific input.

In-Context Learning
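
As a rough illustration of the meta-controller idea in the excerpt above, the sketch below maps an input representation to a number of in-context examples and assembles a prompt with exactly that many demonstrations. The encoder output, label space (0 to MAX_K), and prompt format are hypothetical stand-ins, not the paper's setup.

```python
# Hypothetical sketch of a meta controller that picks how many in-context
# examples (k) to prepend for a given input, trading accuracy for prompt length.
import torch
import torch.nn as nn

MAX_K = 8  # assumed budget of demonstrations

class MetaController(nn.Module):
    """Maps an input embedding to a distribution over k in {0, ..., MAX_K}."""
    def __init__(self, dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, MAX_K + 1))

    def forward(self, input_embedding: torch.Tensor) -> torch.Tensor:
        return self.net(input_embedding)

def build_prompt(query: str, demos: list[str], k: int) -> str:
    # Prepend only the k demonstrations the controller asked for.
    return "\n\n".join(demos[:k] + [query])

controller = MetaController()
query_embedding = torch.randn(1, 768)              # stand-in for a sentence encoder output
k = controller(query_embedding).argmax(dim=-1).item()
prompt = build_prompt("Translate: 'Bonjour' ->", [f"demo {i}" for i in range(MAX_K)], k)
print(k, prompt.count("demo"))
```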

Controlled Text Generation with Natural Language Instructions

1 code implementation 27 Apr 2023 Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Wilcox, Ryan Cotterell, Mrinmaya Sachan

Large language models generate fluent texts and can follow natural language instructions to solve a wide range of tasks without task-specific training.

In-Context Learning · Language Modelling · +1

X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

2 code implementations 22 Nov 2022 Yan Zeng, Xinsong Zhang, Hang Li, Jiawei Wang, Jipeng Zhang, Wangchunshu Zhou

Vision language pre-training aims to learn alignments between vision and language from a large amount of data.

 Ranked #1 on Cross-Modal Retrieval on Flickr30k (using extra training data)

Cross-Modal Retrieval · Image Captioning · +7

Efficiently Tuned Parameters are Task Embeddings

1 code implementation 21 Oct 2022 Wangchunshu Zhou, Canwen Xu, Julian McAuley

Thus, we propose to exploit these efficiently tuned parameters as off-the-shelf task embeddings for the efficient selection of source datasets for intermediate-task transfer.

Question Answering · Text Classification
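
A minimal sketch of the idea in the excerpt, assuming each task's efficiently tuned parameters (e.g., a LoRA or adapter delta) can be flattened into a vector and compared by cosine similarity to rank candidate source tasks; the toy parameter dictionaries and the similarity choice are assumptions.

```python
# Sketch: treat each task's efficiently tuned parameters as an off-the-shelf
# task embedding and rank source tasks by cosine similarity to the target task.
import torch
import torch.nn.functional as F

def task_embedding(tuned_params: dict[str, torch.Tensor]) -> torch.Tensor:
    """Flatten all tuned parameters (e.g., LoRA/adapter weights) into one vector."""
    return torch.cat([p.flatten() for _, p in sorted(tuned_params.items())])

# Toy stand-ins for parameters tuned on different tasks (same shapes per task).
shapes = {"layer1.lora_A": (8, 64), "layer1.lora_B": (64, 8)}
tasks = {name: {k: torch.randn(*s) for k, s in shapes.items()}
         for name in ["mnli", "squad", "sst2", "target_task"]}

target = task_embedding(tasks["target_task"])
scores = {
    name: F.cosine_similarity(task_embedding(params), target, dim=0).item()
    for name, params in tasks.items() if name != "target_task"
}
print(sorted(scores.items(), key=lambda kv: -kv[1]))  # best source tasks first
```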

Write and Paint: Generative Vision-Language Models are Unified Modal Learners

1 code implementation 15 Jun 2022 Shizhe Diao, Wangchunshu Zhou, Xinsong Zhang, Jiawei Wang

In this work, we disclose the potential of symmetric generative vision-language pre-training in learning to write and paint concurrently, and propose a new unified modal model, named DaVinci, trained with prefix language modeling and prefix image modeling, a simple generative self-supervised objective on image-text pairs.

Language Modelling · Text Generation · +1

Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

1 code implementation 1 Jun 2022 Yan Zeng, Wangchunshu Zhou, Ao Luo, Ziming Cheng, Xinsong Zhang

To this end, the cross-view language modeling framework considers both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning.

Contrastive Learning · Language Modelling · +9
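
The excerpt mentions maximizing mutual information between the two views with contrastive learning. Below is a generic, symmetric InfoNCE-style contrastive loss over paired view embeddings, written from the standard formulation rather than from the paper's code.

```python
# Generic InfoNCE contrastive loss between two "views" of the same object
# (e.g., image/caption or parallel sentences), matched within a batch.
import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature          # pairwise similarities
    targets = torch.arange(a.size(0))         # the i-th pair is the positive
    # Symmetric loss: align a->b and b->a.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

batch = 16
loss = info_nce(torch.randn(batch, 256), torch.randn(batch, 256))
print(loss.item())
```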

VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models

1 code implementation 30 May 2022 Wangchunshu Zhou, Yan Zeng, Shizhe Diao, Xinsong Zhang

We release the VLUE benchmark to promote research on building vision-language models that generalize well to more diverse images and concepts unseen during pre-training, and are practical in terms of efficiency-performance trade-off.

Vietnamese Language Models · Vietnamese Natural Language Understanding · +1

Learning to Predict Persona Information for Dialogue Personalization without Explicit Persona Description

no code implementations 30 Nov 2021 Wangchunshu Zhou, Qifei Li, Chenle Li

Personalizing dialogue agents is important for dialogue systems to generate more specific, consistent, and engaging responses.

A Survey on Green Deep Learning

no code implementations 8 Nov 2021 Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei LI

In recent years, larger and deeper models are springing up and continuously pushing state-of-the-art (SOTA) results across various fields like natural language processing (NLP) and computer vision (CV).

Knowledge Distillation · Model Compression

Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression

1 code implementation EMNLP 2021 Canwen Xu, Wangchunshu Zhou, Tao Ge, Ke Xu, Julian McAuley, Furu Wei

Recent studies on compression of pretrained language models (e.g., BERT) usually use preserved accuracy as the metric for evaluation.

Knowledge Distillation · Quantization

BERT Learns to Teach: Knowledge Distillation with Meta Learning

1 code implementation ACL 2022 Wangchunshu Zhou, Canwen Xu, Julian McAuley

We present Knowledge Distillation with Meta Learning (MetaDistil), a simple yet effective alternative to traditional knowledge distillation (KD) methods where the teacher model is fixed during training.

Knowledge Distillation · Meta-Learning

Learning from Perturbations: Diverse and Informative Dialogue Generation with Inverse Adversarial Training

no code implementations ACL 2021 Wangchunshu Zhou, Qifei Li, Chenle Li

By giving higher rewards for responses whose output probability reduces more significantly when dialogue history is perturbed, the model is encouraged to generate more diverse and consistent responses.

Dialogue Generation · Response Generation
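
A toy rendering of the reward described above: a response is scored by how much its likelihood drops when the dialogue history is perturbed. The scoring model and the perturbation are placeholders; the actual method uses this reward inside an RL-style training objective.

```python
# Toy sketch of the inverse-adversarial reward: responses that depend on the
# dialogue history (probability drops a lot under a perturbed history) get
# higher reward than generic, history-agnostic responses.
import torch

def sequence_log_prob(model, history_ids: torch.Tensor, response_ids: torch.Tensor) -> torch.Tensor:
    """Placeholder: log p(response | history) under an autoregressive model."""
    logits = model(torch.cat([history_ids, response_ids], dim=-1))   # (1, T, vocab)
    resp_logits = logits[:, history_ids.size(-1) - 1 : -1, :]        # positions predicting the response
    log_probs = torch.log_softmax(resp_logits, dim=-1)
    return log_probs.gather(-1, response_ids.unsqueeze(-1)).sum()

def inverse_adversarial_reward(model, history, perturbed_history, response) -> torch.Tensor:
    lp_original = sequence_log_prob(model, history, response)
    lp_perturbed = sequence_log_prob(model, perturbed_history, response)
    return lp_original - lp_perturbed   # large drop under perturbation => high reward

# Toy "model": maps token ids to random logits, just to make the sketch runnable.
vocab = 100
toy_model = lambda ids: torch.randn(ids.size(0), ids.size(1), vocab)
history = torch.randint(0, vocab, (1, 12))
perturbed = torch.randint(0, vocab, (1, 12))
response = torch.randint(0, vocab, (1, 6))
print(inverse_adversarial_reward(toy_model, history, perturbed, response).item())
```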

Improving Sequence-to-Sequence Pre-training via Sequence Span Rewriting

1 code implementation EMNLP 2021 Wangchunshu Zhou, Tao Ge, Canwen Xu, Ke Xu, Furu Wei

In this paper, we generalize text infilling (e.g., masked language models) by proposing Sequence Span Rewriting (SSR) as a self-supervised sequence-to-sequence (seq2seq) pre-training objective.

Sentence · Text Infilling
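
As a hedged illustration of a span-rewriting objective (the pair-construction details below are my assumptions, not taken from the paper): corrupt a few spans with an imperfect infilling model and train a seq2seq model to rewrite the corrupted text back into the original.

```python
# Hedged sketch of constructing (imperfect text -> original text) pairs for a
# span-rewriting seq2seq objective. `imperfect_infill` is a hypothetical
# stand-in for a weaker text-infilling model.
import random

def mask_spans(tokens: list[str], n_spans: int = 2, span_len: int = 3) -> list[tuple[int, int]]:
    """Pick a few non-overlapping spans to corrupt (bounded number of attempts)."""
    spans, used = [], set()
    for _ in range(100):
        if len(spans) == n_spans:
            break
        start = random.randrange(0, max(1, len(tokens) - span_len))
        if not any(i in used for i in range(start, start + span_len)):
            spans.append((start, start + span_len))
            used.update(range(start, start + span_len))
    return spans

def imperfect_infill(context: list[str], length: int) -> list[str]:
    # Hypothetical weak infilling model; here it just guesses a filler token.
    return ["something"] * length

def make_rewriting_pair(tokens: list[str]) -> tuple[list[str], list[str]]:
    corrupted = list(tokens)
    for start, end in mask_spans(tokens):
        corrupted[start:end] = imperfect_infill(tokens[:start] + tokens[end:], end - start)
    return corrupted, tokens   # model input: imperfect text; target: original text

src, tgt = make_rewriting_pair("the quick brown fox jumps over the lazy dog today".split())
print(src, "->", tgt)
```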

Pre-training Text-to-Text Transformers to Write and Reason with Concepts

no code implementations ICLR 2021 Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Xiang Ren

To augment PTLMs with common sense, we propose generative and contrastive objectives as intermediate self-supervised pre-training tasks between general pre-training and downstream task-specific fine-tuning.

Common Sense Reasoning · Language Modelling · +2

Pre-training Text-to-Text Transformers for Concept-centric Common Sense

1 code implementation 24 Oct 2020 Wangchunshu Zhou, Dong-Ho Lee, Ravi Kiran Selvam, Seyeon Lee, Bill Yuchen Lin, Xiang Ren

Pre-trained language models (PTLM) have achieved impressive results in a range of natural language understanding (NLU) and generation (NLG) tasks.

Common Sense Reasoning · Knowledge Graphs · +3

Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

1 code implementation NeurIPS 2020 Wangchunshu Zhou, Jinyi Hu, HANLIN ZHANG, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

In this paper, we develop a general framework for interpretable natural language understanding that requires only a small set of human annotated explanations for training.

Explanation Generation · Natural Language Understanding

Connecting the Dots Between Fact Verification and Fake News Detection

no code implementations COLING 2020 Qifei Li, Wangchunshu Zhou

Fact verification models have enjoyed a fast advancement in the last two years with the development of pre-trained language models like BERT and the release of large scale datasets such as FEVER.

Fact Verification · Fake News Detection · +1

BERT Loses Patience: Fast and Robust Inference with Early Exit

1 code implementation NeurIPS 2020 Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei

In this paper, we propose Patience-based Early Exit, a straightforward yet effective inference method that can be used as a plug-and-play technique to simultaneously improve the efficiency and robustness of a pretrained language model (PLM).

Language Modelling
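
A compact sketch of a patience-based early-exit loop under the usual reading of the method's name: attach a classifier to every layer and stop as soon as the prediction stays unchanged for `patience` consecutive layers. The layers and classifiers here are toy stand-ins.

```python
# Toy patience-based early exit: run layers one by one and stop once the
# per-layer classifiers agree for `patience` consecutive layers.
import torch
import torch.nn as nn

num_layers, hidden, num_classes, patience = 12, 64, 3, 2
layers = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(num_layers)])
classifiers = nn.ModuleList([nn.Linear(hidden, num_classes) for _ in range(num_layers)])

def predict_with_patience(x: torch.Tensor):
    previous, streak = None, 0
    for depth, (layer, clf) in enumerate(zip(layers, classifiers), start=1):
        x = torch.relu(layer(x))
        prediction = clf(x).argmax(dim=-1)
        streak = streak + 1 if previous is not None and torch.equal(prediction, previous) else 0
        previous = prediction
        if streak >= patience:              # stable for `patience` layers: exit early
            return prediction, depth
    return previous, num_layers             # fell through: use the last classifier

pred, exit_layer = predict_with_patience(torch.randn(1, hidden))
print(pred.item(), "exited at layer", exit_layer)
```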

Scheduled DropHead: A Regularization Method for Transformer Models

1 code implementation Findings of the Association for Computational Linguistics 2020 Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

In this paper, we introduce DropHead, a structured dropout method specifically designed for regularizing the multi-head attention mechanism, which is a key component of transformer, a state-of-the-art model for various NLP tasks.

Machine Translation · text-classification · +2
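
A minimal sketch of head-level structured dropout, assuming the per-head attention outputs are exposed as a separate tensor dimension. The scheduling component of the paper is not shown, and the rescaling choice is an assumption.

```python
# Minimal structured dropout over attention heads: during training, zero out
# entire heads with probability p and rescale the survivors.
import torch

def drop_head(head_outputs: torch.Tensor, p: float = 0.2, training: bool = True) -> torch.Tensor:
    """head_outputs: (batch, num_heads, seq_len, head_dim)."""
    if not training or p == 0.0:
        return head_outputs
    batch, num_heads = head_outputs.shape[:2]
    keep = (torch.rand(batch, num_heads, 1, 1, device=head_outputs.device) > p).float()
    # Rescale so the expected magnitude of the combined heads is preserved.
    return head_outputs * keep / (1.0 - p)

out = drop_head(torch.randn(2, 8, 10, 64))
dropped = (out == 0).flatten(2).all(dim=-1).sum().item()
print(dropped, "heads dropped")
```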

Learning to Compare for Better Training and Evaluation of Open Domain Natural Language Generation Models

no code implementations 12 Feb 2020 Wangchunshu Zhou, Ke Xu

While able to be trained in a fully self-supervised fashion, our model can be further fine-tuned with a small amount of human preference annotation to better imitate human judgment.

Natural Language Understanding · Response Generation · +1

Self-Adversarial Learning with Comparative Discrimination for Text Generation

no code implementations ICLR 2020 Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

Conventional Generative Adversarial Networks (GANs) for text generation tend to have issues of reward sparsity and mode collapse that affect the quality and diversity of generated samples.

Sentence · Text Generation

Pseudo-Bidirectional Decoding for Local Sequence Transduction

no code implementations Findings of the Association for Computational Linguistics 2020 Wangchunshu Zhou, Tao Ge, Ke Xu

PBD copies the corresponding representation of source tokens to the decoder as pseudo future context, enabling the decoder to attend to its bi-directional context.

Grammatical Error Correction · Inductive Bias · +1
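
A rough illustration of the excerpt's idea, assuming the near-monotonic source-target alignment typical of local sequence transduction: positions after the current decoding step are filled with encoder representations of the corresponding source tokens and used as pseudo future context. Tensor names and shapes are illustrative, not the paper's code.

```python
# Illustrative sketch: at step t the decoder attends over its generated prefix
# plus encoder states of the aligned *future* source positions, used as pseudo
# future context.
import torch

def pseudo_bidirectional_context(prefix_embeds: torch.Tensor,
                                 encoder_states: torch.Tensor,
                                 step: int) -> torch.Tensor:
    """prefix_embeds, encoder_states: (batch, seq_len, hidden); source and target
    are assumed to have the same length and a near-monotonic alignment."""
    past = prefix_embeds[:, : step + 1]            # what has been generated so far
    pseudo_future = encoder_states[:, step + 1 :]  # copied source representations
    return torch.cat([past, pseudo_future], dim=1)

context = pseudo_bidirectional_context(torch.randn(2, 8, 16), torch.randn(2, 8, 16), step=3)
print(context.shape)  # torch.Size([2, 8, 16])
```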

Improving Grammatical Error Correction with Machine Translation Pairs

1 code implementation Findings of the Association for Computational Linguistics 2020 Wangchunshu Zhou, Tao Ge, Chang Mu, Ke Xu, Furu Wei, Ming Zhou

The poor translation model resembles the ESL (English as a second language) learner and tends to generate translations of low quality in terms of fluency and grammatical correctness, while the good translation model generally generates fluent and grammatically correct translations.

Grammatical Error Correction · Language Modelling · +3
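
A hedged sketch of the data-construction idea in the excerpt: translations of the same foreign sentence from a deliberately weak translator and from a strong translator form synthetic (errorful, corrected) pairs for GEC training. `weak_translate` and `strong_translate` are hypothetical placeholders, not real APIs.

```python
# Hedged sketch: build synthetic GEC training pairs from a weak and a strong
# translation model run on the same foreign-language sentences.

def weak_translate(sentence: str) -> str:
    # Stand-in for a low-quality translator whose output resembles learner errors.
    return "he go to school yesterday"

def strong_translate(sentence: str) -> str:
    # Stand-in for a high-quality translator producing fluent, grammatical English.
    return "He went to school yesterday."

def build_gec_pairs(foreign_sentences: list[str]) -> list[tuple[str, str]]:
    pairs = []
    for sentence in foreign_sentences:
        source = weak_translate(sentence)    # errorful "learner-like" English
        target = strong_translate(sentence)  # fluent, corrected English
        if source != target:                 # keep only pairs that contain an edit
            pairs.append((source, target))
    return pairs

print(build_gec_pairs(["er ging gestern zur Schule"]))
```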

BERT-based Lexical Substitution

1 code implementation ACL 2019 Wangchunshu Zhou, Tao Ge, Ke Xu, Furu Wei, Ming Zhou

Our approach first applies dropout to the target word's embedding for partially masking the word, allowing BERT to take balanced consideration of the target word's semantics and contexts for proposing substitute candidates, and then validates the candidates based on their substitution's influence on the global contextualized representation of the sentence.

Sentence
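
A hedged sketch of the candidate-proposal step with a Hugging Face BERT checkpoint: partially mask the target word by applying dropout to its input embedding, then read substitute candidates off the masked-LM head. The validation step from the abstract is omitted, and the dropout rate and checkpoint name are assumptions.

```python
# Hedged sketch of proposing substitute candidates by partially masking the
# target word's input embedding with dropout and querying the masked-LM head.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

sentence = "the quick brown fox jumps over the lazy dog"
target_word = "quick"

inputs = tokenizer(sentence, return_tensors="pt")
target_id = tokenizer.convert_tokens_to_ids(target_word)
target_pos = (inputs["input_ids"][0] == target_id).nonzero()[0].item()

with torch.no_grad():
    # Embed the tokens ourselves so we can perturb just the target position.
    embeddings = model.get_input_embeddings()(inputs["input_ids"])
    embeddings[0, target_pos] = F.dropout(embeddings[0, target_pos], p=0.3, training=True)
    logits = model(inputs_embeds=embeddings,
                   attention_mask=inputs["attention_mask"]).logits

top_ids = logits[0, target_pos].topk(10).indices.tolist()
print(tokenizer.convert_ids_to_tokens(top_ids))
```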
