Search Results for author: Pengcheng He

Found 59 papers, 40 papers with code

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

no code implementations17 Oct 2023 Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.

Transfer Learning

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

1 code implementation12 Oct 2023 Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao

Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning.

Natural Language Understanding Quantization +2

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

2 code implementations7 Sep 2023 Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i. e., generating content that deviates from facts seen during pretraining.

Deep Reinforcement Learning with Hierarchical Reward Modeling

1 code implementation6 Sep 2023 Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao

Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals.

reinforcement-learning Reinforcement Learning (RL)

Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

2 code implementations17 Aug 2023 Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan

In this work, we establish a benchmark to evaluate the robustness of instruction-following LLMs against prompt injection attacks.

Instruction Following

Interactive Editing for Text Summarization

1 code implementation5 Jun 2023 Yujia Xie, Xun Wang, Si-Qing Chen, Wayne Xiong, Pengcheng He

Summarizing lengthy documents is a common and essential task in our daily lives.

Text Summarization

Query Rewriting for Retrieval-Augmented Large Language Models

no code implementations23 May 2023 Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

Furthermore, to better align the query to the frozen modules, we propose a trainable scheme for our pipeline.

Language Modelling Multiple-choice +1

PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive Summarization

1 code implementation11 May 2023 Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

Based on the remarkable achievements of pre-trained language models in abstractive summarization, the copying mechanism has proved helpful by improving the factuality, stability, and overall performance.

Abstractive Text Summarization

Summarization with Precise Length Control

no code implementations9 May 2023 Lesly Miculicich, Yujia Xie, Song Wang, Pengcheng He

Many applications of text generation such as summarization benefit from accurately controlling the text length.

Text Generation

Personalized Abstractive Summarization by Tri-agent Generation Pipeline

1 code implementation4 May 2023 Wen Xiao, Yujia Xie, Giuseppe Carenini, Pengcheng He

The inference-only large language model (ChatGPT) serves as both the generator and editor, with a smaller model acting as the instructor to guide output generation.

Abstractive Text Summarization Language Modelling +1

POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models

1 code implementation29 Apr 2023 Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou

Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years.

Image Classification Natural Language Inference +1

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

1 code implementation NeurIPS 2023 Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou

Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e. g.$, as few as 5, 000 images to train from scratch.

Instruction Tuning with GPT-4

1 code implementation6 Apr 2023 Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao

Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed.

Instruction Following

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations18 Mar 2023 Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e. g., low-rank increments.

Question Answering Text Generation

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

no code implementations24 Feb 2023 Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao

Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e. g., task-oriented dialog and question answering.

Informativeness Open-Domain Question Answering

A Prototype-Oriented Clustering for Domain Shift with Source Privacy

no code implementations8 Feb 2023 Korawat Tanwisuth, Shujian Zhang, Pengcheng He, Mingyuan Zhou

Finally, it refines the target model on the target domain data without guidance from the source model.

Clustering

Attend to the Right Context: A Plug-and-Play Module for Content-Controllable Summarization

1 code implementation21 Dec 2022 Wen Xiao, Lesly Miculicich, Yang Liu, Pengcheng He, Giuseppe Carenini

Content-Controllable Summarization generates summaries focused on the given controlling signals.

DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization

no code implementations20 Dec 2022 Yu Li, Baolin Peng, Pengcheng He, Michel Galley, Zhou Yu, Jianfeng Gao

In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain.

Momentum Calibration for Text Generation

no code implementations8 Dec 2022 Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei

The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers.

Abstractive Text Summarization Text Generation

HyperTuning: Toward Adapting Large Language Models without Back-propagation

no code implementations22 Nov 2022 Jason Phang, Yi Mao, Pengcheng He, Weizhu Chen

Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization.

Language Modelling

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

1 code implementation4 Oct 2022 Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.

Language Modelling Model Compression

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

1 code implementation ACL 2022 Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.

Ensemble Learning

Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders

1 code implementation19 Feb 2022 Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain.

Text-to-Image Generation

Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

2 code implementations14 Feb 2022 Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting.

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

2 code implementations6 Dec 2021 Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang

In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.

 Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)

Common Sense Reasoning

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

2 code implementations18 Nov 2021 Pengcheng He, Jianfeng Gao, Weizhu Chen

We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model.

Natural Language Inference Natural Language Understanding +2

Crossformer: Transformer with Alternated Cross-Layer Guidance

no code implementations29 Sep 2021 Shujian Zhang, Zhibin Duan, Huangjie Zheng, Pengcheng He, Bo Chen, Weizhu Chen, Mingyuan Zhou

Crossformer with states sharing not only provides the desired cross-layer guidance and regularization but also reduces the memory requirement.

Inductive Bias Machine Translation +3

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation ACL 2021 Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of ``lottery tickets'', and training a certain collection of them (i. e., a subnetwork) can match the performance of the full model.

Model Compression Multi-Task Learning

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations Findings (EMNLP) 2021 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation NMT +2

Greedy-Step Off-Policy Reinforcement Learning

no code implementations23 Feb 2021 Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan

Most of the policy evaluation algorithms are based on the theories of Bellman Expectation and Optimality Equation, which derive two popular approaches - Policy Iteration (PI) and Value Iteration (VI).

Q-Learning reinforcement-learning +1

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation1 Jan 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Natural Questions Open-Domain Question Answering +2

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation ACL 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Natural Questions Open-Domain Question Answering +4

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

9 code implementations ICLR 2021 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.

Common Sense Reasoning Coreference Resolution +10

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning

no code implementations EMNLP 2020 Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen

In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject language models with structured knowledge via learning from raw text.

Entity Linking Knowledge Base Completion +5

Adversarial Training for Large Neural Language Models

3 code implementations20 Apr 2020 Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao

In natural language processing (NLP), pre-training large neural language models such as BERT have demonstrated impressive gain in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.

Ranked #6 on Natural Language Inference on ANLI test (using extra training data)

Natural Language Inference Natural Language Understanding

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

6 code implementations ACL 2020 Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Linguistic Acceptability Natural Language Inference +4

X-SQL: reinforce schema representation with context

no code implementations21 Aug 2019 Pengcheng He, Yi Mao, Kaushik Chakrabarti, Weizhu Chen

In this work, we present X-SQL, a new network architecture for the problem of parsing natural language to SQL query.

On the Variance of the Adaptive Learning Rate and Beyond

21 code implementations ICLR 2020 Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3

A Hybrid Neural Network Model for Commonsense Reasoning

3 code implementations WS 2019 Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao

An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.

Common Sense Reasoning Coreference Resolution +6

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

3 code implementations20 Apr 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.

Ensemble Learning Knowledge Distillation +5

Multi-Task Deep Neural Networks for Natural Language Understanding

8 code implementations ACL 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.

Domain Adaptation Language Modelling +5

Cannot find the paper you are looking for? You can Submit a new open access paper.