Search Results for author: Weizhu Chen

Found 71 papers, 44 papers with code

What Makes Good In-Context Examples for GPT-3?

no code implementations DeeLIO (ACL) 2022 Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen

In this work, we investigate whether there are more effective strategies for judiciously selecting in-context examples (relative to random sampling) that better leverage GPT-3’s in-context learning capabilities. Inspired by the recent success of leveraging a retrieval module to augment neural networks, we propose to retrieve examples that are semantically-similar to a test query sample to formulate its corresponding prompt.

Natural Language Understanding Open-Domain Question Answering +1

Finding the Dominant Winning Ticket in Pre-Trained Language Models

no code implementations Findings (ACL) 2022 Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan

Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, (c) and the dominant winning ticket has a natural structure within each parameter matrix.

A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation

no code implementations23 May 2022 Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan

To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertisements applications.

Question Generation Text Generation

ALLSH: Active Learning Guided by Local Sensitivity and Hardness

1 code implementation10 May 2022 Shujian Zhang, Chengyue Gong, Xingchao Liu, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Active learning, which effectively collects informative unlabeled data for annotation, reduces the demand for labeled data.

Active Learning Few-Shot Learning

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

1 code implementation ACL 2022 Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.

Ensemble Learning

Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models

no code implementations7 Mar 2022 Shengnan An, Yifei Li, Zeqi Lin, Qian Liu, Bei Chen, Qiang Fu, Weizhu Chen, Nanning Zheng, Jian-Guang Lou

This motivates us to propose input-tuning, which fine-tunes both the continuous prompts and the input representations, leading to a more effective way to adapt unfamiliar inputs to frozen PLMs.

Language Modelling Natural Language Understanding +1

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

1 code implementation7 Mar 2022 Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.

Controllable Natural Language Generation with Contrastive Prefixes

no code implementations Findings (ACL) 2022 Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen

We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.

Language Modelling Pretrained Language Models +1

Truncated Diffusion Probabilistic Models

no code implementations19 Feb 2022 Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Employing a forward Markov diffusion chain to gradually map the data to a noise distribution, diffusion probabilistic models learn how to generate the data by inferring a reverse Markov diffusion chain to invert the forward diffusion process.

Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

2 code implementations14 Feb 2022 Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting.

Reasoning Like Program Executors

no code implementations27 Jan 2022 Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Yan Gao, Qiang Fu, Jian-Guang Lou, Weizhu Chen

Furthermore, POET shines in giant language models, pushing the F1 metric of T5-11B to 87. 6% and achieving a new state-of-the-art performance on DROP.

CodeRetriever: Unimodal and Bimodal Contrastive Learning

1 code implementation26 Jan 2022 Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, Nan Duan

For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build text-code pairs.

Code Search Contrastive Learning

Contextual Bandit Applications in Customer Support Bot

no code implementations6 Dec 2021 Sandra Sajeev, Jade Huang, Nikos Karampatziakis, Matthew Hall, Sebastian Kochman, Weizhu Chen

We do, however, have access to partial feedback provided by the user (clicks, surveys, and other events) which can be leveraged to improve the user experience.

Multi-Armed Bandits

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

1 code implementation NeurIPS 2021 Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization ($\mu$P), many optimal HPs remain stable even as model size changes.

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

1 code implementation18 Nov 2021 Pengcheng He, Jianfeng Gao, Weizhu Chen

We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model.

Language Modelling Natural Language Understanding

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

1 code implementation30 Oct 2021 Xuxi Chen, Tianlong Chen, Yu Cheng, Weizhu Chen, Zhangyang Wang, Ahmed Hassan Awadallah

To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.

Adversarial Retriever-Ranker for dense text retrieval

1 code implementation ICLR 2022 Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen

The two models are jointly optimized according to a minimax adversarial objective: the retriever learns to retrieve negative documents to cheat the ranker, while the ranker learns to rank a collection of candidates including both the ground-truth and the retrieved ones, as well as providing progressive direct feedback to the dual-encoder retriever.

Crossformer: Transformer with Alternated Cross-Layer Guidance

no code implementations29 Sep 2021 Shujian Zhang, Zhibin Duan, Huangjie Zheng, Pengcheng He, Bo Chen, Weizhu Chen, Mingyuan Zhou

Crossformer with states sharing not only provides the desired cross-layer guidance and regularization but also reduces the memory requirement.

Machine Translation Node Classification +2

TAPEX: Table Pre-training via Learning a Neural SQL Executor

2 code implementations ICLR 2022 Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou

TAPEX addresses the data scarcity challenge via guiding the language model to mimic a SQL executor on the diverse, large-scale and high-quality synthetic corpus.

 Ranked #1 on Semantic Parsing on WikiSQL (Denotation accuracy (test) metric)

Language Modelling Semantic Parsing +1

LoRA: Low-Rank Adaptation of Large Language Models

1 code implementation ICLR 2022 Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation ACL 2021 Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of ``lottery tickets'', and training a certain collection of them (i. e., a subnetwork) can match the performance of the full model.

Model Compression Multi-Task Learning

Poolingformer: Long Document Modeling with Pooling Attention

no code implementations10 May 2021 Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen

We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA.

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation

2 code implementations ACL 2022 Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, Bill Dolan

Large pretrained generative models like GPT-3 often suffer from hallucinating non-existent or incorrect content, which undermines their potential merits in real applications.

Text Generation

Finetuning Pretrained Transformers into RNNs

1 code implementation EMNLP 2021 Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune.

Language Modelling Machine Translation +1

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations Findings (EMNLP) 2021 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation Translation

What Makes Good In-Context Examples for GPT-$3$?

1 code implementation17 Jan 2021 Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen

Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically-similar to a test sample to formulate its corresponding prompt.

Few-Shot Learning Natural Language Understanding +2

UnitedQA: A Hybrid Approach for Open Domain Question Answering

no code implementations ACL 2021 Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

To date, most of recent work under the retrieval-reader framework for open-domain QA focuses on either extractive or generative reader exclusively.

Open-Domain Question Answering

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation1 Jan 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Open-Domain Question Answering

Few-Shot Named Entity Recognition: A Comprehensive Study

2 code implementations29 Dec 2020 Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents a comprehensive study to efficiently build named entity recognition (NER) systems when a small number of in-domain labeled data is available.

Few-Shot Learning Named Entity Recognition +1

Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model

no code implementations12 Oct 2020 Mingzhi Zheng, Dinghan Shen, Yelong Shen, Weizhu Chen, Lin Xiao

We prove, from a theoretical perspective, that the gradients derived from this new masking schema have a smaller variance and can lead to more efficient self-supervised training.

Language Modelling Sentence Classification

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation ACL 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Open-Domain Question Answering Passage Retrieval +1

Example-Based Named Entity Recognition

no code implementations24 Aug 2020 Morteza Ziyadi, Yuting Sun, Abhishek Goswami, Jade Huang, Weizhu Chen

We present a novel approach to named entity recognition (NER) in the presence of scarce data that we call example-based NER.

Few-Shot Learning Named Entity Recognition +2

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

5 code implementations ICLR 2021 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.

Common Sense Reasoning Coreference Resolution +9

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning

no code implementations EMNLP 2020 Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen

In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject language models with structured knowledge via learning from raw text.

Entity Linking Knowledge Base Completion +4

Adversarial Training for Large Neural Language Models

2 code implementations20 Apr 2020 Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao

In natural language processing (NLP), pre-training large neural language models such as BERT have demonstrated impressive gain in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.

Ranked #3 on Natural Language Inference on ANLI test (using extra training data)

Natural Language Inference Natural Language Understanding

Conditional Self-Attention for Query-based Summarization

no code implementations18 Feb 2020 Yujia Xie, Tianyi Zhou, Yi Mao, Weizhu Chen

Thereby, the contextual dependencies modeled by CSA will be highly relevant to the query.

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

4 code implementations ACL 2020 Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Linguistic Acceptability Natural Language Inference +3

X-SQL: reinforce schema representation with context

no code implementations21 Aug 2019 Pengcheng He, Yi Mao, Kaushik Chakrabarti, Weizhu Chen

In this work, we present X-SQL, a new network architecture for the problem of parsing natural language to SQL query.

On the Variance of the Adaptive Learning Rate and Beyond

20 code implementations ICLR 2020 Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3

A Hybrid Neural Network Model for Commonsense Reasoning

2 code implementations WS 2019 Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao

An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.

Common Sense Reasoning Language Modelling +2

Lessons from Contextual Bandit Learning in a Customer Support Bot

no code implementations6 May 2019 Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen

In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support.

Information Retrieval Multi-Armed Bandits +1

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

2 code implementations20 Apr 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.

Ensemble Learning Knowledge Distillation +5

Multi-Task Deep Neural Networks for Natural Language Understanding

8 code implementations ACL 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.

Domain Adaptation Language Modelling +5

Parameter-free Sentence Embedding via Orthogonal Basis

1 code implementation IJCNLP 2019 Ziyi Yang, Chenguang Zhu, Weizhu Chen

Inspired by the Gram-Schmidt Process in geometric theory, we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence.

Sentence Embedding Sentence-Embedding +1

IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles

no code implementations13 Sep 2018 Tianze Shi, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, Weizhu Chen

We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory.

Action Parsing Text-To-Sql

ReasoNet: Learning to Stop Reading in Machine Comprehension

no code implementations17 Sep 2016 Yelong Shen, Po-Sen Huang, Jianfeng Gao, Weizhu Chen

Teaching a computer to read and answer general questions pertaining to a document is a challenging yet unsolved problem.

Question Answering Reading Comprehension +1

Large-scale L-BFGS using MapReduce

no code implementations NeurIPS 2014 Weizhu Chen, Zhenghao Wang, Jingren Zhou

L-BFGS has been applied as an effective parameter estimation method for various machine learning algorithms since 1980s.

Cannot find the paper you are looking for? You can Submit a new open access paper.