Search Results for author: Pengcheng He

Found 60 papers, 41 papers with code

Switchable Decision: Dynamic Neural Generation Networks

no code implementations • 7 May 2024 • Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou

Auto-regressive generation models achieve competitive performance across many different NLP tasks such as summarization, question answering, and classification.

Question Answering

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

1 code implementation • 17 Oct 2023 • Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.

Transfer Learning

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

1 code implementation • 12 Oct 2023 • Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao

Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning.

Natural Language Understanding Quantization +2

163

Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

no code implementations • 10 Oct 2023 • Huangjie Zheng, Zhendong Wang, Jianbo Yuan, Guanghan Ning, Pengcheng He, Quanzeng You, Hongxia Yang, Mingyuan Zhou

Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling.

Image Generation

DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models

2 code implementations • 7 Sep 2023 • Yung-Sung Chuang, Yujia Xie, Hongyin Luo, Yoon Kim, James Glass, Pengcheng He

Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining.

349

Deep Reinforcement Learning with Hierarchical Reward Modeling

1 code implementation • 6 Sep 2023 • Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao

Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals.

reinforcement-learning Reinforcement Learning (RL)

Evaluating the Instruction-Following Robustness of Large Language Models to Prompt Injection

2 code implementations • 17 Aug 2023 • Zekun Li, Baolin Peng, Pengcheng He, Xifeng Yan

In this work, we establish a benchmark to evaluate the robustness of instruction-following LLMs against prompt injection attacks.

Instruction Following

Summaries, Highlights, and Action items: Design, implementation and evaluation of an LLM-powered meeting recap system

no code implementations • 28 Jul 2023 • Sumit Asthana, Sagih Hilleli, Pengcheng He, Aaron Halfaker

Finally, we evaluate the effectiveness of the system with seven users in the context of their work meetings.

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

no code implementations • 20 Jun 2023 • Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons.

Model Compression Natural Language Understanding +2

Interactive Editing for Text Summarization

1 code implementation • 5 Jun 2023 • Yujia Xie, Xun Wang, Si-Qing Chen, Wayne Xiong, Pengcheng He

Summarizing lengthy documents is a common and essential task in our daily lives.

Decoder Text Summarization

Query Rewriting for Retrieval-Augmented Large Language Models

no code implementations • 23 May 2023 • Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

Furthermore, to better align the query to the frozen modules, we propose a trainable scheme for our pipeline.

Language Modelling Multiple-choice +1

PROM: A Phrase-level Copying Mechanism with Pre-training for Abstractive Summarization

1 code implementation • 11 May 2023 • Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan

Based on the remarkable achievements of pre-trained language models in abstractive summarization, the copying mechanism has proved helpful by improving the factuality, stability, and overall performance.

Abstractive Text Summarization

Summarization with Precise Length Control

no code implementations • 9 May 2023 • Lesly Miculicich, Yujia Xie, Song Wang, Pengcheng He

Many applications of text generation such as summarization benefit from accurately controlling the text length.

Text Generation

Personalized Abstractive Summarization by Tri-agent Generation Pipeline

1 code implementation • 4 May 2023 • Wen Xiao, Yujia Xie, Giuseppe Carenini, Pengcheng He

The inference-only large language model (ChatGPT) serves as both the generator and editor, with a smaller model acting as the instructor to guide output generation.

Abstractive Text Summarization Language Modelling +1

In-Context Learning Unlocked for Diffusion Models

1 code implementation • NeurIPS 2023 • Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models.

In-Context Learning text-guided-image-editing

351

POUF: Prompt-oriented unsupervised fine-tuning for large pre-trained models

1 code implementation • 29 Apr 2023 • Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou

Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years.

Image Classification Natural Language Inference +1

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

1 code implementation • NeurIPS 2023 • Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou

Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch.

Instruction Tuning with GPT-4

2 code implementations • 6 Apr 2023 • Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao

Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed.

Instruction Following

4,000

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., low-rank increments.

Question Answering Text Generation

213

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

no code implementations • 24 Feb 2023 • Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao

Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering.

Informativeness Open-Domain Question Answering

A Prototype-Oriented Clustering for Domain Shift with Source Privacy

no code implementations • 8 Feb 2023 • Korawat Tanwisuth, Shujian Zhang, Pengcheng He, Mingyuan Zhou

Finally, it refines the target model on the target domain data without guidance from the source model.

Clustering

Attend to the Right Context: A Plug-and-Play Module for Content-Controllable Summarization

1 code implementation • 21 Dec 2022 • Wen Xiao, Lesly Miculicich, Yang Liu, Pengcheng He, Giuseppe Carenini

Content-Controllable Summarization generates summaries focused on the given controlling signals.

DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization

no code implementations • 20 Dec 2022 • Yu Li, Baolin Peng, Pengcheng He, Michel Galley, Zhou Yu, Jianfeng Gao

In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain.

Decoder

Momentum Calibration for Text Generation

no code implementations • 8 Dec 2022 • Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei

The input and output of most text generation tasks can be transformed to two sequences of tokens and they can be modeled using sequence-to-sequence learning modeling tools such as Transformers.

Ranked #2 on Text Summarization on SAMSum

Abstractive Text Summarization Text Generation

HyperTuning: Toward Adapting Large Language Models without Back-propagation

no code implementations • 22 Nov 2022 • Jason Phang, Yi Mao, Pengcheng He, Weizhu Chen

Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization.

Language Modelling

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

1 code implementation • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.

Language Modelling Model Compression

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

1 code implementation • 21 Aug 2022 • Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang

Z-Code++ creates new state of the art on 9 out of 13 text summarization tasks across 5 languages.

Abstractive Text Summarization Decoder +2

1,863

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

1 code implementation • NAACL 2022 • Zhengbao Jiang, Yi Mao, Pengcheng He, Graham Neubig, Weizhu Chen

The information in tables can be an important complement to text, making table-based question answering (QA) systems of great value.

Ranked #7 on Semantic Parsing on WikiTableQuestions

Question Answering Retrieval +1

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

1 code implementation • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao

Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.

Image Classification Natural Language Understanding +1

GODEL: Large-Scale Pre-Training for Goal-Directed Dialog

1 code implementation • 22 Jun 2022 • Baolin Peng, Michel Galley, Pengcheng He, Chris Brockett, Lars Liden, Elnaz Nouri, Zhou Yu, Bill Dolan, Jianfeng Gao

We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-trained language model for dialog.

Language Modelling Open-Domain Dialog

836

Diffusion-GAN: Training GANs with Diffusion

3 code implementations • 5 Jun 2022 • Zhendong Wang, Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Both the observed and generated data are diffused by the same adaptive diffusion process.

Ranked #1 on Image Generation on LSUN Bedroom 256 x 256

Image Generation

546

ALLSH: Active Learning Guided by Local Sensitivity and Hardness

no code implementations • Findings (NAACL) 2022 • Shujian Zhang, Chengyue Gong, Xingchao Liu, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Active learning, which effectively collects informative unlabeled data for annotation, reduces the demand for labeled data.

Active Learning Few-Shot Learning

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation

1 code implementation • NAACL 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen

We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.

Knowledge Distillation Natural Language Understanding +1

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

1 code implementation • ACL 2022 • Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.

Ensemble Learning

Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders

1 code implementation • 19 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain.

Ranked #1 on Text-to-Image Generation on CUB

Text-to-Image Generation

Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

2 code implementations • 14 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting.

162

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

1 code implementation • ICLR 2022 • Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao

Analysis shows that the proposed schedule indeed reduces the redundancy and improves generalization performance.

Image Classification Machine Translation +2

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

2 code implementations • 6 Dec 2021 • Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang

In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.

Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)

Common Sense Reasoning

106

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

2 code implementations • 18 Nov 2021 • Pengcheng He, Jianfeng Gao, Weizhu Chen

We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model.

Ranked #1 on Natural Language Inference on MRPC

Natural Language Inference Natural Language Understanding +2

1,863

Crossformer: Transformer with Alternated Cross-Layer Guidance

no code implementations • 29 Sep 2021 • Shujian Zhang, Zhibin Duan, Huangjie Zheng, Pengcheng He, Bo Chen, Weizhu Chen, Mingyuan Zhou

Crossformer with states sharing not only provides the desired cross-layer guidance and regularization but also reduces the memory requirement.

Inductive Bias Machine Translation +3

ARCH: Efficient Adversarial Regularized Training with Caching

1 code implementation • Findings (EMNLP) 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao

Adversarial regularization can improve model generalization in many natural language processing tasks.

Machine Translation Natural Language Understanding +1

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation • ACL 2021 • Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.

Model Compression Multi-Task Learning

Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach

1 code implementation • EMNLP 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao

Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.

Machine Translation Natural Language Understanding +1

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations • Findings (EMNLP) 2021 • Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation NMT +2

Greedy-Step Off-Policy Reinforcement Learning

no code implementations • 23 Feb 2021 • Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan

Most of the policy evaluation algorithms are based on the theories of Bellman Expectation and Optimality Equation, which derive two popular approaches - Policy Iteration (PI) and Value Iteration (VI).

Q-Learning reinforcement-learning +1

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih

We review the EfficientQA competition from NeurIPS 2020.

Open-Domain Question Answering Retrieval

UnitedQA: A Hybrid Approach for Open Domain Question Answering

no code implementations • ACL 2021 • Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

To date, most recent work under the retrieval-reader framework for open-domain QA focuses exclusively on either an extractive or a generative reader.

Ranked #1 on Open-Domain Question Answering on TriviaQA

Open-Domain Question Answering Retrieval +1

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Natural Questions Open-Domain Question Answering +2

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Ranked #9 on Passage Retrieval on Natural Questions

Natural Questions Open-Domain Question Answering +4

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

9 code implementations • ICLR 2021 • Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.

Ranked #1 on Common Sense Reasoning on SWAG

Common Sense Reasoning Coreference Resolution +10

126,108

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning

no code implementations • EMNLP 2020 • Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen

In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject language models with structured knowledge via learning from raw text.

Entity Linking Knowledge Base Completion +5

Adversarial Training for Large Neural Language Models

3 code implementations • 20 Apr 2020 • Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao

In natural language processing (NLP), pre-training large neural language models such as BERT has demonstrated impressive gains in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.

Ranked #6 on Natural Language Inference on ANLI test (using extra training data)

Natural Language Inference Natural Language Understanding

2,208

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding

3 code implementations • ACL 2020 • Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao

We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models.

Knowledge Distillation Multi-Task Learning +2

2,208

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

6 code implementations • ACL 2020 • Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Ranked #1 on Natural Language Inference on QNLI

Linguistic Acceptability Natural Language Inference +4

2,208

X-SQL: reinforce schema representation with context

no code implementations • 21 Aug 2019 • Pengcheng He, Yi Mao, Kaushik Chakrabarti, Weizhu Chen

In this work, we present X-SQL, a new network architecture for the problem of parsing natural language into SQL queries.

On the Variance of the Adaptive Learning Rate and Beyond

21 code implementations • ICLR 2020 • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3

48,924

A Hybrid Neural Network Model for Commonsense Reasoning

3 code implementations • WS 2019 • Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao

An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.

Ranked #1 on Natural Language Understanding on PDP60

Common Sense Reasoning Coreference Resolution +6

2,208

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

3 code implementations • 20 Apr 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.

Ranked #1 on Semantic Textual Similarity on SentEval

Ensemble Learning Knowledge Distillation +5

2,208

Multi-Task Deep Neural Networks for Natural Language Understanding

7 code implementations • ACL 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.

Ranked #2 on Natural Language Inference on SciTail

Domain Adaptation Language Modelling +5

2,208
