Search Results for author: Weizhu Chen

Found 117 papers, 74 papers with code

What Makes Good In-Context Examples for GPT-3?

no code implementations • DeeLIO (ACL) 2022 • Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen

In this work, we investigate whether there are more effective strategies for judiciously selecting in-context examples (relative to random sampling) that better leverage GPT-3’s in-context learning capabilities. Inspired by the recent success of leveraging a retrieval module to augment neural networks, we propose to retrieve examples that are semantically-similar to a test query sample to formulate its corresponding prompt.

In-Context Learning Natural Language Understanding +4

Paper
Add Code

Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation • Findings (ACL) 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Open-Domain Question Answering

Paper
Code

Finding the Dominant Winning Ticket in Pre-Trained Language Models

no code implementations • Findings (ACL) 2022 • Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan

Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, (c) and the dominant winning ticket has a natural structure within each parameter matrix.

Paper
Add Code

Few-Shot Named Entity Recognition: An Empirical Baseline Study

no code implementations • EMNLP 2021 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents an empirical study to efficiently build named entity recognition (NER) systems when a small amount of in-domain labeled data is available.

Few-Shot Learning named-entity-recognition +2

Paper
Add Code

Rho-1: Not All Tokens Are What You Need

2 code implementations • 11 Apr 2024 • Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen

After fine-tuning, Rho-1-1B and 7B achieved state-of-the-art results of 40. 6% and 51. 8% on MATH dataset, respectively - matching DeepSeekMath with only 3% of the pretraining tokens.

Continual Pretraining Language Modelling +1

198

Paper
Code

A Note on LoRA

no code implementations • 7 Apr 2024 • Vlad Fomenko, Han Yu, Jongho Lee, Stanley Hsieh, Weizhu Chen

LoRA (Low-Rank Adaptation) has emerged as a preferred method for efficiently adapting Large Language Models (LLMs) with remarkable simplicity and efficacy.

Paper
Add Code

Exploring the Mystery of Influential Data for Mathematical Reasoning

no code implementations • 1 Apr 2024 • Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

Additionally, we showcase the use of QaDS in creating efficient fine-tuning mixtures with various selection ratios, and analyze the quality of a wide range of open-source datasets, which can perform as a reference for future works on mathematical reasoning tasks.

Math Mathematical Reasoning

Paper
Add Code

Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning

1 code implementation • 4 Mar 2024 • Yiming Huang, Xiao Liu, Yeyun Gong, Zhibin Gou, Yelong Shen, Nan Duan, Weizhu Chen

Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality and reasoning-focused training datasets.

Ranked #49 on Math Word Problem Solving on MATH (using extra training data)

GSM8K Math +1

16,643

Paper
Code

Multi-LoRA Composition for Image Generation

no code implementations • 26 Feb 2024 • Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen

Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images.

Denoising Image Generation

Paper
Add Code

SciAgent: Tool-augmented Language Models for Scientific Reasoning

no code implementations • 18 Feb 2024 • Yubo Ma, Zhibin Gou, Junheng Hao, Ruochen Xu, Shuohang Wang, Liangming Pan, Yujiu Yang, Yixin Cao, Aixin Sun, Hany Awadalla, Weizhu Chen

To make this task more practical and solvable for LLMs, we introduce a new task setting named tool-augmented scientific reasoning.

Paper
Add Code

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts

1 code implementation • 12 Feb 2024 • Yueqin Yin, Zhendong Wang, Yi Gu, Hai Huang, Weizhu Chen, Mingyuan Zhou

In the field of large language models (LLMs), aligning models with the diverse preferences of users is a critical challenge.

Paper
Code

Supervised Knowledge Makes Large Language Models Better In-context Learners

1 code implementation • 26 Dec 2023 • Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen, Yue Zhang

Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering.

In-Context Learning Natural Language Understanding +2

Paper
Code

Competition-Level Problems are Effective LLM Evaluators

no code implementations • 4 Dec 2023 • Yiming Huang, Zhenghao Lin, Xiao Liu, Yeyun Gong, Shuai Lu, Fangyu Lei, Yaobo Liang, Yelong Shen, Chen Lin, Nan Duan, Weizhu Chen

Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and the potential data contamination problem recently.

Paper
Add Code

Language Models can be Logical Solvers

no code implementations • 10 Nov 2023 • Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen

Despite their impressive performance, any parsing errors will inevitably result in the failure of the execution of the external logical solver and no answer to the logical questions.

Decision Making Language Modelling +1

Paper
Add Code

Learning From Mistakes Makes LLM Better Reasoner

1 code implementation • 31 Oct 2023 • Shengnan An, Zexiong Ma, Zeqi Lin, Nanning Zheng, Jian-Guang Lou, Weizhu Chen

To further improve their reasoning capabilities, this work explores whether LLMs can LEarn from MistAkes (LEMA), akin to the human learning process.

GSM8K Math +1

Paper
Code

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective

no code implementations • 17 Oct 2023 • Ming Zhong, Chenxin An, Weizhu Chen, Jiawei Han, Pengcheng He

In this paper, we seek to empirically investigate knowledge transfer from larger to smaller models through a parametric perspective.

Transfer Learning

Paper
Add Code

LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models

1 code implementation • 12 Oct 2023 • Yixiao Li, Yifan Yu, Chen Liang, Pengcheng He, Nikos Karampatziakis, Weizhu Chen, Tuo Zhao

Quantization is an indispensable technique for serving Large Language Models (LLMs) and has recently found its way into LoRA fine-tuning.

Natural Language Understanding Quantization +2

152

Paper
Code

Sparse Backpropagation for MoE Training

no code implementations • 1 Oct 2023 • Liyuan Liu, Jianfeng Gao, Weizhu Chen

One defining characteristic of Mixture-of-Expert (MoE) models is their capacity for conducting sparse computation via expert routing, leading to remarkable scalability.

Machine Translation

Paper
Add Code

Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency

no code implementations • 29 Sep 2023 • Baizhou Huang, Shuai Lu, Weizhu Chen, Xiaojun Wan, Nan Duan

We propose the Multi-Perspective Self-Consistency (MPSC) framework incorporating both inter- and intra-consistency across outputs from multiple perspectives.

Code Generation

Paper
Add Code

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

1 code implementation • 29 Sep 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen

Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics.

Ranked #10 on Math Word Problem Solving on MATH (using extra training data)

Arithmetic Reasoning Computational Efficiency +3

814

Paper
Code

Deep Reinforcement Learning with Hierarchical Reward Modeling

1 code implementation • 6 Sep 2023 • Alexander Bukharin, Yixiao Li, Pengcheng He, Weizhu Chen, Tuo Zhao

Researchers typically utilize feedback signals from the environment to handcraft a reward function, but this process is not always effective due to the varying scale and intricate dependencies of the feedback signals.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation

no code implementations • 20 Jun 2023 • Yixiao Li, Yifan Yu, Qingru Zhang, Chen Liang, Pengcheng He, Weizhu Chen, Tuo Zhao

Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons.

Model Compression Natural Language Understanding +2

Paper
Add Code

GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions

no code implementations • 24 May 2023 • Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren

Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.

Object Question Answering +2

Paper
Add Code

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy

no code implementations • 24 May 2023 • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen

In this paper, we show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.

Fact Verification Multi-hop Question Answering +2

Paper
Add Code

Skill-Based Few-Shot Selection for In-Context Learning

no code implementations • 23 May 2023 • Shengnan An, Bo Zhou, Zeqi Lin, Qiang Fu, Bei Chen, Nanning Zheng, Weizhu Chen, Jian-Guang Lou

Few-shot selection -- selecting appropriate examples for each test instance separately -- is important for in-context learning.

In-Context Learning Semantic Parsing +1

Paper
Add Code

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing

1 code implementation • 19 May 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen

Unlike these models, humans typically utilize external tools to cross-check and refine their initial content, like using a search engine for fact-checking, or a code interpreter for debugging.

Fact Checking Natural Questions +4

614

Paper
Code

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation

1 code implementation • NeurIPS 2023 • Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen

Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance.

Common Sense Reasoning Denoising +4

614

Paper
Code

Code Execution with Pre-trained Language Models

1 code implementation • 8 May 2023 • Chenxiao Liu, Shuai Lu, Weizhu Chen, Daxin Jiang, Alexey Svyatkovskiy, Shengyu Fu, Neel Sundaresan, Nan Duan

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code.

Code Generation Code Search +2

1,973

Paper
Code

In-Context Learning Unlocked for Diffusion Models

1 code implementation • NeurIPS 2023 • Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou

We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models.

In-Context Learning text-guided-image-editing

349

Paper
Code

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models

1 code implementation • NeurIPS 2023 • Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou

Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, $e. g.$, as few as 5, 000 images to train from scratch.

Paper
Code

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models

2 code implementations • 13 Apr 2023 • Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, Nan Duan

Impressively, GPT-4 surpasses average human performance on SAT, LSAT, and math competitions, attaining a 95% accuracy rate on the SAT Math test and a 92. 5% accuracy on the English test of the Chinese national college entrance exam.

Decision Making Math

631

Paper
Code

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators

2 code implementations • 29 Mar 2023 • Xingwei He, Zhenghao Lin, Yeyun Gong, A-Long Jin, Hang Zhang, Chen Lin, Jian Jiao, Siu Ming Yiu, Nan Duan, Weizhu Chen

Many natural language processing (NLP) tasks rely on labeled data to train machine learning models with high performance.

Information Retrieval Retrieval

1,778

Paper
Code

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation

1 code implementation • 22 Mar 2023 • Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, Weizhu Chen

The task of repository-level code completion is to continue writing the unfinished code based on a broader context of the repository.

Code Completion Language Modelling +1

547

Paper
Code

AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning

2 code implementations • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Nikos Karampatziakis, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao

Therefore, many fine-tuning methods are proposed to learn incremental updates of pre-trained weights in a parameter efficient way, e. g., low-rank increments.

Question Answering Text Generation

203

Paper
Code

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback

no code implementations • 24 Feb 2023 • Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao

Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e. g., task-oriented dialog and question answering.

Informativeness Open-Domain Question Answering

Paper
Add Code

Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models

no code implementations • 1 Feb 2023 • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen

However, the quality of the prompts depends on the demonstrations given to the models, and creating many of them by hand is costly.

Paper
Add Code

Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise

1 code implementation • 22 Dec 2022 • Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, Weizhu Chen

In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE.

Denoising Language Modelling +1

614

Paper
Code

Generation-Augmented Query Expansion For Code Retrieval

no code implementations • 20 Dec 2022 • Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen

Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet.

Code Generation Retrieval

Paper
Add Code

HyperTuning: Toward Adapting Large Language Models without Back-propagation

no code implementations • 22 Nov 2022 • Jason Phang, Yi Mao, Pengcheng He, Weizhu Chen

Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization.

Language Modelling

Paper
Add Code

GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation

2 code implementations • 18 Nov 2022 • Biyang Guo, Yeyun Gong, Yelong Shen, Songqiao Han, Hailiang Huang, Nan Duan, Weizhu Chen

We introduce GENIUS: a conditional text generation model using sketches as input, which can fill in the missing contexts for a given sketch (key information consisting of textual spans, phrases, or words, concatenated by mask tokens).

Conditional Text Generation Data Augmentation +8

175

Paper
Code

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval

1 code implementation • 21 Oct 2022 • Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen

Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.

Retrieval Text Retrieval

Paper
Code

Soft-Labeled Contrastive Pre-training for Function-level Code Representation

1 code implementation • 18 Oct 2022 • Xiaonan Li, Daya Guo, Yeyun Gong, Yun Lin, Yelong Shen, Xipeng Qiu, Daxin Jiang, Weizhu Chen, Nan Duan

In this paper, we present \textbf{SCodeR}, a \textbf{S}oft-labeled contrastive pre-training framework with two positive sample construction methods to learn functional-level \textbf{Code} \textbf{R}epresentation.

Paper
Code

Less is More: Task-aware Layer-wise Distillation for Language Model Compression

1 code implementation • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao

As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.

Language Modelling Model Compression

Paper
Code

CodeT: Code Generation with Generated Tests

1 code implementation • 21 Jul 2022 • Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen

A natural way to evaluate the quality and correctness of a code solution is to run it against a set of test cases, but the manual creation of such test cases is often costly and time-consuming.

Ranked #1 on Code Generation on APPS (Introductory Pass@1 metric)

Code Generation

547

Paper
Code

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

1 code implementation • NAACL 2022 • Zhengbao Jiang, Yi Mao, Pengcheng He, Graham Neubig, Weizhu Chen

The information in tables can be an important complement to text, making table-based question answering (QA) systems of great value.

Ranked #7 on Semantic Parsing on WikiTableQuestions

Question Answering Retrieval +1

Paper
Code

Joint Generator-Ranker Learning for Natural Language Generation

2 code implementations • 28 Jun 2022 • Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen

Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates.

Question Generation Question-Generation +2

614

Paper
Code

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

1 code implementation • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao

Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.

Image Classification Natural Language Understanding +1

Paper
Code

CERT: Continual Pre-Training on Sketches for Library-Oriented Code Generation

1 code implementation • 14 Jun 2022 • Daoguang Zan, Bei Chen, Dejian Yang, Zeqi Lin, Minsu Kim, Bei guan, Yongji Wang, Weizhu Chen, Jian-Guang Lou

Usually, expensive text-code paired data is essential for training a code generation model.

Ranked #121 on Code Generation on HumanEval

Library-Oriented Code Generation

236

Paper
Code

Making Large Language Models Better Reasoners with Step-Aware Verifier

no code implementations • 6 Jun 2022 • Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, Weizhu Chen

Few-shot learning is a challenging task that requires language models to generalize from limited examples.

Ranked #49 on Arithmetic Reasoning on GSM8K

Arithmetic Reasoning Few-Shot Learning +2

Paper
Add Code

Diffusion-GAN: Training GANs with Diffusion

3 code implementations • 5 Jun 2022 • Zhendong Wang, Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Both the observed and generated data are diffused by the same adaptive diffusion process.

Ranked #1 on Image Generation on LSUN Bedroom 256 x 256

Image Generation

536

Paper
Code

A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation

no code implementations • 23 May 2022 • Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan

To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertisements applications.

Question Generation Question-Generation +1

Paper
Add Code

ALLSH: Active Learning Guided by Local Sensitivity and Hardness

no code implementations • Findings (NAACL) 2022 • Shujian Zhang, Chengyue Gong, Xingchao Liu, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Active learning, which effectively collects informative unlabeled data for annotation, reduces the demand for labeled data.

Active Learning Few-Shot Learning

Paper
Add Code

DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation

1 code implementation • ACL 2022 • Wei Chen, Yeyun Gong, Song Wang, Bolun Yao, Weizhen Qi, Zhongyu Wei, Xiaowu Hu, Bartuer Zhou, Yi Mao, Weizhu Chen, Biao Cheng, Nan Duan

Dialog response generation in open domain is an important research topic where the main challenge is to generate relevant and diverse responses.

Language Modelling Response Generation

Paper
Code

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation

1 code implementation • NAACL 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen

We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.

Knowledge Distillation Natural Language Understanding +1

Paper
Code

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

1 code implementation • ACL 2022 • Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.

Ensemble Learning

Paper
Code

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

3 code implementations • 7 Mar 2022 • Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.

1,165

Paper
Code

Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models

no code implementations • 7 Mar 2022 • Shengnan An, Yifei Li, Zeqi Lin, Qian Liu, Bei Chen, Qiang Fu, Weizhu Chen, Nanning Zheng, Jian-Guang Lou

This motivates us to propose input-tuning, which fine-tunes both the continuous prompts and the input representations, leading to a more effective way to adapt unfamiliar inputs to frozen PLMs.

Language Modelling Natural Language Understanding +1

Paper
Add Code

Controllable Natural Language Generation with Contrastive Prefixes

no code implementations • Findings (ACL) 2022 • Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen

We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.

Attribute Language Modelling +1

Paper
Add Code

Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders

1 code implementation • 19 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain.

Ranked #1 on Text-to-Image Generation on CUB

Text-to-Image Generation

Paper
Code

Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs

2 code implementations • 14 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou

In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting.

160

Paper
Code

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

1 code implementation • ICLR 2022 • Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao

Analysis shows that the proposed schedule indeed reduces the redundancy and improves generalization performance.

Image Classification Machine Translation +2

Paper
Code

Reasoning Like Program Executors

1 code implementation • 27 Jan 2022 • Xinyu Pi, Qian Liu, Bei Chen, Morteza Ziyadi, Zeqi Lin, Qiang Fu, Yan Gao, Jian-Guang Lou, Weizhu Chen

Reasoning over natural language is a long-standing goal for the research community.

Ranked #2 on Question Answering on DROP Test (using extra training data)

Logical Reasoning Math +1

359

Paper
Code

CodeRetriever: Unimodal and Bimodal Contrastive Learning for Code Search

1 code implementation • 26 Jan 2022 • Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, Nan Duan

For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build code-text pairs.

Code Search Contrastive Learning

Paper
Code

Contextual Bandit Applications in Customer Support Bot

no code implementations • 6 Dec 2021 • Sandra Sajeev, Jade Huang, Nikos Karampatziakis, Matthew Hall, Sebastian Kochman, Weizhu Chen

We do, however, have access to partial feedback provided by the user (clicks, surveys, and other events) which can be leveraged to improve the user experience.

Multi-Armed Bandits

Paper
Add Code

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

1 code implementation • NeurIPS 2021 • Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization ($\mu$P), many optimal HPs remain stable even as model size changes.

1,165

Paper
Code

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

2 code implementations • 18 Nov 2021 • Pengcheng He, Jianfeng Gao, Weizhu Chen

We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model.

Ranked #1 on Natural Language Inference on MRPC

Natural Language Inference Natural Language Understanding +2

1,844

Paper
Code

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models

1 code implementation • 30 Oct 2021 • Xuxi Chen, Tianlong Chen, Weizhu Chen, Ahmed Hassan Awadallah, Zhangyang Wang, Yu Cheng

To address these pain points, we propose a framework for resource- and parameter-efficient fine-tuning by leveraging the sparsity prior in both weight updates and the final model weights.

Paper
Code

A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models

1 code implementation • ACL 2022 • Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren

Large pre-trained vision-language (VL) models can learn a new task with a handful of examples and generalize to a new task without fine-tuning.

Ranked #4 on Image Captioning on Flickr30k Captions test (SPICE metric)

Image Captioning Language Modelling +2

Paper
Code

Adversarial Retriever-Ranker for dense text retrieval

1 code implementation • ICLR 2022 • Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen

To address these challenges, we present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.

Natural Questions Retrieval +2

Paper
Code

Crossformer: Transformer with Alternated Cross-Layer Guidance

no code implementations • 29 Sep 2021 • Shujian Zhang, Zhibin Duan, Huangjie Zheng, Pengcheng He, Bo Chen, Weizhu Chen, Mingyuan Zhou

Crossformer with states sharing not only provides the desired cross-layer guidance and regularization but also reduces the memory requirement.

Inductive Bias Machine Translation +3

Paper
Add Code

XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge

1 code implementation • 26 Sep 2021 • Xiaoze Jiang, Yaobo Liang, Weizhu Chen, Nan Duan

The results on MLQA and NER exhibit the superiority of XLM-K in knowledge related tasks.

Language Modelling NER

Paper
Code

ARCH: Efficient Adversarial Regularized Training with Caching

1 code implementation • Findings (EMNLP) 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao

Adversarial regularization can improve model generalization in many natural language processing tasks.

Machine Translation Natural Language Understanding +1

Paper
Code

HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalizability

1 code implementation • ACL 2021 • Jiaao Chen, Dinghan Shen, Weizhu Chen, Diyi Yang

Fine-tuning large pre-trained models with task-specific data has achieved great success in NLP.

Data Augmentation Natural Language Understanding

Paper
Code

TAPEX: Table Pre-training via Learning a Neural SQL Executor

1 code implementation • ICLR 2022 • Qian Liu, Bei Chen, Jiaqi Guo, Morteza Ziyadi, Zeqi Lin, Weizhu Chen, Jian-Guang Lou

TAPEX addresses the data scarcity challenge via guiding the language model to mimic a SQL executor on the diverse, large-scale and high-quality synthetic corpus.

Ranked #1 on Semantic Parsing on WikiSQL (Denotation accuracy (test) metric)

Language Modelling Semantic Parsing +1

273

Paper
Code

LoRA: Low-Rank Adaptation of Large Language Models

48 code implementations • ICLR 2022 • Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen

We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.

Language Modelling

28,742

Paper
Code

HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization

1 code implementation • 31 May 2021 • Jiaao Chen, Dinghan Shen, Weizhu Chen, Diyi Yang

Fine-tuning large pre-trained models with task-specific data has achieved great success in NLP.

Data Augmentation Natural Language Understanding

Paper
Code

Memory-Efficient Differentiable Transformer Architecture Search

no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen

To this end, we propose a multi-split reversible network and combine it with DARTS.

Paper
Add Code

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation • ACL 2021 • Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of ``lottery tickets'', and training a certain collection of them (i. e., a subnetwork) can match the performance of the full model.

Model Compression Multi-Task Learning

Paper
Code

Poolingformer: Long Document Modeling with Pooling Attention

no code implementations • 10 May 2021 • Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen

We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA.

Paper
Add Code

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation

2 code implementations • ACL 2022 • Tianyu Liu, Yizhe Zhang, Chris Brockett, Yi Mao, Zhifang Sui, Weizhu Chen, Bill Dolan

Large pretrained generative models like GPT-3 often suffer from hallucinating non-existent or incorrect content, which undermines their potential merits in real applications.

Hallucination Sentence +1

Paper
Code

Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach

1 code implementation • EMNLP 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao

Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.

Machine Translation Natural Language Understanding +1

Paper
Code

Finetuning Pretrained Transformers into RNNs

1 code implementation • EMNLP 2021 • Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith

Specifically, we propose a swap-then-finetune procedure: in an off-the-shelf pretrained transformer, we replace the softmax attention with its linear-complexity recurrent alternative and then finetune.

Ranked #2 on Machine Translation on WMT2017 Chinese-English

Language Modelling Machine Translation +1

Paper
Code

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations • Findings (EMNLP) 2021 • Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation NMT +2

Paper
Add Code

What Makes Good In-Context Examples for GPT-$3$?

3 code implementations • 17 Jan 2021 • Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, Weizhu Chen

Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically-similar to a test sample to formulate its corresponding prompt.

Few-Shot Learning Natural Language Understanding +4

10,165

Paper
Code

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih

We review the EfficientQA competition from NeurIPS 2020.

Open-Domain Question Answering Retrieval

Paper
Add Code

UnitedQA: A Hybrid Approach for Open Domain Question Answering

no code implementations • ACL 2021 • Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

To date, most of recent work under the retrieval-reader framework for open-domain QA focuses on either extractive or generative reader exclusively.

Ranked #1 on Open-Domain Question Answering on TriviaQA

Open-Domain Question Answering Retrieval +1

Paper
Add Code

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Natural Questions Open-Domain Question Answering +2

Paper
Code

BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining

1 code implementation • 31 Dec 2020 • Weizhen Qi, Yeyun Gong, Jian Jiao, Yu Yan, Weizhu Chen, Dayiheng Liu, Kewen Tang, Houqiang Li, Jiusheng Chen, Ruofei Zhang, Ming Zhou, Nan Duan

In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation.

Dialogue Generation Question Generation +1

Paper
Code

Few-Shot Named Entity Recognition: A Comprehensive Study

2 code implementations • 29 Dec 2020 • Jiaxin Huang, Chunyuan Li, Krishan Subudhi, Damien Jose, Shobana Balakrishnan, Weizhu Chen, Baolin Peng, Jianfeng Gao, Jiawei Han

This paper presents a comprehensive study to efficiently build named entity recognition (NER) systems when a small number of in-domain labeled data is available.

Few-Shot Learning named-entity-recognition +2

Paper
Code

GLGE: A New General Language Generation Evaluation Benchmark

1 code implementation • Findings (ACL) 2021 • Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan

Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP).

Natural Language Understanding Text Generation +1

Paper
Code

MixKD: Towards Efficient Distillation of Large-scale Language Models

no code implementations • ICLR 2021 • Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin

Large-scale language models have recently demonstrated impressive empirical performance.

Data Augmentation Knowledge Distillation

Paper
Add Code

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

no code implementations • ICLR 2021 • Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen

To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks.

Data Augmentation Natural Language Understanding

Paper
Add Code

Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model

no code implementations • 12 Oct 2020 • Mingzhi Zheng, Dinghan Shen, Yelong Shen, Weizhu Chen, Lin Xiao

We prove, from a theoretical perspective, that the gradients derived from this new masking schema have a smaller variance and can lead to more efficient self-supervised training.

Ranked #1 on Sentence Classification on ACL-ARC

Language Modelling Sentence Classification

Paper
Add Code

A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation

2 code implementations • 29 Sep 2020 • Dinghan Shen, Mingzhi Zheng, Yelong Shen, Yanru Qu, Weizhu Chen

Adversarial training has been shown effective at endowing the learned representations with stronger generalization ability.

Ranked #8 on Machine Translation on IWSLT2014 German-English

Data Augmentation Machine Translation +3

Paper
Code

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Ranked #9 on Passage Retrieval on Natural Questions

Natural Questions Open-Domain Question Answering +4

Paper
Code

Example-Based Named Entity Recognition

1 code implementation • 24 Aug 2020 • Morteza Ziyadi, Yuting Sun, Abhishek Goswami, Jade Huang, Weizhu Chen

We present a novel approach to named entity recognition (NER) in the presence of scarce data that we call example-based NER.

Few-Shot Learning named-entity-recognition +3

119

Paper
Code

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

9 code implementations • ICLR 2021 • Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.

Ranked #1 on Common Sense Reasoning on SWAG

Common Sense Reasoning Coreference Resolution +10

124,527

Paper
Code

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning

no code implementations • EMNLP 2020 • Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen

In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject language models with structured knowledge via learning from raw text.

Entity Linking Knowledge Base Completion +5

Paper
Add Code

Adversarial Training for Large Neural Language Models

3 code implementations • 20 Apr 2020 • Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao

In natural language processing (NLP), pre-training large neural language models such as BERT have demonstrated impressive gain in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.

Ranked #6 on Natural Language Inference on ANLI test (using extra training data)

Natural Language Inference Natural Language Understanding

2,198

Paper
Code

Understanding the Difficulty of Training Transformers

2 code implementations • EMNLP 2020 • Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han

Transformers have proved effective in many NLP tasks.

Ranked #5 on Machine Translation on WMT2014 English-French

Machine Translation

322

Paper
Code

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding

3 code implementations • ACL 2020 • Xiaodong Liu, Yu Wang, Jianshu ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao

We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models.

Knowledge Distillation Multi-Task Learning +2

2,198

Paper
Code

Conditional Self-Attention for Query-based Summarization

no code implementations • 18 Feb 2020 • Yujia Xie, Tianyi Zhou, Yi Mao, Weizhu Chen

Thereby, the contextual dependencies modeled by CSA will be highly relevant to the query.

Paper
Add Code

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

6 code implementations • ACL 2020 • Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Ranked #1 on Natural Language Inference on QNLI

Linguistic Acceptability Natural Language Inference +4

2,198

Paper
Code

X-SQL: reinforce schema representation with context

no code implementations • 21 Aug 2019 • Pengcheng He, Yi Mao, Kaushik Chakrabarti, Weizhu Chen

In this work, we present X-SQL, a new network architecture for the problem of parsing natural language to SQL query.

Paper
Add Code

On the Variance of the Adaptive Learning Rate and Beyond

21 code implementations • ICLR 2020 • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3

47,594

Paper
Code

A Hybrid Neural Network Model for Commonsense Reasoning

3 code implementations • WS 2019 • Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao

An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.

Ranked #1 on Natural Language Understanding on PDP60

Common Sense Reasoning Coreference Resolution +6

2,198

Paper
Code

Lessons from Contextual Bandit Learning in a Customer Support Bot

no code implementations • 6 May 2019 • Nikos Karampatziakis, Sebastian Kochman, Jade Huang, Paul Mineiro, Kathy Osborne, Weizhu Chen

In this work, we describe practical lessons we have learned from successfully using contextual bandits (CBs) to improve key business metrics of the Microsoft Virtual Agent for customer support.

Information Retrieval Multi-Armed Bandits +2

Paper
Add Code

Zero-training Sentence Embedding via Orthogonal Basis

1 code implementation • ICLR 2019 • Ziyi Yang, Chenguang Zhu, Weizhu Chen

We model the semantic meaning of a word in a sentence based on two aspects.

Sentence Sentence Embedding +2

Paper
Code

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

3 code implementations • 20 Apr 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.

Ranked #1 on Semantic Textual Similarity on SentEval

Ensemble Learning Knowledge Distillation +5

2,198

Paper
Code

Multi-Task Deep Neural Networks for Natural Language Understanding

7 code implementations • ACL 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.

Ranked #2 on Natural Language Inference on SciTail

Domain Adaptation Language Modelling +5

2,198

Paper
Code

Parameter-free Sentence Embedding via Orthogonal Basis

1 code implementation • IJCNLP 2019 • Ziyi Yang, Chenguang Zhu, Weizhu Chen

Inspired by the Gram-Schmidt Process in geometric theory, we build an orthogonal basis of the subspace spanned by a word and its surrounding context in a sentence.

Sentence Sentence Embedding +2

Paper
Code

IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles

no code implementations • 13 Sep 2018 • Tianze Shi, Kedar Tatwawadi, Kaushik Chakrabarti, Yi Mao, Oleksandr Polozov, Weizhu Chen

We present a sequence-to-action parsing approach for the natural language to SQL task that incrementally fills the slots of a SQL query with feasible actions from a pre-defined inventory.

Action Parsing Text-To-SQL