Search Results for author: Xiaodong Liu

Found 109 papers, 64 papers with code

Pseudo-Masked Language Models for Unified Language Model Pre-Training

1 code implementation ICML 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Decoder Language Modelling +2

BayLing 2: A Multilingual Large Language Model with Efficient Language Alignment

1 code implementation25 Nov 2024 Shaolei Zhang, Kehao Zhang, Qingkai Fang, Shoutao Guo, Yan Zhou, Xiaodong Liu, Yang Feng

In this paper, we introduce BayLing 2, which efficiently transfers generative capabilities and knowledge from high-resource languages to low-resource languages through language alignment.

Language Modelling Large Language Model +1

Adaptively Controllable Diffusion Model for Efficient Conditional Image Generation

no code implementations19 Nov 2024 Yucheng Xing, Xiaodong Liu, Xin Wang

In this work, we propose a new adaptive framework, $\textit{Adaptively Controllable Diffusion (AC-Diff) Model}$, to automatically and fully control the generation process, including not only the type of generation result but also the length and parameters of the generation process.

Conditional Image Generation

Map-Free Trajectory Prediction with Map Distillation and Hierarchical Encoding

no code implementations17 Nov 2024 Xiaodong Liu, Yucheng Xing, Xin Wang

Reliable motion forecasting of surrounding agents is essential for ensuring the safe operation of autonomous vehicles.

Autonomous Vehicles Decoder +3

StreamAdapter: Efficient Test Time Adaptation from Contextual Streams

no code implementations14 Nov 2024 Dilxat Muhtar, Yelong Shen, Yaming Yang, Xiaodong Liu, Yadong Lu, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Xueliang Zhang, Jianfeng Gao, Weizhu Chen, Qi Zhang

The superior task adaptation and context encoding capabilities of StreamAdapter on both language understanding and generation tasks provides a new perspective for adapting LLMs at test time using context, allowing for more efficient adaptation across scenarios and more cost-effective inference

In-Context Learning Test-time Adaptation

Model Editing for LLMs4Code: How Far are We?

1 code implementation11 Nov 2024 Xiaopeng Li, Shangwen Wang, Shasha Li, Jun Ma, Jie Yu, Xiaodong Liu, Jing Wang, Bin Ji, Weimin Zhang

Despite that, a comprehensive study that thoroughly compares and analyzes the performance of the state-of-the-art model editing techniques for adapting the knowledge within LLMs4Code across various code-related tasks is notably absent.

16k Code Generation +5

Generative Adapter: Contextualizing Language Models in Parameters with A Single Forward Pass

no code implementations8 Nov 2024 Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng

In StreamingQA, our approach is effective in injecting knowledge into the LM's parameters, achieving a 63. 5% improvement in F1 score over the model with supervised fine-tuning (from $19. 5$ to $31. 5$) for contexts as long as 32K tokens.

In-Context Learning Self-Supervised Learning

Iterative Self-Tuning LLMs for Enhanced Jailbreaking Capabilities

1 code implementation24 Oct 2024 Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao

Recent research has shown that Large Language Models (LLMs) are vulnerable to automated jailbreak attacks, where adversarial suffixes crafted by algorithms appended to harmful queries bypass safety alignment and trigger unintended responses.

Safety Alignment

GRIN: GRadient-INformed MoE

no code implementations18 Sep 2024 Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen

Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules.

HellaSwag HumanEval +3

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering

no code implementations16 Sep 2024 Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng

However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.

Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

no code implementations12 Jul 2024 Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao

By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost.

DefSent+: Improving sentence embeddings of language models by projecting definition sentences into a quasi-isotropic or isotropic vector space of unlimited dictionary entries

1 code implementation25 May 2024 Xiaodong Liu

We discover that this approach is not fully explored due to the methodological limitation of using word embeddings of language models to represent dictionary entries.

Sentence Sentence Embeddings +1

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

4 code implementations7 May 2024 DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Liu, Xin Xie, Xingkai Yu, Xinnan Song, Xinyi Zhou, Xinyu Yang, Xuan Lu, Xuecheng Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yiliang Xiong, Yilong Zhao, Ying He, Ying Tang, Yishi Piao, Yixin Dong, Yixuan Tan, Yiyuan Liu, Yongji Wang, Yongqiang Guo, Yuchen Zhu, Yuduan Wang, Yuheng Zou, Yukun Zha, Yunxian Ma, Yuting Yan, Yuxiang You, Yuxuan Liu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhewen Hao, Zhihong Shao, Zhiniu Wen, Zhipeng Xu, Zhongyu Zhang, Zhuoshu Li, Zihan Wang, Zihui Gu, Zilin Li, Ziwei Xie

MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.

Language Modelling Reinforcement Learning (RL)

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

no code implementations22 Apr 2024 Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou

We introduce phi-3-mini, a 3. 8 billion parameter language model trained on 3. 3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3. 5 (e. g., phi-3-mini achieves 69% on MMLU and 8. 38 on MT-bench), despite being small enough to be deployed on a phone.

Ranked #5 on MMR total on MRR-Benchmark (using extra training data)

Language Modelling Math +2

Energy Efficiency Optimization Method of WDM Visible Light Communication System for Indoor Broadcasting Networks

no code implementations25 Mar 2024 Dayu Shi, Xun Zhang, Ziqi Liu, Xuanbang Chen, Jianghao Li, Xiaodong Liu, William Shieh

This paper introduces a novel approach to optimize energy efficiency in wavelength division multiplexing (WDM) Visible Light Communication (VLC) systems designed for indoor broadcasting networks.

SWEA: Updating Factual Knowledge in Large Language Models via Subject Word Embedding Altering

1 code implementation31 Jan 2024 Xiaopeng Li, Shasha Li, Shezheng Song, Huijun Liu, Bin Ji, Xi Wang, Jun Ma, Jie Yu, Xiaodong Liu, Jing Wang, Weimin Zhang

In particular, local editing methods, which directly update model parameters, are more suitable for updating a small amount of knowledge.

Model Editing Word Embeddings

Towards Consistent Natural-Language Explanations via Explanation-Consistency Finetuning

1 code implementation25 Jan 2024 Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He, Jianfeng Gao

We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples.

Question Answering

Feasibility Conditions for Mobile LiFi

no code implementations20 Dec 2023 Shuai Ma, Haihong Sheng, Junchang Sun, Hang Li, Xiaodong Liu, Chen Qiu, Majid Safari, Naofal Al-Dhahir, Shiyin Li

Then, we derive the expression of LiFi transmission rate based on the m-pulse-amplitude-modulation (M-PAM).

Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs

1 code implementation3 Nov 2023 Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao

In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.

ReEval: Automatic Hallucination Evaluation for Retrieval-Augmented Large Language Models via Transferable Adversarial Attacks

no code implementations19 Oct 2023 Xiaodong Yu, Hao Cheng, Xiaodong Liu, Dan Roth, Jianfeng Gao

Specifically, given the potential of data contamination (e. g., leading to memorization), good static benchmark performance does not ensure that model can reliably use the provided evidence for responding, which is essential to avoid hallucination when the required knowledge is new or private.

Hallucination Hallucination Evaluation +6

Fast-ELECTRA for Efficient Pre-training

no code implementations11 Oct 2023 chengyu dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu

Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost brought by the auxiliary model.

Language Modelling

Decentralized Multi-agent Reinforcement Learning based State-of-Charge Balancing Strategy for Distributed Energy Storage System

no code implementations29 Aug 2023 Zheng Xiong, Biao Luo, Bing-Chuan Wang, Xiaodong Xu, Xiaodong Liu, TingWen Huang

Specifically, the first-order average consensus algorithm is utilized to expand the observations of the DESS state in a fully-decentralized way, and the initial actions (i. e., output power) are decided by the agents (i. e., energy storage units) according to these observations.

counterfactual Multi-agent Reinforcement Learning

Testing for Peer Effects without Specifying the Network Structure

no code implementations16 Jun 2023 Hyunseok Jung, Xiaodong Liu

This paper proposes an Anderson-Rubin (AR) test for the presence of peer effects in panel data without the need to specify the network structure.

valid

Augmenting Language Models with Long-Term Memory

no code implementations NeurIPS 2023 Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness.

In-Context Learning Language Modelling +1

Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding

no code implementations23 May 2023 Yu Zhang, Hao Cheng, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, Jianfeng Gao

Scientific literature understanding tasks have gained significant attention due to their potential to accelerate scientific discovery.

Citation Prediction Contrastive Learning +1

Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers

1 code implementation21 May 2023 Linyuan Gong, Chenyan Xiong, Xiaodong Liu, Payal Bajaj, Yiqing Xie, Alvin Cheung, Jianfeng Gao, Xia Song

This paper explores the effectiveness of model-generated signals in improving zero-shot generalization of text-to-text Transformers such as T5.

MMLU Zero-shot Generalization

Chain-of-Skills: A Configurable Model for Open-domain Question Answering

1 code implementation4 May 2023 Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.

Open-Domain Question Answering Retrieval +1

Pre-training Transformers for Knowledge Graph Completion

no code implementations28 Mar 2023 Sanxing Chen, Hao Cheng, Xiaodong Liu, Jian Jiao, Yangfeng Ji, Jianfeng Gao

Learning transferable representation of knowledge graphs (KGs) is challenging due to the heterogeneous, multi-relational nature of graph structures.

Deep Vision in Analysis and Recognition of Radar Data: Achievements, Advancements and Challenges

no code implementations20 Feb 2023 Qi Liu, ZhiYun Yang, Ru Ji, Yonghong Zhang, Muhammad Bilal, Xiaodong Liu, S Vimal, Xiaolong Xu

Radars are widely used to obtain echo information for effective prediction, such as precipitation nowcasting.

Joint Beamforming and PD Orientation Design for Mobile Visible Light Communications

no code implementations21 Dec 2022 Shuai Ma, Jing Wang, Chun Du, Hang Li, Xiaodong Liu, Youlong Wu, Naofal Al-Dhahir, Shiyin Li

To address this challenge, we propose an alternating optimization algorithm to obtain the transmit beamforming and the PD orientation.

Language Models as Inductive Reasoners

1 code implementation21 Dec 2022 Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei

To this end, we propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts, and create a dataset termed DEER containing 1. 2k rule-fact pairs for the task, where rules and facts are written in natural language.

Philosophy

Efficient Long Sequence Modeling via State Space Augmented Transformer

1 code implementation15 Dec 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao

Specifically, we augment a SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.

Computational Efficiency Decoder +4

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

1 code implementation31 Oct 2022 Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao

Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters, and storing a large copy of the PLM weights for every task resulting in increased cost for storing, sharing and serving the models.

parameter-efficient fine-tuning

Open-domain Question Answering via Chain of Reasoning over Heterogeneous Knowledge

2 code implementations22 Oct 2022 Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.

Open-Domain Question Answering

Task-Aware Specialization for Efficient and Robust Dense Retrieval for Open-Domain Question Answering

1 code implementation11 Oct 2022 Hao Cheng, Hao Fang, Xiaodong Liu, Jianfeng Gao

Given its effectiveness on knowledge-intensive natural language processing tasks, dense retrieval models have become increasingly popular.

Open-Domain Question Answering Retrieval

PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection

1 code implementation6 Sep 2022 Han Wang, Jun Tang, Xiaodong Liu, Shanyan Guan, Rong Xie, Li Song

The temporal information is introduced by the temporal feature aggregation model (TFAM), by conducting an attention mechanism between the context frames and the target frame (i. e., the frame to be detected).

object-detection Video Object Detection

Deep Generative Modeling on Limited Data with Regularization by Nontransferable Pre-trained Models

1 code implementation30 Aug 2022 Yong Zhong, Hongtao Liu, Xiaodong Liu, Fan Bao, Weiran Shen, Chongxuan Li

Deep generative models (DGMs) are data-eager because learning a complex model on limited data suffers from a large variance and easily overfits.

AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

1 code implementation24 May 2022 Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao

Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters, and storing a large copy of the PLM weights for every task resulting in increased cost for storing, sharing and serving the models.

Natural Language Understanding parameter-efficient fine-tuning +1

Visually-Augmented Language Modeling

1 code implementation20 May 2022 Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images.

Image Retrieval Language Modelling +1

METRO: Efficient Denoising Pretraining of Large Scale Autoencoding Language Models with Model Generated Signals

no code implementations13 Apr 2022 Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao

We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.

Denoising

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

6 code implementations7 Mar 2022 Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models

no code implementations17 Feb 2022 Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao

With the increasing of model capacity brought by pre-trained language models, there emerges boosting needs for more knowledgeable natural language processing (NLP) models with advanced functionalities including providing and making flexible use of encyclopedic and commonsense knowledge.

Language Modelling

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

no code implementations29 Jan 2022 Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao

Our framework AutoDistil addresses above challenges with the following steps: (a) Incorporates inductive bias and heuristics to partition Transformer search space into K compact sub-spaces (K=3 for typical student sizes of base, small and tiny); (b) Trains one SuperLM for each sub-space using task-agnostic objective (e. g., self-attention distillation) with weight-sharing of students; (c) Lightweight search for the optimal student without re-training.

Inductive Bias Knowledge Distillation +1

Knowledge-Rich Self-Supervision for Biomedical Entity Linking

no code implementations15 Dec 2021 Sheng Zhang, Hao Cheng, Shikhar Vashishth, Cliff Wong, Jinfeng Xiao, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

Zero-shot entity linking has emerged as a promising direction for generalizing to new entities, but it still requires example gold entity mentions during training and canonical descriptions for all entities, both of which are rarely available outside of Wikipedia.

Contrastive Learning Entity Linking

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

2 code implementations6 Dec 2021 Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang

In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.

 Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)

Common Sense Reasoning

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

1 code implementation NeurIPS 2021 Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization ($\mu$P), many optimal HPs remain stable even as model size changes.

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

1 code implementation4 Nov 2021 Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao

We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks.

Few-Shot Learning Natural Language Understanding

Open Domain Question Answering with A Unified Knowledge Interface

1 code implementation ACL 2022 Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao

The retriever-reader framework is popular for open-domain question answering (ODQA) due to its ability to use explicit knowledge.

Data-to-Text Generation Natural Questions +2

Taming Sparsely Activated Transformer with Stochastic Experts

1 code implementation ICLR 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao

While most on-going research focuses on improving SAMs models by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i. e., the commonly-used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.

Machine Translation Translation

Learning to Persuade

no code implementations29 Sep 2021 Xiaodong Liu, Zhikang Fan, Xun Wang, Weiran Shen

Then we update the sender model to obtain an approximately optimal scheme using the receiver model.

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation ACL 2021 Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of ``lottery tickets'', and training a certain collection of them (i. e., a subnetwork) can match the performance of the full model.

Model Compression Multi-Task Learning

Targeted Adversarial Training for Natural Language Understanding

1 code implementation NAACL 2021 Lis Pereira, Xiaodong Liu, Hao Cheng, Hoifung Poon, Jianfeng Gao, Ichiro Kobayashi

We present a simple yet effective Targeted Adversarial Training (TAT) algorithm to improve adversarial training for natural language understanding.

Natural Language Understanding

Unveiling personnel movement in a larger indoor area with a non-overlapping multi-camera system

no code implementations10 Apr 2021 Ping Zhang, Zhenxiang Tao, Wenjie Yang, Minze Chen, Shan Ding, Xiaodong Liu, Rui Yang, HUI ZHANG

Surveillance cameras are widely applied for indoor occupancy measurement and human movement perception, which benefit for building energy management and social security.

energy management Management +1

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations Findings (EMNLP) 2021 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation NMT +2

Tracking Air Pollution in China: Near Real-Time PM2.5 Retrievals from Multiple Data Sources

no code implementations11 Mar 2021 Guannan Geng, Qingyang Xiao, Shigan Liu, Xiaodong Liu, Jing Cheng, Yixuan Zheng, Dan Tong, Bo Zheng, Yiran Peng, Xiaomeng Huang, Kebin He, Qiang Zhang

Accordingly, a full-coverage high-resolution air pollutant dataset with timely updates and historical long-term records is essential to support both research and environmental management.

Management

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation1 Jan 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Natural Questions Open-Domain Question Answering +3

Posterior Differential Regularization with f-divergence for Improving Model Robustness

2 code implementations NAACL 2021 Hao Cheng, Xiaodong Liu, Lis Pereira, YaoLiang Yu, Jianfeng Gao

Theoretically, we provide a connection of two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework.

Domain Generalization

A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing

1 code implementation COLING 2020 Sanxing Chen, Aidan San, Xiaodong Liu, Yangfeng Ji

In Text-to-SQL semantic parsing, selecting the correct entities (tables and columns) for the generated SQL query is both crucial and challenging; the parser is required to connect the natural language (NL) question and the SQL query to the structured knowledge in the database.

Graph Neural Network SQL Parsing +1

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation ACL 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Natural Questions Open-Domain Question Answering +4

HittER: Hierarchical Transformers for Knowledge Graph Embeddings

3 code implementations EMNLP 2021 Sanxing Chen, Xiaodong Liu, Jianfeng Gao, Jian Jiao, Ruofei Zhang, Yangfeng Ji

Our proposed model consists of two different Transformer blocks: the bottom block extracts features of each entity-relation pair in the local neighborhood of the source entity and the top block aggregates the relational information from outputs of the bottom block.

 Ranked #1 on Link Prediction on FB15k-237 (Hit@10 metric)

Knowledge Graph Embeddings Link Prediction +2

Very Deep Transformers for Neural Machine Translation

4 code implementations18 Aug 2020 Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

We explore the application of very deep Transformer models for Neural Machine Translation (NMT).

 Ranked #1 on Machine Translation on WMT2014 English-French (using extra training data)

Decoder Machine Translation +2

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

1 code implementation31 Jul 2020 Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Continual Pretraining +11

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

15 code implementations ICLR 2021 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.

Common Sense Reasoning Coreference Resolution +11

Adversarial Training for Large Neural Language Models

3 code implementations20 Apr 2020 Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao

In natural language processing (NLP), pre-training large neural language models such as BERT have demonstrated impressive gain in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.

Ranked #6 on Natural Language Inference on ANLI test (using extra training data)

Natural Language Inference Natural Language Understanding

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

3 code implementations28 Feb 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)

Abstractive Text Summarization Decoder +4

MLFcGAN: Multi-level Feature Fusion based Conditional GAN for Underwater Image Color Correction

no code implementations13 Feb 2020 Xiaodong Liu, Zhi Gao, Ben M. Chen

Color correction for underwater images has received increasing interests, due to its critical role in facilitating available mature vision algorithms for underwater scenarios.

Generative Adversarial Network

RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers

4 code implementations ACL 2020 Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, Matthew Richardson

The generalization challenge lies in (a) encoding the database relations in an accessible way for the semantic parser, and (b) modeling alignment between database columns and their mentions in a given query.

Relation Text-To-SQL

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

6 code implementations ACL 2020 Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Linguistic Acceptability Natural Language Inference +4

Adversarial Domain Adaptation for Machine Reading Comprehension

no code implementations IJCNLP 2019 Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.

Machine Reading Comprehension Representation Learning +1

On the Variance of the Adaptive Learning Rate and Beyond

21 code implementations ICLR 2020 Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3

A Hybrid Neural Network Model for Commonsense Reasoning

3 code implementations WS 2019 Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao

An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.

Common Sense Reasoning Coreference Resolution +6

Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading

1 code implementation ACL 2019 Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, Bill Dolan, Yejin Choi, Jianfeng Gao

Although neural conversation models are effective in learning how to produce fluent responses, their primary challenge lies in knowing what to say to make the conversation contentful and non-vacuous.

Diversity Informativeness +2

Unified Language Model Pre-training for Natural Language Understanding and Generation

9 code implementations NeurIPS 2019 Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.

Ranked #2 on Generative Question Answering on CoQA (using extra training data)

Abstractive Text Summarization Document Summarization +7

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

3 code implementations20 Apr 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.

Ensemble Learning Knowledge Distillation +5

A Hybrid Retrieval-Generation Neural Conversation Model

1 code implementation19 Apr 2019 Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, Jingjing Liu

In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods.

Diversity Text Generation +1

Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

2 code implementations NAACL 2019 Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin

Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks.

Decoder Language Modelling +2

Multi-Task Deep Neural Networks for Natural Language Understanding

7 code implementations ACL 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.

Domain Adaptation Language Modelling +5

Stochastic Answer Networks for SQuAD 2.0

5 code implementations24 Sep 2018 Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao

This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not.

Machine Reading Comprehension Question Answering

Multi-task Learning with Sample Re-weighting for Machine Reading Comprehension

5 code implementations NAACL 2019 Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao

We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains.

Machine Reading Comprehension Machine Translation +3

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models

no code implementations NeurIPS 2018 Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He

Neural language models (NLMs) have recently gained a renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks.

Decoder Language Modelling +2

Stochastic Answer Networks for Natural Language Inference

3 code implementations21 Apr 2018 Xiaodong Liu, Kevin Duh, Jianfeng Gao

We propose a stochastic answer network (SAN) to explore multi-step inference strategies in Natural Language Inference.

Natural Language Inference

Dynamic Fusion Networks for Machine Reading Comprehension

no code implementations14 Nov 2017 Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu

This paper presents a novel neural model - Dynamic Fusion Network (DFN), for machine reading comprehension (MRC).

Machine Reading Comprehension Reinforcement Learning

An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading Comprehension Tasks

no code implementations IJCNLP 2017 Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao

Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets.

Descriptive Reading Comprehension +2

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

13 code implementations28 Nov 2016 Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang

The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.

Benchmarking Machine Reading Comprehension +1

Cannot find the paper you are looking for? You can Submit a new open access paper.