Search Results for author: Xiaodong Liu

Found 68 papers, 41 papers with code

Pseudo-Masked Language Models for Unified Language Model Pre-Training

1 code implementation ICML 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Language Modelling Natural Language Understanding

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

1 code implementation 7 Mar 2022 Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.
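
A minimal sketch of the zero-shot HP-transfer workflow, assuming the interface of the authors' open-source mup package (the MLP, widths, and learning rate below are illustrative, not from the paper):

```python
# Sketch only: tune HPs on a narrow proxy, reuse them on a wide model under muP.
# Assumes the mup package (pip install mup); exact arguments may differ by version.
import torch.nn as nn
from mup import MuAdam, MuReadout, set_base_shapes

def make_mlp(width):
    return nn.Sequential(
        nn.Linear(256, width), nn.ReLU(),
        nn.Linear(width, width), nn.ReLU(),
        MuReadout(width, 10),                  # muP-aware output layer
    )

base   = make_mlp(width=64)                    # defines the base shapes
delta  = make_mlp(width=128)                   # tells mup which dimensions scale
target = make_mlp(width=4096)                  # the large model we actually train
set_base_shapes(target, base, delta=delta)     # put the target model in muP

# Under muP, a learning rate found by sweeping the small proxy transfers directly.
optimizer = MuAdam(target.parameters(), lr=3e-4)
```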

A Survey of Knowledge-Intensive NLP with Pre-Trained Language Models

no code implementations 17 Feb 2022 Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao

With the increase in model capacity brought by pre-trained language models, there is a growing need for more knowledgeable natural language processing (NLP) models with advanced functionalities, including providing and making flexible use of encyclopedic and commonsense knowledge.

Language Modelling

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models

no code implementations 29 Jan 2022 Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao

Our framework AutoDistil addresses the above challenges with the following steps: (a) it incorporates inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for the typical student sizes base, small, and tiny); (b) it trains one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight-sharing among students; (c) it performs a lightweight search for the optimal student without re-training.

Knowledge Distillation Neural Architecture Search
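
A rough illustration of the few-shot NAS recipe summarized above, with a hypothetical search space and a placeholder proxy score (none of these names come from the AutoDistil release):

```python
# Hypothetical sketch: partition a Transformer search space into K=3 sub-spaces,
# then run a lightweight (training-free) search for the best student in each.
import itertools
import random

SUB_SPACES = {  # step (a): heuristic partition by target student size
    "tiny":  {"layers": [4, 6],   "hidden": [256, 384], "heads": [4, 6]},
    "small": {"layers": [6, 8],   "hidden": [384, 512], "heads": [6, 8]},
    "base":  {"layers": [10, 12], "hidden": [640, 768], "heads": [10, 12]},
}

def proxy_score(arch):
    """Placeholder for a cheap quality estimate of a weight-shared student
    extracted from the SuperLM trained in step (b)."""
    return -abs(arch["layers"] * arch["hidden"] - 6 * 512)

def lightweight_search(sub_space, budget=16):
    """Step (c): rank sampled candidates with the proxy, no re-training."""
    keys = list(sub_space)
    all_archs = [dict(zip(keys, combo))
                 for combo in itertools.product(*(sub_space[k] for k in keys))]
    candidates = random.sample(all_archs, min(budget, len(all_archs)))
    return max(candidates, key=proxy_score)

for size, space in SUB_SPACES.items():
    print(size, lightweight_search(space))
```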

Knowledge-Rich Self-Supervised Entity Linking

no code implementations 15 Dec 2021 Sheng Zhang, Hao Cheng, Shikhar Vashishth, Cliff Wong, Jinfeng Xiao, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

Zero-shot entity linking has emerged as a promising direction for generalizing to new entities, but it still requires example gold entity mentions during training and canonical descriptions for all entities, both of which are rarely available outside of Wikipedia.

Contrastive Learning Entity Linking

Human Parity on CommonsenseQA: Augmenting Self-Attention with External Attention

2 code implementations 6 Dec 2021 Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang

In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.

 Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)

Common Sense Reasoning

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

1 code implementation NeurIPS 2021 Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization ($\mu$P), many optimal HPs remain stable even as model size changes.

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

1 code implementation 4 Nov 2021 Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao

We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks.

Few-Shot Learning Natural Language Understanding

LiST: Lite Self-training Makes Efficient Few-shot Learners

1 code implementation 12 Oct 2021 Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao

This also significantly reduces the overall model footprint across several tasks that can now share a common PLM encoder as backbone for inference.

Few-Shot Learning

Taming Sparsely Activated Transformer with Stochastic Experts

1 code implementation ICLR 2022 Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao

While most ongoing research focuses on improving SAMs by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.

Machine Translation Translation
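
A minimal sketch of what "randomly routing inputs to experts" can look like in a sparsely activated feed-forward layer (dimensions and layout are illustrative, not the paper's code):

```python
# Sketch: each forward pass sends the batch to one uniformly sampled expert,
# instead of using a learned gating network.
import torch
import torch.nn as nn

class RandomRoutingFFN(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, num_experts=4):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                            # x: (batch, seq, d_model)
        idx = torch.randint(len(self.experts), (1,)).item()
        return self.experts[idx](x)

layer = RandomRoutingFFN()
print(layer(torch.randn(2, 16, 512)).shape)          # torch.Size([2, 16, 512])
```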

Learning to Persuade

no code implementations 29 Sep 2021 Xiaodong Liu, Zhikang Fan, Xun Wang, Weiran Shen

Then we update the sender model to obtain an approximately optimal scheme using the receiver model.

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization

1 code implementation ACL 2021 Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen

The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.

Model Compression Multi-Task Learning
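
For readers unfamiliar with the hypothesis, a tiny sketch of carving out a magnitude-based "ticket" with PyTorch's pruning utilities (this illustrates the general idea, not the paper's exact procedure):

```python
# Sketch: keep only the largest-magnitude 30% of weights per Linear layer; the
# surviving subnetwork is the candidate "ticket" that is then trained on its own.
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(768, 768), nn.ReLU(), nn.Linear(768, 2))

for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.7)   # zero out 70%

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        zero_frac = (module.weight == 0).float().mean().item()
        print(f"layer {name}: {zero_frac:.0%} of weights pruned")
```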

Targeted Adversarial Training for Natural Language Understanding

1 code implementation NAACL 2021 Lis Pereira, Xiaodong Liu, Hao Cheng, Hoifung Poon, Jianfeng Gao, Ichiro Kobayashi

We present a simple yet effective Targeted Adversarial Training (TAT) algorithm to improve adversarial training for natural language understanding.

Natural Language Understanding

Unveiling personnel movement in a larger indoor area with a non-overlapping multi-camera system

no code implementations 10 Apr 2021 Ping Zhang, Zhenxiang Tao, Wenjie Yang, Minze Chen, Shan Ding, Xiaodong Liu, Rui Yang, Hui Zhang

Surveillance cameras are widely applied for indoor occupancy measurement and human movement perception, which benefits building energy management and public security.

Person Re-Identification

Token-wise Curriculum Learning for Neural Machine Translation

no code implementations Findings (EMNLP) 2021 Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.

Machine Translation Translation

Tracking Air Pollution in China: Near Real-Time PM2.5 Retrievals from Multiple Data Sources

no code implementations 11 Mar 2021 Guannan Geng, Qingyang Xiao, Shigan Liu, Xiaodong Liu, Jing Cheng, Yixuan Zheng, Dan Tong, Bo Zheng, Yiran Peng, Xiaomeng Huang, Kebin He, Qiang Zhang

Accordingly, a full-coverage high-resolution air pollutant dataset with timely updates and historical long-term records is essential to support both research and environmental management.

UnitedQA: A Hybrid Approach for Open Domain Question Answering

no code implementations ACL 2021 Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

To date, most recent work under the retriever-reader framework for open-domain QA focuses exclusively on either an extractive or a generative reader.

Open-Domain Question Answering

Rider: Reader-Guided Passage Reranking for Open-Domain Question Answering

1 code implementation 1 Jan 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.

Open-Domain Question Answering
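
A hedged guess at what reader-guided reranking can look like in such a pipeline: promote retrieved passages that contain one of the reader's current top answer predictions before the next reading pass (the inputs below are made up for illustration, and this is a simplification, not necessarily the paper's exact criterion):

```python
# Sketch: passages mentioning one of the reader's top predictions are moved to the
# front before the next reading pass. Inputs are hypothetical.
def rerank(passages, top_predictions):
    def contains_prediction(passage):
        return any(pred.lower() in passage.lower() for pred in top_predictions)
    return sorted(passages, key=contains_prediction, reverse=True)

passages = [
    "Berlin hosts the Bundestag.",
    "Paris is the capital of France.",
]
print(rerank(passages, top_predictions=["Paris"]))
```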

Posterior Differential Regularization with f-divergence for Improving Model Robustness

1 code implementation NAACL 2021 Hao Cheng, Xiaodong Liu, Lis Pereira, YaoLiang Yu, Jianfeng Gao

Theoretically, we provide a connection of two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework.

Domain Generalization
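
A hedged sketch of a posterior-differential-style penalty: compare the model's output distribution on a clean input and a perturbed input with a member of the f-divergence family (symmetric KL here); this is a simplification for illustration, not the paper's exact objective:

```python
# Sketch: penalize divergence between clean and perturbed posteriors.
import torch
import torch.nn.functional as F

def posterior_differential_loss(model, x, noise_scale=1e-3):
    clean = F.log_softmax(model(x), dim=-1)
    noisy = F.log_softmax(model(x + noise_scale * torch.randn_like(x)), dim=-1)
    kl  = F.kl_div(noisy, clean, log_target=True, reduction="batchmean")
    rkl = F.kl_div(clean, noisy, log_target=True, reduction="batchmean")
    return kl + rkl                      # symmetric KL, one f-divergence choice

model = torch.nn.Linear(16, 4)
print(posterior_differential_loss(model, torch.randn(8, 16)))
```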

A Tale of Two Linkings: Dynamically Gating between Schema Linking and Structural Linking for Text-to-SQL Parsing

1 code implementation COLING 2020 Sanxing Chen, Aidan San, Xiaodong Liu, Yangfeng Ji

In Text-to-SQL semantic parsing, selecting the correct entities (tables and columns) for the generated SQL query is both crucial and challenging; the parser is required to connect the natural language (NL) question and the SQL query to the structured knowledge in the database.

Semantic Parsing SQL Parsing +1

Generation-Augmented Retrieval for Open-domain Question Answering

1 code implementation ACL 2021 Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen

We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.

Open-Domain Question Answering Passage Retrieval +1
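
A hedged sketch of generation-augmented sparse retrieval using the third-party rank_bm25 package; the generated-context string below is a stand-in for a seq2seq generator's output:

```python
# Sketch: append generated context to the question, then retrieve with BM25.
from rank_bm25 import BM25Okapi   # pip install rank-bm25

passages = [
    "The Eiffel Tower was completed in 1889 in Paris.",
    "The Great Wall of China stretches across northern China.",
    "Gustave Eiffel's company designed the tower for the 1889 World's Fair.",
]
bm25 = BM25Okapi([p.lower().split() for p in passages])

question = "Who designed the Eiffel Tower?"
generated_context = "Gustave Eiffel 1889 World's Fair"   # would come from a generator
query_tokens = (question + " " + generated_context).lower().split()

print(bm25.get_top_n(query_tokens, passages, n=2))
```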

HittER: Hierarchical Transformers for Knowledge Graph Embeddings

1 code implementation EMNLP 2021 Sanxing Chen, Xiaodong Liu, Jianfeng Gao, Jian Jiao, Ruofei Zhang, Yangfeng Ji

Our proposed model consists of two different Transformer blocks: the bottom block extracts features of each entity-relation pair in the local neighborhood of the source entity and the top block aggregates the relational information from outputs of the bottom block.

Knowledge Graph Embeddings Link Prediction +1
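
A rough sketch of the two-block hierarchy described above, with hypothetical dimensions (this is not the released HittER code): a bottom Transformer encodes each (neighbor entity, relation) pair, and a top Transformer aggregates the pair representations into a source-entity encoding:

```python
import torch
import torch.nn as nn

d = 256
bottom = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
top = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)

def encode_source_entity(pair_embeddings):
    """pair_embeddings: (num_neighbors, 2, d) -- one (entity, relation) pair per row."""
    pair_repr = bottom(pair_embeddings)[:, 0, :]        # one vector per neighbor pair
    return top(pair_repr.unsqueeze(0))[:, 0, :]         # aggregate over the neighborhood

neighbors = torch.randn(8, 2, d)                        # 8 (entity, relation) pairs
print(encode_source_entity(neighbors).shape)            # torch.Size([1, 256])
```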

Very Deep Transformers for Neural Machine Translation

4 code implementations 18 Aug 2020 Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

We explore the application of very deep Transformer models for Neural Machine Translation (NMT).

 Ranked #1 on Machine Translation on WMT2014 English-French (using extra training data)

Machine Translation Translation

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing

no code implementations 31 Jul 2020 Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon

In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.

Continual Pretraining Document Classification +8

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

5 code implementations ICLR 2021 Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen

Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.

Common Sense Reasoning Coreference Resolution +9

Adversarial Training for Large Neural Language Models

2 code implementations 20 Apr 2020 Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao

In natural language processing (NLP), pre-training large neural language models such as BERT has demonstrated impressive gains in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.

Ranked #3 on Natural Language Inference on ANLI test (using extra training data)

Natural Language Inference Natural Language Understanding
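
A rough, hedged sketch of adversarial fine-tuning in embedding space (one gradient ascent step on a perturbation, then a consistency penalty on the perturbed prediction); this is a simplification for illustration, not the paper's full algorithm:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adversarial_consistency(head, embeddings, init_noise=1e-3, step_size=1e-3):
    """Perturb embeddings to increase divergence from the clean prediction,
    then penalize the divergence at the found perturbation (simplified)."""
    clean = F.softmax(head(embeddings), dim=-1).detach()
    delta = (init_noise * torch.randn_like(embeddings)).requires_grad_(True)
    div = F.kl_div(F.log_softmax(head(embeddings + delta), dim=-1), clean,
                   reduction="batchmean")
    grad, = torch.autograd.grad(div, delta)
    delta = step_size * grad / (grad.norm() + 1e-8)          # one ascent step
    adv = F.log_softmax(head(embeddings + delta.detach()), dim=-1)
    return F.kl_div(adv, clean, reduction="batchmean")

head = nn.Linear(768, 3)                                     # stand-in for a task head
print(adversarial_consistency(head, torch.randn(4, 768)))
```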

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training

2 code implementations 28 Feb 2020 Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).

Ranked #3 on Question Generation on SQuAD1.1 (using extra training data)

Abstractive Text Summarization Language Modelling +2

MLFcGAN: Multi-level Feature Fusion based Conditional GAN for Underwater Image Color Correction

no code implementations 13 Feb 2020 Xiaodong Liu, Zhi Gao, Ben M. Chen

Color correction for underwater images has received increasing interest, due to its critical role in making existing mature vision algorithms applicable to underwater scenarios.

RAT-SQL: Relation-Aware Schema Encoding and Linking for Text-to-SQL Parsers

4 code implementations ACL 2020 Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, Matthew Richardson

The generalization challenge lies in (a) encoding the database relations in an accessible way for the semantic parser, and (b) modeling alignment between database columns and their mentions in a given query.

Semantic Parsing Text-To-Sql

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

4 code implementations ACL 2020 Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao

However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.

Linguistic Acceptability Natural Language Inference +3

Adversarial Domain Adaptation for Machine Reading Comprehension

no code implementations IJCNLP 2019 Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang

In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.

Machine Reading Comprehension Representation Learning +1

On the Variance of the Adaptive Learning Rate and Beyond

20 code implementations ICLR 2020 Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +3
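
An implementation of the rectified Adam variant proposed in this paper (RAdam) ships with recent PyTorch versions as torch.optim.RAdam; a minimal usage sketch with placeholder data:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(32, 2)
optimizer = torch.optim.RAdam(model.parameters(), lr=1e-3)   # used without a warmup schedule

x, y = torch.randn(16, 32), torch.randint(0, 2, (16,))
loss = F.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(float(loss))
```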

A Hybrid Neural Network Model for Commonsense Reasoning

2 code implementations WS 2019 Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao

An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.

Common Sense Reasoning Language Modelling +2
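
A conceptual sketch of the two-head layout described above: a shared contextual encoder feeding a masked-LM head and a semantic-similarity head (the encoder here is a placeholder, and the dimensions are illustrative):

```python
import torch
import torch.nn as nn

class HybridHeads(nn.Module):
    def __init__(self, encoder, hidden=768, vocab=30522):
        super().__init__()
        self.encoder = encoder                      # shared BERT-style encoder
        self.mlm_head = nn.Linear(hidden, vocab)    # masked language model head
        self.sim_head = nn.Linear(hidden, 1)        # semantic similarity head

    def forward(self, embeddings):
        h = self.encoder(embeddings)                # (batch, seq, hidden)
        return self.mlm_head(h), self.sim_head(h[:, 0])   # token logits, [CLS] score

model = HybridHeads(encoder=nn.Identity())          # stand-in for a real encoder
mlm_logits, sim_score = model(torch.randn(2, 16, 768))
print(mlm_logits.shape, sim_score.shape)
```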

Conversing by Reading: Contentful Neural Conversation with On-demand Machine Reading

1 code implementation ACL 2019 Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, Bill Dolan, Yejin Choi, Jianfeng Gao

Although neural conversation models are effective in learning how to produce fluent responses, their primary challenge lies in knowing what to say to make the conversation contentful and non-vacuous.

Informativeness Reading Comprehension +1

Unified Language Model Pre-training for Natural Language Understanding and Generation

8 code implementations NeurIPS 2019 Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon

This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.

Ranked #2 on Generative Question Answering on CoQA (using extra training data)

Abstractive Text Summarization Document Summarization +6

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding

2 code implementations 20 Apr 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.

Ensemble Learning Knowledge Distillation +5

A Hybrid Retrieval-Generation Neural Conversation Model

1 code implementation 19 Apr 2019 Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, Jingjing Liu

In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods.

Text Generation

Multi-Task Deep Neural Networks for Natural Language Understanding

8 code implementations ACL 2019 Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.

Domain Adaptation Language Modelling +5
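
A minimal multi-task sketch in the spirit of a shared encoder with task-specific heads, cycling over batches sampled from different tasks; the tasks, dimensions, and losses below are placeholders, not MT-DNN's actual configuration:

```python
import random
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(128, 256), nn.ReLU())      # stand-in for BERT
heads = nn.ModuleDict({
    "nli":        nn.Linear(256, 3),    # 3-way classification
    "sentiment":  nn.Linear(256, 2),    # binary classification
    "similarity": nn.Linear(256, 1),    # regression
})
optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(heads.parameters()), lr=1e-4)

for step in range(3):
    task = random.choice(list(heads.keys()))    # sample a task, then a batch from it
    x = torch.randn(8, 128)
    output = heads[task](encoder(x))
    loss = output.pow(2).mean()                 # placeholder loss for the sketch
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```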

Stochastic Answer Networks for SQuAD 2.0

5 code implementations 24 Sep 2018 Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao

This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, so that it can judge whether a question is unanswerable.

Machine Reading Comprehension Question Answering

Multi-task Learning with Sample Re-weighting for Machine Reading Comprehension

5 code implementations NAACL 2019 Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao

We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains.

Machine Reading Comprehension Machine Translation +3

Navigating with Graph Representations for Fast and Scalable Decoding of Neural Language Models

no code implementations NeurIPS 2018 Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He

Neural language models (NLMs) have recently gained renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks.

Language Modelling Machine Translation +1

Stochastic Answer Networks for Natural Language Inference

3 code implementations 21 Apr 2018 Xiaodong Liu, Kevin Duh, Jianfeng Gao

We propose a stochastic answer network (SAN) to explore multi-step inference strategies in Natural Language Inference.

Natural Language Inference

Dynamic Fusion Networks for Machine Reading Comprehension

no code implementations 14 Nov 2017 Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu

This paper presents a novel neural model, the Dynamic Fusion Network (DFN), for machine reading comprehension (MRC).

Machine Reading Comprehension

An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading Comprehension Tasks

no code implementations IJCNLP 2017 Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao

Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets.

Reading Comprehension reinforcement-learning

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

11 code implementations 28 Nov 2016 Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang

The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.

Machine Reading Comprehension Question Answering
