1 code implementation • ICML 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Jianfeng Gao, Songhao Piao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
1 code implementation • 25 Nov 2024 • Shaolei Zhang, Kehao Zhang, Qingkai Fang, Shoutao Guo, Yan Zhou, Xiaodong Liu, Yang Feng
In this paper, we introduce BayLing 2, which efficiently transfers generative capabilities and knowledge from high-resource languages to low-resource languages through language alignment.
no code implementations • 19 Nov 2024 • Yucheng Xing, Xiaodong Liu, Xin Wang
In this work, we propose a new adaptive framework, the Adaptively Controllable Diffusion (AC-Diff) Model, to automatically and fully control the generation process, including not only the type of generation result but also the length and parameters of the generation process.
no code implementations • 17 Nov 2024 • Xiaodong Liu, Yucheng Xing, Xin Wang
Reliable motion forecasting of surrounding agents is essential for ensuring the safe operation of autonomous vehicles.
no code implementations • 14 Nov 2024 • Dilxat Muhtar, Yelong Shen, Yaming Yang, Xiaodong Liu, Yadong Lu, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Xueliang Zhang, Jianfeng Gao, Weizhu Chen, Qi Zhang
The superior task adaptation and context encoding capabilities of StreamAdapter on both language understanding and generation tasks provide a new perspective for adapting LLMs at test time using context, allowing for more efficient adaptation across scenarios and more cost-effective inference.
1 code implementation • 11 Nov 2024 • Xiaopeng Li, Shangwen Wang, Shasha Li, Jun Ma, Jie Yu, Xiaodong Liu, Jing Wang, Bin Ji, Weimin Zhang
Despite that, a comprehensive study that thoroughly compares and analyzes the performance of the state-of-the-art model editing techniques for adapting the knowledge within LLMs4Code across various code-related tasks is notably absent.
no code implementations • 8 Nov 2024 • Tong Chen, Hao Fang, Patrick Xia, Xiaodong Liu, Benjamin Van Durme, Luke Zettlemoyer, Jianfeng Gao, Hao Cheng
In StreamingQA, our approach is effective in injecting knowledge into the LM's parameters, achieving a 63.5% improvement in F1 score over the model with supervised fine-tuning (from 19.5 to 31.5) for contexts as long as 32K tokens.
1 code implementation • 24 Oct 2024 • Chung-En Sun, Xiaodong Liu, Weiwei Yang, Tsui-Wei Weng, Hao Cheng, Aidan San, Michel Galley, Jianfeng Gao
Recent research has shown that Large Language Models (LLMs) are vulnerable to automated jailbreak attacks, where adversarial suffixes crafted by algorithms appended to harmful queries bypass safety alignment and trigger unintended responses.
no code implementations • 18 Sep 2024 • Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen
Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules.
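As a rough illustration of the sparse expert routing described above, here is a minimal top-k gated Mixture-of-Experts layer in PyTorch; it is a generic sketch, not the routing used in this paper, and all module names and dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal Mixture-of-Experts layer: route each token to its top-k experts."""

    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts)          # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model),
                          nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                    # x: (tokens, d_model)
        gate_logits = self.router(x)                         # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.k, dim=-1)      # keep only top-k experts
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):                           # only k experts run per token
            for e in range(len(self.experts)):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

tokens = torch.randn(16, 64)
print(TinyMoE()(tokens).shape)                               # torch.Size([16, 64])
```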
no code implementations • 16 Sep 2024 • Qingru Zhang, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth, Hao Cheng
However, they often struggle to fully comprehend and effectively utilize their input contexts, resulting in responses that are unfaithful or hallucinated.
no code implementations • 26 Aug 2024 • Wei An, Xiao Bi, Guanting Chen, Shanhuang Chen, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Wenjun Gao, Kang Guan, JianZhong Guo, Yongqiang Guo, Zhe Fu, Ying He, Panpan Huang, Jiashi Li, Wenfeng Liang, Xiaodong Liu, Xin Liu, Yiyuan Liu, Yuxuan Liu, Shanghao Lu, Xuan Lu, Xiaotao Nie, Tian Pei, Junjie Qiu, Hui Qu, Zehui Ren, Zhangli Sha, Xuecheng Su, Xiaowen Sun, Yixuan Tan, Minghui Tang, Shiyu Wang, Yaohui Wang, Yongji Wang, Ziwei Xie, Yiliang Xiong, Yanhong Xu, Shengfeng Ye, Shuiping Yu, Yukun Zha, Liyue Zhang, Haowei Zhang, Mingchuan Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Yuheng Zou
For DL training, we deployed the Fire-Flyer 2 with 10,000 PCIe A100 GPUs, achieving performance approximating that of the DGX-A100 while reducing costs by half and energy consumption by 40%.
no code implementations • 12 Jul 2024 • Zeliang Zhang, Xiaodong Liu, Hao Cheng, Chenliang Xu, Jianfeng Gao
By increasing model parameters but activating them sparsely when performing a task, the use of Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost.
1 code implementation • 25 May 2024 • Xiaodong Liu
We find that this approach has not been fully explored, owing to the methodological limitation of using language models' word embeddings to represent dictionary entries.
4 code implementations • 7 May 2024 • DeepSeek-AI, Aixin Liu, Bei Feng, Bin Wang, Bingxuan Wang, Bo Liu, Chenggang Zhao, Chengqi Dengr, Chong Ruan, Damai Dai, Daya Guo, Dejian Yang, Deli Chen, Dongjie Ji, Erhang Li, Fangyun Lin, Fuli Luo, Guangbo Hao, Guanting Chen, Guowei Li, H. Zhang, Hanwei Xu, Hao Yang, Haowei Zhang, Honghui Ding, Huajian Xin, Huazuo Gao, Hui Li, Hui Qu, J. L. Cai, Jian Liang, JianZhong Guo, Jiaqi Ni, Jiashi Li, Jin Chen, Jingyang Yuan, Junjie Qiu, Junxiao Song, Kai Dong, Kaige Gao, Kang Guan, Lean Wang, Lecong Zhang, Lei Xu, Leyi Xia, Liang Zhao, Liyue Zhang, Meng Li, Miaojun Wang, Mingchuan Zhang, Minghua Zhang, Minghui Tang, Mingming Li, Ning Tian, Panpan Huang, Peiyi Wang, Peng Zhang, Qihao Zhu, Qinyu Chen, Qiushi Du, R. J. Chen, R. L. Jin, Ruiqi Ge, Ruizhe Pan, Runxin Xu, Ruyi Chen, S. S. Li, Shanghao Lu, Shangyan Zhou, Shanhuang Chen, Shaoqing Wu, Shengfeng Ye, Shirong Ma, Shiyu Wang, Shuang Zhou, Shuiping Yu, Shunfeng Zhou, Size Zheng, T. Wang, Tian Pei, Tian Yuan, Tianyu Sun, W. L. Xiao, Wangding Zeng, Wei An, Wen Liu, Wenfeng Liang, Wenjun Gao, Wentao Zhang, X. Q. Li, Xiangyue Jin, Xianzu Wang, Xiao Bi, Xiaodong Liu, Xiaohan Wang, Xiaojin Shen, Xiaokang Chen, Xiaosha Chen, Xiaotao Nie, Xiaowen Sun, Xiaoxiang Wang, Xin Liu, Xin Xie, Xingkai Yu, Xinnan Song, Xinyi Zhou, Xinyu Yang, Xuan Lu, Xuecheng Su, Y. Wu, Y. K. Li, Y. X. Wei, Y. X. Zhu, Yanhong Xu, Yanping Huang, Yao Li, Yao Zhao, Yaofeng Sun, Yaohui Li, Yaohui Wang, Yi Zheng, Yichao Zhang, Yiliang Xiong, Yilong Zhao, Ying He, Ying Tang, Yishi Piao, Yixin Dong, Yixuan Tan, Yiyuan Liu, Yongji Wang, Yongqiang Guo, Yuchen Zhu, Yuduan Wang, Yuheng Zou, Yukun Zha, Yunxian Ma, Yuting Yan, Yuxiang You, Yuxuan Liu, Z. Z. Ren, Zehui Ren, Zhangli Sha, Zhe Fu, Zhen Huang, Zhen Zhang, Zhenda Xie, Zhewen Hao, Zhihong Shao, Zhiniu Wen, Zhipeng Xu, Zhongyu Zhang, Zhuoshu Li, Zihan Wang, Zihui Gu, Zilin Li, Ziwei Xie
MLA guarantees efficient inference through significantly compressing the Key-Value (KV) cache into a latent vector, while DeepSeekMoE enables training strong models at an economical cost through sparse computation.
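The KV-cache compression idea behind MLA can be sketched as caching a small per-token latent and reconstructing keys and values from it on the fly. The projections, shapes, and names below are assumptions for illustration only, not DeepSeek-V2's actual parametrization.

```python
import torch
import torch.nn as nn

d_model, d_latent, n_heads, d_head = 256, 32, 8, 32

W_down = nn.Linear(d_model, d_latent, bias=False)           # compress each token into a latent
W_up_k = nn.Linear(d_latent, n_heads * d_head, bias=False)  # reconstruct per-head keys
W_up_v = nn.Linear(d_latent, n_heads * d_head, bias=False)  # reconstruct per-head values

x = torch.randn(4, 128, d_model)                 # (batch, seq, d_model)
kv_cache = W_down(x)                             # only (batch, seq, d_latent) is cached
k = W_up_k(kv_cache).view(4, 128, n_heads, d_head)
v = W_up_v(kv_cache).view(4, 128, n_heads, d_head)
print(kv_cache.shape, k.shape)                   # the latent cache is much smaller than full k and v
```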
no code implementations • 22 Apr 2024 • Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
Ranked #5 on MMR total on MRR-Benchmark (using extra training data)
no code implementations • 25 Mar 2024 • Dayu Shi, Xun Zhang, Ziqi Liu, Xuanbang Chen, Jianghao Li, Xiaodong Liu, William Shieh
This paper introduces a novel approach to optimize energy efficiency in wavelength division multiplexing (WDM) Visible Light Communication (VLC) systems designed for indoor broadcasting networks.
1 code implementation • 31 Jan 2024 • Xiaopeng Li, Shasha Li, Shezheng Song, Huijun Liu, Bin Ji, Xi Wang, Jun Ma, Jie Yu, Xiaodong Liu, Jing Wang, Weimin Zhang
In particular, local editing methods, which directly update model parameters, are more suitable for updating a small amount of knowledge.
1 code implementation • 25 Jan 2024 • Yanda Chen, Chandan Singh, Xiaodong Liu, Simiao Zuo, Bin Yu, He He, Jianfeng Gao
We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples.
1 code implementation • 5 Jan 2024 • DeepSeek-AI, :, Xiao Bi, Deli Chen, Guanting Chen, Shanhuang Chen, Damai Dai, Chengqi Deng, Honghui Ding, Kai Dong, Qiushi Du, Zhe Fu, Huazuo Gao, Kaige Gao, Wenjun Gao, Ruiqi Ge, Kang Guan, Daya Guo, JianZhong Guo, Guangbo Hao, Zhewen Hao, Ying He, Wenjie Hu, Panpan Huang, Erhang Li, Guowei Li, Jiashi Li, Yao Li, Y. K. Li, Wenfeng Liang, Fangyun Lin, A. X. Liu, Bo Liu, Wen Liu, Xiaodong Liu, Xin Liu, Yiyuan Liu, Haoyu Lu, Shanghao Lu, Fuli Luo, Shirong Ma, Xiaotao Nie, Tian Pei, Yishi Piao, Junjie Qiu, Hui Qu, Tongzheng Ren, Zehui Ren, Chong Ruan, Zhangli Sha, Zhihong Shao, Junxiao Song, Xuecheng Su, Jingxiang Sun, Yaofeng Sun, Minghui Tang, Bingxuan Wang, Peiyi Wang, Shiyu Wang, Yaohui Wang, Yongji Wang, Tong Wu, Y. Wu, Xin Xie, Zhenda Xie, Ziwei Xie, Yiliang Xiong, Hanwei Xu, R. X. Xu, Yanhong Xu, Dejian Yang, Yuxiang You, Shuiping Yu, Xingkai Yu, B. Zhang, Haowei Zhang, Lecong Zhang, Liyue Zhang, Mingchuan Zhang, Minghua Zhang, Wentao Zhang, Yichao Zhang, Chenggang Zhao, Yao Zhao, Shangyan Zhou, Shunfeng Zhou, Qihao Zhu, Yuheng Zou
The rapid development of open-source large language models (LLMs) has been truly remarkable.
no code implementations • 20 Dec 2023 • Shuai Ma, Haihong Sheng, Junchang Sun, Hang Li, Xiaodong Liu, Chen Qiu, Majid Safari, Naofal Al-Dhahir, Shiyin Li
Then, we derive the expression for the LiFi transmission rate based on M-ary pulse amplitude modulation (M-PAM).
1 code implementation • 3 Nov 2023 • Qingru Zhang, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao
In human-written articles, we often leverage the subtleties of text style, such as bold and italics, to guide the attention of readers.
no code implementations • 19 Oct 2023 • Xiaodong Yu, Hao Cheng, Xiaodong Liu, Dan Roth, Jianfeng Gao
Specifically, given the potential for data contamination (e.g., leading to memorization), good static benchmark performance does not ensure that the model can reliably use the provided evidence when responding, which is essential to avoid hallucination when the required knowledge is new or private.
no code implementations • 11 Oct 2023 • chengyu dong, Liyuan Liu, Hao Cheng, Jingbo Shang, Jianfeng Gao, Xiaodong Liu
Although ELECTRA offers a significant boost in efficiency, its potential is constrained by the training cost brought by the auxiliary model.
no code implementations • 29 Aug 2023 • Zheng Xiong, Biao Luo, Bing-Chuan Wang, Xiaodong Xu, Xiaodong Liu, TingWen Huang
Specifically, the first-order average consensus algorithm is utilized to expand the observations of the DESS state in a fully decentralized way, and the initial actions (i.e., output power) are decided by the agents (i.e., energy storage units) according to these observations.
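A minimal sketch of the first-order average consensus update mentioned above: each agent repeatedly averages its value with its neighbors' values, so all agents converge to the network-wide mean. The 4-agent ring network, weights, and observations are illustrative.

```python
import numpy as np

# Symmetric, doubly stochastic weight matrix for a 4-agent ring network (illustrative).
W = np.array([[0.5 , 0.25, 0.  , 0.25],
              [0.25, 0.5 , 0.25, 0.  ],
              [0.  , 0.25, 0.5 , 0.25],
              [0.25, 0.  , 0.25, 0.5 ]])

x = np.array([1.0, 5.0, 3.0, 9.0])      # each agent's local observation
for _ in range(50):                      # first-order consensus iteration: x <- W x
    x = W @ x
print(x, x.mean())                       # every entry converges to the average, 4.5
```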
no code implementations • 16 Jun 2023 • Hyunseok Jung, Xiaodong Liu
This paper proposes an Anderson-Rubin (AR) test for the presence of peer effects in panel data without the need to specify the network structure.
no code implementations • NeurIPS 2023 • Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
Such a decoupled memory design can easily cache and update long-term past contexts for memory retrieval without suffering from memory staleness.
no code implementations • 23 May 2023 • Yu Zhang, Hao Cheng, Zhihong Shen, Xiaodong Liu, Ye-Yi Wang, Jianfeng Gao
Scientific literature understanding tasks have gained significant attention due to their potential to accelerate scientific discovery.
1 code implementation • 21 May 2023 • Linyuan Gong, Chenyan Xiong, Xiaodong Liu, Payal Bajaj, Yiqing Xie, Alvin Cheung, Jianfeng Gao, Xia Song
This paper explores the effectiveness of model-generated signals in improving zero-shot generalization of text-to-text Transformers such as T5.
1 code implementation • 4 May 2023 • Kaixin Ma, Hao Cheng, Yu Zhang, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
Our approach outperforms recent self-supervised retrievers in zero-shot evaluations and achieves state-of-the-art fine-tuned retrieval performance on NQ, HotpotQA and OTT-QA.
Ranked #4 on Question Answering on HotpotQA
no code implementations • 28 Mar 2023 • Sanxing Chen, Hao Cheng, Xiaodong Liu, Jian Jiao, Yangfeng Ji, Jianfeng Gao
Learning transferable representation of knowledge graphs (KGs) is challenging due to the heterogeneous, multi-relational nature of graph structures.
no code implementations • 20 Feb 2023 • Qi Liu, ZhiYun Yang, Ru Ji, Yonghong Zhang, Muhammad Bilal, Xiaodong Liu, S Vimal, Xiaolong Xu
Radars are widely used to obtain echo information for effective prediction tasks such as precipitation nowcasting.
no code implementations • 21 Dec 2022 • Shuai Ma, Jing Wang, Chun Du, Hang Li, Xiaodong Liu, Youlong Wu, Naofal Al-Dhahir, Shiyin Li
To address this challenge, we propose an alternating optimization algorithm to obtain the transmit beamforming and the PD orientation.
1 code implementation • 21 Dec 2022 • Zonglin Yang, Li Dong, Xinya Du, Hao Cheng, Erik Cambria, Xiaodong Liu, Jianfeng Gao, Furu Wei
To this end, we propose a new paradigm (task) for inductive reasoning, which is to induce natural language rules from natural language facts, and create a dataset termed DEER containing 1.2k rule-fact pairs for the task, where rules and facts are written in natural language.
1 code implementation • 15 Dec 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, Jianfeng Gao
Specifically, we augment a SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.
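A toy sketch of the hybrid layer stack described above: a cheap recurrent stand-in for the state space model supplies global context at the bottom, and an upper layer uses windowed (local) attention. This is not the actual SPADE implementation; the SimpleSSM module, window size, and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

def local_attention_mask(seq_len, window):
    """Banded causal mask: each position attends only to the previous `window` tokens."""
    i = torch.arange(seq_len)[:, None]
    j = torch.arange(seq_len)[None, :]
    return (j > i) | (i - j >= window)           # True = masked out

class SimpleSSM(nn.Module):
    """Stand-in for a state space model: a diagonal linear recurrence over the sequence."""
    def __init__(self, d):
        super().__init__()
        self.decay = nn.Parameter(torch.full((d,), 0.9))
        self.inp = nn.Linear(d, d)

    def forward(self, x):                        # x: (batch, seq, d)
        h, out = torch.zeros(x.size(0), x.size(2)), []
        for t in range(x.size(1)):               # h_t = a * h_{t-1} + B x_t carries global context
            h = self.decay * h + self.inp(x[:, t])
            out.append(h)
        return torch.stack(out, dim=1)

d, window = 64, 8
bottom = SimpleSSM(d)                                             # bottom layer: global mixing via SSM
upper = nn.MultiheadAttention(d, num_heads=4, batch_first=True)   # upper layer: efficient local attention

x = torch.randn(2, 32, d)
h = bottom(x)
y, _ = upper(h, h, h, attn_mask=local_attention_mask(32, window))
print(y.shape)
```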
1 code implementation • 31 Oct 2022 • Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters and storing a large copy of the PLM weights for every task, resulting in increased costs for storing, sharing, and serving the models.
2 code implementations • 22 Oct 2022 • Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
We propose a novel open-domain question answering (ODQA) framework for answering single/multi-hop questions across heterogeneous knowledge sources.
1 code implementation • 14 Oct 2022 • Ganesh Jawahar, Subhabrata Mukherjee, Xiaodong Liu, Young Jin Kim, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Ahmed Hassan Awadallah, Sebastien Bubeck, Jianfeng Gao
Furthermore, existing MoE works do not consider computational constraints (e.g., FLOPs, latency) to guide their design.
1 code implementation • 11 Oct 2022 • Hao Cheng, Hao Fang, Xiaodong Liu, Jianfeng Gao
Given their effectiveness on knowledge-intensive natural language processing tasks, dense retrieval models have become increasingly popular.
1 code implementation • 6 Sep 2022 • Han Wang, Jun Tang, Xiaodong Liu, Shanyan Guan, Rong Xie, Li Song
Temporal information is introduced by the temporal feature aggregation model (TFAM), which applies an attention mechanism between the context frames and the target frame (i.e., the frame to be detected).
Ranked #6 on Video Object Detection on ImageNet VID
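A hedged sketch of cross-frame attention in the spirit of the aggregation described above: queries come from the target frame and keys/values from the context frames, so temporally consistent features reinforce the target-frame representation. Feature shapes and the residual fusion step are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

d = 256
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=8, batch_first=True)

target_feat = torch.randn(1, 400, d)        # tokens from the frame to be detected
context_feat = torch.randn(1, 3 * 400, d)   # tokens pooled from 3 context frames

# Queries from the target frame attend over keys/values from the context frames.
aggregated, _ = cross_attn(query=target_feat, key=context_feat, value=context_feat)
enhanced = target_feat + aggregated          # residual fusion (illustrative)
print(enhanced.shape)
```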
1 code implementation • 30 Aug 2022 • Yong Zhong, Hongtao Liu, Xiaodong Liu, Fan Bao, Weiran Shen, Chongxuan Li
Deep generative models (DGMs) are data-hungry, since learning a complex model on limited data suffers from large variance and easily overfits.
1 code implementation • 24 May 2022 • Yaqing Wang, Sahaj Agarwal, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
Standard fine-tuning of large pre-trained language models (PLMs) for downstream tasks requires updating hundreds of millions to billions of parameters and storing a large copy of the PLM weights for every task, resulting in increased costs for storing, sharing, and serving the models.
1 code implementation • 20 May 2022 • Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei
With the visually-augmented context, VaLM uses a visual knowledge fusion layer to enable multimodal grounded language modeling by attending to both text context and visual knowledge in images.
no code implementations • 13 Apr 2022 • Payal Bajaj, Chenyan Xiong, Guolin Ke, Xiaodong Liu, Di He, Saurabh Tiwary, Tie-Yan Liu, Paul Bennett, Xia Song, Jianfeng Gao
We present an efficient method of pretraining large-scale autoencoding language models using training signals generated by an auxiliary model.
6 code implementations • 7 Mar 2022 • Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao
Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.
no code implementations • 17 Feb 2022 • Da Yin, Li Dong, Hao Cheng, Xiaodong Liu, Kai-Wei Chang, Furu Wei, Jianfeng Gao
With the increasing model capacity brought by pre-trained language models, there is a growing need for more knowledgeable natural language processing (NLP) models with advanced functionalities, including providing and making flexible use of encyclopedic and commonsense knowledge.
1 code implementation • ICLR 2022 • Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Analysis shows that the proposed schedule indeed reduces the redundancy and improves generalization performance.
no code implementations • 29 Jan 2022 • Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
Our framework AutoDistil addresses the above challenges with the following steps: (a) incorporates inductive bias and heuristics to partition the Transformer search space into K compact sub-spaces (K=3 for typical student sizes of base, small, and tiny); (b) trains one SuperLM for each sub-space using a task-agnostic objective (e.g., self-attention distillation) with weight-sharing of students; (c) performs a lightweight search for the optimal student without re-training.
no code implementations • 15 Dec 2021 • Sheng Zhang, Hao Cheng, Shikhar Vashishth, Cliff Wong, Jinfeng Xiao, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon
Zero-shot entity linking has emerged as a promising direction for generalizing to new entities, but it still requires example gold entity mentions during training and canonical descriptions for all entities, both of which are rarely available outside of Wikipedia.
no code implementations • 15 Dec 2021 • Robert Tinn, Hao Cheng, Yu Gu, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon
Overall, domain-specific vocabulary and pretraining facilitate more robust models for fine-tuning.
2 code implementations • 6 Dec 2021 • Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.
Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)
1 code implementation • NeurIPS 2021 • Ge Yang, Edward Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao
Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization ($\mu$P), many optimal HPs remain stable even as model size changes.
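A rough caricature of how a muTransfer-style recipe might be wired up: learning rates of hidden and output weight matrices are scaled like 1/width, so a rate tuned on a narrow proxy model can be reused at larger widths. The real muP recipe also rescales initializations and output multipliers (the authors' mup package implements it); the MLP, layer names, and scaling rule below are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.input = nn.Linear(32, width)
        self.hidden = nn.Linear(width, width)
        self.output = nn.Linear(width, 10)

    def forward(self, x):
        return self.output(torch.relu(self.hidden(torch.relu(self.input(x)))))

def mup_like_param_groups(model, base_width, width, base_lr):
    """Scale LRs of hidden/output weight matrices by base_width/width (Adam-style caricature)."""
    groups = []
    for name, p in model.named_parameters():
        scale = base_width / width if name in ("hidden.weight", "output.weight") else 1.0
        groups.append({"params": [p], "lr": base_lr * scale})
    return groups

base_lr = 3e-3                                   # tuned once on the width-128 proxy model
for width in (128, 512, 2048):                   # reuse the same base LR at larger widths
    opt = torch.optim.Adam(mup_like_param_groups(MLP(width), 128, width, base_lr))
```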
1 code implementation • 4 Nov 2021 • Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao
We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks.
1 code implementation • ACL 2022 • Kaixin Ma, Hao Cheng, Xiaodong Liu, Eric Nyberg, Jianfeng Gao
The retriever-reader framework is popular for open-domain question answering (ODQA) due to its ability to use explicit knowledge.
1 code implementation • Findings (NAACL) 2022 • Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
The first is the use of self-training to leverage large amounts of unlabeled data for prompt-based FN in few-shot settings.
1 code implementation • ICLR 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao
While most ongoing research focuses on improving sparsely activated models (SAMs) by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.
no code implementations • 29 Sep 2021 • Xiaodong Liu, Zhikang Fan, Xun Wang, Weiran Shen
Then we update the sender model to obtain an approximately optimal scheme using the receiver model.
1 code implementation • Findings (EMNLP) 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization can improve model generalization in many natural language processing tasks.
1 code implementation • ACL 2021 • Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.
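For context, a minimal sketch of the magnitude-pruning step used in generic Lottery Ticket experiments: keep only the largest-magnitude weights and rewind the survivors to their initial values. This illustrates the hypothesis itself, not this paper's specific ticket-selection method; the sparsity level and layer are illustrative.

```python
import torch
import torch.nn as nn

def magnitude_mask(layer, sparsity=0.8):
    """Keep only the largest-magnitude weights; the surviving subnetwork is a
    candidate 'lottery ticket' that would then be retrained from its original init."""
    w = layer.weight.detach().abs()
    threshold = torch.quantile(w.flatten(), sparsity)
    return (w >= threshold).float()

layer = nn.Linear(256, 256)
init_weights = layer.weight.detach().clone()      # save the initialization for rewinding
# ... train the full model here ...
mask = magnitude_mask(layer, sparsity=0.8)
with torch.no_grad():                             # prune: zero out small weights
    layer.weight.copy_(init_weights * mask)       # rewind survivors to their initial values
print(f"kept {mask.mean().item():.0%} of weights")
```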
1 code implementation • NAACL 2021 • Lis Pereira, Xiaodong Liu, Hao Cheng, Hoifung Poon, Jianfeng Gao, Ichiro Kobayashi
We present a simple yet effective Targeted Adversarial Training (TAT) algorithm to improve adversarial training for natural language understanding.
1 code implementation • EMNLP 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.
no code implementations • 10 Apr 2021 • Ping Zhang, Zhenxiang Tao, Wenjie Yang, Minze Chen, Shan Ding, Xiaodong Liu, Rui Yang, Hui Zhang
Surveillance cameras are widely applied for indoor occupancy measurement and human movement perception, which benefits building energy management and social security.
no code implementations • Findings (EMNLP) 2021 • Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.
no code implementations • 11 Mar 2021 • Guannan Geng, Qingyang Xiao, Shigan Liu, Xiaodong Liu, Jing Cheng, Yixuan Zheng, Dan Tong, Bo Zheng, Yiran Peng, Xiaomeng Huang, Kebin He, Qiang Zhang
Accordingly, a full-coverage high-resolution air pollutant dataset with timely updates and historical long-term records is essential to support both research and environmental management.
1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
no code implementations • ACL 2021 • Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
To date, most recent work under the retrieval-reader framework for open-domain QA has focused exclusively on either extractive or generative readers.
Ranked #1 on Open-Domain Question Answering on TriviaQA
2 code implementations • NAACL 2021 • Hao Cheng, Xiaodong Liu, Lis Pereira, YaoLiang Yu, Jianfeng Gao
Theoretically, we provide a connection of two recent methods, Jacobian Regularization and Virtual Adversarial Training, under this framework.
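A small sketch of a virtual-adversarial-style smoothness regularizer of the kind this framework unifies: penalize the divergence between predictions on clean and slightly perturbed inputs. This one-step random-perturbation variant is a simplification (full VAT finds the worst-case direction via power iteration), and the model, epsilon, and weighting below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def smoothness_penalty(model, embeddings, eps=1e-3):
    """KL divergence between predictions on clean inputs and on inputs with a small
    random perturbation in embedding space (one-step VAT-style consistency term)."""
    with torch.no_grad():
        clean_logits = model(embeddings)
    noise = torch.randn_like(embeddings)
    noise = eps * noise / noise.norm(dim=-1, keepdim=True)
    noisy_logits = model(embeddings + noise)
    return F.kl_div(F.log_softmax(noisy_logits, dim=-1),
                    F.softmax(clean_logits, dim=-1),
                    reduction="batchmean")

model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.Tanh(), torch.nn.Linear(64, 3))
emb = torch.randn(8, 64)
labels = torch.randint(0, 3, (8,))
loss = F.cross_entropy(model(emb), labels) + 10.0 * smoothness_penalty(model, emb)
loss.backward()
```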
1 code implementation • COLING 2020 • Sanxing Chen, Aidan San, Xiaodong Liu, Yangfeng Ji
In Text-to-SQL semantic parsing, selecting the correct entities (tables and columns) for the generated SQL query is both crucial and challenging; the parser is required to connect the natural language (NL) question and the SQL query to the structured knowledge in the database.
1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.
Ranked #9 on Passage Retrieval on Natural Questions
3 code implementations • EMNLP 2021 • Sanxing Chen, Xiaodong Liu, Jianfeng Gao, Jian Jiao, Ruofei Zhang, Yangfeng Ji
Our proposed model consists of two different Transformer blocks: the bottom block extracts features of each entity-relation pair in the local neighborhood of the source entity and the top block aggregates the relational information from outputs of the bottom block.
Ranked #1 on Link Prediction on FB15k-237 (Hit@10 metric)
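A toy two-level encoder in the spirit of the bottom/top block design described above: a bottom Transformer encodes each (neighbor entity, relation) pair, and a top Transformer aggregates the resulting pair features around the source entity. Dimensions, pooling, and names are illustrative assumptions, not the paper's architecture details.

```python
import torch
import torch.nn as nn

class TwoLevelKGEncoder(nn.Module):
    """Bottom block: encode each (neighbor entity, relation) pair.
    Top block: aggregate the pair features over the neighborhood."""
    def __init__(self, n_entities, n_relations, d=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, d)
        self.rel = nn.Embedding(n_relations, d)
        self.bottom = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=1)
        self.top = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=1)

    def forward(self, neighbor_entities, neighbor_relations):
        # neighbor_entities, neighbor_relations: (batch, n_neighbors)
        pairs = torch.stack([self.ent(neighbor_entities),
                             self.rel(neighbor_relations)], dim=2)  # (b, n, 2, d)
        b, n, _, d = pairs.shape
        pair_feat = self.bottom(pairs.view(b * n, 2, d))[:, 0]      # one feature per pair
        return self.top(pair_feat.view(b, n, d)).mean(dim=1)        # aggregate the neighborhood

enc = TwoLevelKGEncoder(n_entities=1000, n_relations=50)
out = enc(torch.randint(0, 1000, (4, 6)), torch.randint(0, 50, (4, 6)))
print(out.shape)   # torch.Size([4, 64])
```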
4 code implementations • 18 Aug 2020 • Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao
We explore the application of very deep Transformer models for Neural Machine Translation (NMT).
Ranked #1 on Machine Translation on WMT2014 English-French (using extra training data)
1 code implementation • 31 Jul 2020 • Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon
In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.
Ranked #2 on Participant Intervention Comparison Outcome Extraction on EBM-NLP (using extra training data)
15 code implementations • ICLR 2021 • Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.
Ranked #1 on Common Sense Reasoning on SWAG
1 code implementation • WS 2020 • Lis Pereira, Xiaodong Liu, Fei Cheng, Masayuki Asahara, Ichiro Kobayashi
We propose an AdversariaL training algorithm for commonsense InferenCE (ALICE).
3 code implementations • 20 Apr 2020 • Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao
In natural language processing (NLP), pre-training large neural language models such as BERT has demonstrated impressive gains in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.
Ranked #6 on Natural Language Inference on ANLI test (using extra training data)
2 code implementations • EMNLP 2020 • Liyuan Liu, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Jiawei Han
Transformers have proved effective in many NLP tasks.
Ranked #5 on Machine Translation on WMT2014 English-French
3 code implementations • 28 Feb 2020 • Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
We propose to pre-train a unified language model for both autoencoding and partially autoregressive language modeling tasks using a novel training procedure, referred to as a pseudo-masked language model (PMLM).
Ranked #4 on Question Generation on SQuAD1.1 (using extra training data)
3 code implementations • ACL 2020 • Xiaodong Liu, Yu Wang, Jianshu ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models.
no code implementations • 13 Feb 2020 • Xiaodong Liu, Zhi Gao, Ben M. Chen
Color correction for underwater images has received increasing interest due to its critical role in enabling mature vision algorithms to work in underwater scenarios.
no code implementations • 8 Feb 2020 • Hongwu Kuang, Xiaodong Liu, Jingwei Zhang, Zicheng Fang
Multi-modality fusion is key to guaranteeing the stability of autonomous driving systems.
4 code implementations • ACL 2020 • Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, Matthew Richardson
The generalization challenge lies in (a) encoding the database relations in an accessible way for the semantic parser, and (b) modeling alignment between database columns and their mentions in a given query.
Ranked #9 on Semantic Parsing on spider
6 code implementations • ACL 2020 • Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao
However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.
Ranked #1 on Natural Language Inference on QNLI
no code implementations • IJCNLP 2019 • Huazheng Wang, Zhe Gan, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Hongning Wang
In this paper, we focus on unsupervised domain adaptation for Machine Reading Comprehension (MRC), where the source domain has a large amount of labeled data, while only unlabeled passages are available in the target domain.
no code implementations • 22 Aug 2019 • Guoliang Feng, Wei Lu, Witold Pedrycz, Jianhua Yang, Xiaodong Liu
Index terms: fuzzy cognitive maps (FCMs), maximum entropy, noisy data, rapid and robust learning.
21 code implementations • ICLR 2020 • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.
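For reference, the warmup heuristic analyzed in this work can be sketched as a schedule that linearly scales the learning rate up over the first updates, keeping early Adam steps small while the adaptive second-moment estimate is still noisy. RAdam itself replaces this heuristic with a variance rectification term; the sketch below shows only the heuristic, with illustrative step counts and no real training loop.

```python
import torch

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

warmup_steps, total_steps = 1000, 10000

def warmup_then_constant(step):
    """Linear warmup: scale the LR from ~0 up to its full value over `warmup_steps` updates."""
    return min(1.0, (step + 1) / warmup_steps)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_then_constant)
for step in range(total_steps):
    # ... forward/backward would go here in a real training loop ...
    optimizer.step()        # no-op here since no gradients were computed
    scheduler.step()
```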
3 code implementations • WS 2019 • Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao
An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.
Ranked #1 on Natural Language Understanding on PDP60
no code implementations • WS 2019 • Yichong Xu, Xiaodong Liu, Chunyuan Li, Hoifung Poon, Jianfeng Gao
We use a multi-source transfer learning approach to transfer the knowledge from MT-DNN and SciBERT to natural language understanding tasks in the medical domain.
1 code implementation • ACL 2019 • Lianhui Qin, Michel Galley, Chris Brockett, Xiaodong Liu, Xiang Gao, Bill Dolan, Yejin Choi, Jianfeng Gao
Although neural conversation models are effective in learning how to produce fluent responses, their primary challenge lies in knowing what to say to make the conversation contentful and non-vacuous.
9 code implementations • NeurIPS 2019 • Li Dong, Nan Yang, Wenhui Wang, Furu Wei, Xiaodong Liu, Yu Wang, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon
This paper presents a new Unified pre-trained Language Model (UniLM) that can be fine-tuned for both natural language understanding and generation tasks.
Ranked #2 on Generative Question Answering on CoQA (using extra training data)
3 code implementations • 20 Apr 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.
Ranked #1 on Semantic Textual Similarity on SentEval
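A generic soft-target distillation loss in the spirit of the setup above, where a (possibly ensembled) teacher's soft predictions supervise a single student: blend the usual cross-entropy on hard labels with a KL term that matches the student's temperature-softened distribution to the teacher's. The temperature, mixing weight, and tensors below are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Cross-entropy on hard labels plus temperature-scaled KL to the teacher's soft targets."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    return alpha * hard + (1 - alpha) * soft

student_logits = torch.randn(8, 3, requires_grad=True)
teacher_logits = torch.randn(8, 3)          # would come from the (ensemble) teacher
labels = torch.randint(0, 3, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
```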
1 code implementation • 19 Apr 2019 • Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, Jingjing Liu
In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods.
no code implementations • NAACL 2019 • Shuohang Wang, Sheng Zhang, Yelong Shen, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Jing Jiang
Commonsense reasoning is fundamental to natural language understanding.
Ranked #2 on Natural Language Understanding on PDP60
2 code implementations • NAACL 2019 • Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin
Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks.
7 code implementations • ACL 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.
Ranked #2 on Natural Language Inference on SciTail
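A minimal multi-task setup in the spirit of MT-DNN: a single shared encoder with one small head per NLU task, trained by alternating mini-batches from different tasks. The tiny feed-forward encoder stands in for the pretrained Transformer used in the paper; task names, feature sizes, and class counts are illustrative.

```python
import torch
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """Shared text encoder with a task-specific classification head per task."""
    def __init__(self, task_classes, d=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(300, d), nn.ReLU(), nn.Linear(d, d))
        self.heads = nn.ModuleDict({t: nn.Linear(d, c) for t, c in task_classes.items()})

    def forward(self, features, task):
        return self.heads[task](self.encoder(features))

model = MultiTaskModel({"mnli": 3, "sst": 2, "qnli": 2})
loss_fn = nn.CrossEntropyLoss()
# Training alternates mini-batches from different tasks through the shared encoder.
for task, n_classes in [("mnli", 3), ("sst", 2)]:
    x, y = torch.randn(16, 300), torch.randint(0, n_classes, (16,))
    loss_fn(model(x, task), y).backward()
```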
no code implementations • 30 Oct 2018 • Sheng Zhang, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Kevin Duh, Benjamin Van Durme
We present a large-scale dataset, ReCoRD, for machine reading comprehension requiring commonsense reasoning.
Ranked #34 on Common Sense Reasoning on ReCoRD
5 code implementations • 24 Sep 2018 • Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao
This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not.
5 code implementations • NAACL 2019 • Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao
We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains.
no code implementations • NeurIPS 2018 • Minjia Zhang, Xiaodong Liu, Wenhan Wang, Jianfeng Gao, Yuxiong He
Neural language models (NLMs) have recently gained a renewed interest by achieving state-of-the-art performance across many natural language processing (NLP) tasks.
3 code implementations • 21 Apr 2018 • Xiaodong Liu, Kevin Duh, Jianfeng Gao
We propose a stochastic answer network (SAN) to explore multi-step inference strategies in Natural Language Inference.
Ranked #32 on Natural Language Inference on SNLI
5 code implementations • ACL 2018 • Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao
We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension.
Ranked #24 on Question Answering on SQuAD1.1 dev
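A toy multi-step reasoning loop in the spirit of a stochastic answer network: a recurrent state repeatedly attends over the passage memory, an answer distribution is predicted at each step, and the per-step predictions are averaged. The real SAN also applies stochastic dropout over the per-step predictions; everything below (dimensions, scoring, step count) is an illustrative sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStepReader(nn.Module):
    """Attend over memory, update a reasoning state, and score answers at every step."""
    def __init__(self, d=64, steps=5):
        super().__init__()
        self.steps = steps
        self.cell = nn.GRUCell(d, d)
        self.score = nn.Linear(d, d)

    def forward(self, memory, state):            # memory: (b, seq, d), state: (b, d)
        step_scores = []
        for _ in range(self.steps):
            attn = F.softmax(memory @ state.unsqueeze(-1), dim=1)     # (b, seq, 1)
            state = self.cell((attn * memory).sum(dim=1), state)      # read, then update state
            step_scores.append((memory @ self.score(state).unsqueeze(-1)).squeeze(-1))
        return torch.stack(step_scores).mean(dim=0)                   # average per-step answers

reader = MultiStepReader()
scores = reader(torch.randn(2, 40, 64), torch.randn(2, 64))
print(scores.shape)    # torch.Size([2, 40])
```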
1 code implementation • CVPR 2018 • Jianbo Chen, Yelong Shen, Jianfeng Gao, Jingjing Liu, Xiaodong Liu
First, we introduce a synthetic dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE system.
no code implementations • 14 Nov 2017 • Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu
This paper presents a novel neural model - Dynamic Fusion Network (DFN), for machine reading comprehension (MRC).
no code implementations • IJCNLP 2017 • Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao
Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets.
no code implementations • IJCNLP 2017 • Lis Pereira, Xiaodong Liu, John Lee
We explore the application of a Deep Structured Similarity Model (DSSM) to ranking in lexical simplification.
13 code implementations • 28 Nov 2016 • Payal Bajaj, Daniel Campos, Nick Craswell, Li Deng, Jianfeng Gao, Xiaodong Liu, Rangan Majumder, Andrew McNamara, Bhaskar Mitra, Tri Nguyen, Mir Rosenberg, Xia Song, Alina Stoica, Saurabh Tiwary, Tong Wang
The size of the dataset and the fact that the questions are derived from real user search queries distinguishes MS MARCO from other well-known publicly available datasets for machine reading comprehension and question-answering.