1 code implementation • 5 Jun 2023 • Yujia Xie, Xun Wang, Si-Qing Chen, Wayne Xiong, Pengcheng He
Summarizing lengthy documents is a common and essential task in our daily lives.
no code implementations • 23 May 2023 • Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan
We further apply a small language model as a trainable rewriter, which rewrites the search query to suit the frozen retriever and the LLM reader.
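A minimal sketch of the rewrite-retrieve-read loop this entry describes, assuming hypothetical callables small_rewriter, frozen_retriever, and frozen_llm_reader for the trainable rewriter, the fixed search engine, and the fixed LLM reader:

```python
def answer(question: str, small_rewriter, frozen_retriever, frozen_llm_reader) -> str:
    # 1. The trainable rewriter reformulates the user question into a search query.
    search_query = small_rewriter(question)
    # 2. The frozen retriever fetches evidence for the rewritten query.
    passages = frozen_retriever(search_query)
    # 3. The frozen LLM reader answers, conditioned on the retrieved evidence.
    prompt = "\n".join(passages) + "\nQuestion: " + question + "\nAnswer:"
    return frozen_llm_reader(prompt)
```

Only the rewriter would receive gradient (or reward) signal; retriever and reader stay fixed.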
no code implementations • 11 May 2023 • Xinbei Ma, Yeyun Gong, Pengcheng He, Hai Zhao, Nan Duan
Based on the remarkable achievements of pre-trained language models in abstractive summarization, the copying mechanism has proved helpful by improving the factuality, stability, and overall performance.
no code implementations • 9 May 2023 • Lesly Miculicich, Yujia Xie, Song Wang, Pengcheng He
Many applications of text generation such as summarization benefit from accurately controlling the text length.
1 code implementation • 4 May 2023 • Wen Xiao, Yujia Xie, Giuseppe Carenini, Pengcheng He
Tailoring outputs of large language models, such as ChatGPT, to specific user needs remains a challenge despite their impressive generation quality.
1 code implementation • 1 May 2023 • Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou
To achieve this, we propose a vision-language prompt that can model a wide range of vision-language tasks and a diffusion model that takes it as input.
1 code implementation • 29 Apr 2023 • Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou
Through prompting, large-scale pre-trained models have become more expressive and powerful, gaining significant attention in recent years.
1 code implementation • 25 Apr 2023 • Zhendong Wang, Yifan Jiang, Huangjie Zheng, Peihao Wang, Pengcheng He, Zhangyang Wang, Weizhu Chen, Mingyuan Zhou
Patch Diffusion meanwhile improves the performance of diffusion models trained on relatively small datasets, e.g., as few as 5,000 images to train from scratch.
1 code implementation • 6 Apr 2023 • Baolin Peng, Chunyuan Li, Pengcheng He, Michel Galley, Jianfeng Gao
Prior work has shown that finetuning large language models (LLMs) using machine-generated instruction-following data enables such models to achieve remarkable zero-shot capabilities on new tasks, and no human-written instructions are needed.
1 code implementation • 18 Mar 2023 • Qingru Zhang, Minshuo Chen, Alexander Bukharin, Pengcheng He, Yu Cheng, Weizhu Chen, Tuo Zhao
Therefore, many fine-tuning methods have been proposed to learn incremental updates of pre-trained weights in a parameter-efficient way, e.g., low-rank increments.
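As a rough illustration of such a low-rank increment (a LoRA-style update; the class name, scaling, and initialization here are illustrative rather than this paper's exact method):

```python
import torch
import torch.nn as nn

class LowRankIncrement(nn.Module):
    """Sketch: the pre-trained weight W0 stays frozen and only the rank-r
    factors B @ A are trained, so the effective weight is W0 + B @ A."""

    def __init__(self, frozen_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.frozen = frozen_linear
        for p in self.frozen.parameters():
            p.requires_grad = False                    # keep pre-trained weights fixed
        d_out, d_in = frozen_linear.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero init: start exactly at W0

    def forward(self, x):
        return self.frozen(x) + x @ self.A.T @ self.B.T
```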
no code implementations • 24 Feb 2023 • Baolin Peng, Michel Galley, Pengcheng He, Hao Cheng, Yujia Xie, Yu Hu, Qiuyuan Huang, Lars Liden, Zhou Yu, Weizhu Chen, Jianfeng Gao
Large language models (LLMs), such as ChatGPT, are able to generate human-like, fluent responses for many downstream tasks, e.g., task-oriented dialog and question answering.
1 code implementation • 22 Feb 2023 • Zekun Li, Baolin Peng, Pengcheng He, Michel Galley, Jianfeng Gao, Xifeng Yan
We introduce a new framework, Directional Stimulus Prompting, that uses a tunable language model (LM) to provide guidance for the black-box frozen large language model (LLM) on downstream tasks.
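A hedged sketch of the mechanism: a small tunable policy LM emits a short stimulus (e.g., keywords) that is spliced into the frozen LLM's prompt. Here policy_lm and blackbox_llm are hypothetical callables:

```python
def guided_generate(article: str, policy_lm, blackbox_llm) -> str:
    stimulus = policy_lm("Extract hint keywords: " + article)  # trainable guidance
    prompt = (
        f"Article: {article}\n"
        f"Hint keywords: {stimulus}\n"   # the directional stimulus
        f"Summary:"
    )
    return blackbox_llm(prompt)          # the black-box LLM is never updated
```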
no code implementations • 8 Feb 2023 • Korawat Tanwisuth, Shujian Zhang, Pengcheng He, Mingyuan Zhou
Finally, it refines the target model on the target domain data without guidance from the source model.
1 code implementation • 21 Dec 2022 • Wen Xiao, Lesly Miculicich, Yang Liu, Pengcheng He, Giuseppe Carenini
Content-Controllable Summarization generates summaries focused on the given controlling signals.
no code implementations • 20 Dec 2022 • Yu Li, Baolin Peng, Pengcheng He, Michel Galley, Zhou Yu, Jianfeng Gao
In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain.
no code implementations • 8 Dec 2022 • Xingxing Zhang, Yiran Liu, Xun Wang, Pengcheng He, Yang Yu, Si-Qing Chen, Wayne Xiong, Furu Wei
The input and output of most text generation tasks can be transformed into two token sequences, which can then be modeled with sequence-to-sequence learning tools such as Transformers.
Ranked #1 on Text Summarization on SAMSum Corpus
no code implementations • 22 Nov 2022 • Jason Phang, Yi Mao, Pengcheng He, Weizhu Chen
Fine-tuning large language models for different tasks can be costly and inefficient, and even methods that reduce the number of tuned parameters still require full gradient-based optimization.
no code implementations • 4 Oct 2022 • Chen Liang, Simiao Zuo, Qingru Zhang, Pengcheng He, Weizhu Chen, Tuo Zhao
As such, TED reduces the knowledge gap between the two models and helps the student to fit better on the target task.
no code implementations • 21 Aug 2022 • Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang
Z-Code++ creates new state of the art on 9 out of 13 text summarization tasks across 5 languages.
1 code implementation • NAACL 2022 • Zhengbao Jiang, Yi Mao, Pengcheng He, Graham Neubig, Weizhu Chen
The information in tables can be an important complement to text, making table-based question answering (QA) systems of great value.
Ranked #4 on Semantic Parsing on WikiTableQuestions
1 code implementation • 25 Jun 2022 • Qingru Zhang, Simiao Zuo, Chen Liang, Alexander Bukharin, Pengcheng He, Weizhu Chen, Tuo Zhao
Large Transformer-based models have exhibited superior performance in various natural language processing and computer vision tasks.
1 code implementation • 22 Jun 2022 • Baolin Peng, Michel Galley, Pengcheng He, Chris Brockett, Lars Liden, Elnaz Nouri, Zhou Yu, Bill Dolan, Jianfeng Gao
We introduce GODEL (Grounded Open Dialogue Language Model), a large pre-trained language model for dialog.
3 code implementations • 5 Jun 2022 • Zhendong Wang, Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou
Both the observed and generated data are diffused by the same adaptive diffusion process.
Ranked #1 on Image Generation on AFHQ Wild (FID metric)
no code implementations • Findings (NAACL) 2022 • Shujian Zhang, Chengyue Gong, Xingchao Liu, Pengcheng He, Weizhu Chen, Mingyuan Zhou
Active learning, which effectively collects informative unlabeled data for annotation, reduces the demand for labeled data.
1 code implementation • NAACL 2022 • Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen
We propose MoEBERT, which uses a Mixture-of-Experts structure to increase model capacity and inference speed.
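A toy Mixture-of-Experts feed-forward layer under the assumption of top-1 routing; MoEBERT's actual expert construction and routing may differ:

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Illustrative MoE feed-forward layer: a router picks the top-1 expert
    per token, so capacity grows with the number of experts while only one
    expert runs per token at inference."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                        # x: (tokens, d_model)
        gates = self.router(x).softmax(dim=-1)   # routing probabilities
        top1 = gates.argmax(dim=-1)              # chosen expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                out[mask] = expert(x[mask]) * gates[mask, i].unsqueeze(-1)
        return out
```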
1 code implementation • ACL 2022 • Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao
To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.
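A minimal sketch of consistency regularization across perturbed peers, assuming each peer model produces its own logits for the same batch; CAMERO's exact perturbation and layer sharing may differ:

```python
import torch
import torch.nn.functional as F

def consistency_loss(logits_list):
    """Pull every perturbed peer's prediction toward the ensemble mean."""
    probs = [F.softmax(l, dim=-1) for l in logits_list]
    mean_p = torch.stack(probs).mean(dim=0)          # ensemble prediction
    return sum(
        F.kl_div(F.log_softmax(l, dim=-1), mean_p, reduction="batchmean")
        for l in logits_list
    ) / len(logits_list)
```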
1 code implementation • 19 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou
Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain.
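For concreteness, the generic forward-diffusion jump that gradually maps data to Gaussian noise (the standard DDPM form, not necessarily this paper's adaptive variant):

```python
import torch

def forward_diffuse(x0, t, alpha_bar):
    # q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I),
    # where alpha_bar is the cumulative schedule, e.g. torch.cumprod(1 - betas, dim=0).
    noise = torch.randn_like(x0)
    a = alpha_bar[t]
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise, noise
```

The reverse chain is then learned by training a network to predict the returned noise.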
2 code implementations • 14 Feb 2022 • Huangjie Zheng, Pengcheng He, Weizhu Chen, Mingyuan Zhou
In this paper, to exploit both global and local dependencies without self-attention, we present Mix-Shift-MLP (MS-MLP) which makes the size of the local receptive field used for mixing increase with respect to the amount of spatial shifting.
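An illustrative shift-then-mix step, not the paper's exact layer: channel groups are rolled by increasing spatial offsets so that a subsequent channel-mixing MLP sees a wider local neighborhood without self-attention:

```python
import torch

def mix_shift(x, max_shift: int = 2):
    # x: (batch, channels, height, width). Split channels into groups and
    # roll each group along the width by an increasing offset; a following
    # channel-mixing MLP then blends information from shifted neighbors.
    groups = x.chunk(2 * max_shift + 1, dim=1)
    offsets = range(-max_shift, max_shift + 1)
    shifted = [torch.roll(g, shifts=s, dims=-1) for g, s in zip(groups, offsets)]
    return torch.cat(shifted, dim=1)
```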
1 code implementation • ICLR 2022 • Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Analysis shows that the proposed schedule indeed reduces the redundancy and improves generalization performance.
2 code implementations • 6 Dec 2021 • Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.
Ranked #2 on Common Sense Reasoning on CommonsenseQA (using extra training data)
3 code implementations • 18 Nov 2021 • Pengcheng He, Jianfeng Gao, Weizhu Chen
We thus propose a new gradient-disentangled embedding sharing method that avoids the tug-of-war dynamics, improving both training efficiency and the quality of the pre-trained model.
Ranked #1 on Question Answering on SWAG
Tasks: Natural Language Inference, Natural Language Understanding (+2 more)
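A sketch of what gradient-disentangled embedding sharing can look like, assuming the discriminator reuses the generator's table through a stop-gradient plus a learned residual; the details here are illustrative:

```python
import torch
import torch.nn as nn

class DisentangledSharedEmbedding(nn.Module):
    """The discriminator sees the generator's embeddings through a
    stop-gradient plus a small learned residual, so discriminator gradients
    never pull on the shared table (avoiding the tug-of-war)."""

    def __init__(self, shared_embedding: nn.Embedding):
        super().__init__()
        self.shared = shared_embedding               # trained only by the generator
        self.delta = nn.Embedding(shared_embedding.num_embeddings,
                                  shared_embedding.embedding_dim)
        nn.init.zeros_(self.delta.weight)            # start exactly at the shared table

    def forward(self, token_ids):
        return self.shared(token_ids).detach() + self.delta(token_ids)
```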
no code implementations • 29 Sep 2021 • Shujian Zhang, Zhibin Duan, Huangjie Zheng, Pengcheng He, Bo Chen, Weizhu Chen, Mingyuan Zhou
Crossformer with states sharing not only provides the desired cross-layer guidance and regularization but also reduces the memory requirement.
1 code implementation • Findings (EMNLP) 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization can improve model generalization in many natural language processing tasks.
1 code implementation • ACL 2021 • Chen Liang, Simiao Zuo, Minshuo Chen, Haoming Jiang, Xiaodong Liu, Pengcheng He, Tuo Zhao, Weizhu Chen
The Lottery Ticket Hypothesis suggests that an over-parametrized network consists of "lottery tickets", and training a certain collection of them (i.e., a subnetwork) can match the performance of the full model.
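Such subnetworks are commonly carved out by magnitude pruning; a minimal one-shot sketch (the paper's actual ticket-selection procedure may differ):

```python
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    # Keep the (1 - sparsity) fraction of weights with largest magnitude;
    # the surviving positions define the candidate "ticket" to retrain.
    k = max(1, int(weight.numel() * (1.0 - sparsity)))
    threshold = weight.abs().flatten().topk(k).values.min()
    return (weight.abs() >= threshold).float()
```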
1 code implementation • EMNLP 2021 • Simiao Zuo, Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Jianfeng Gao, Weizhu Chen, Tuo Zhao
Adversarial regularization has been shown to improve the generalization performance of deep learning models in various natural language processing tasks.
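A hedged sketch of embedding-space adversarial regularization: find a small perturbation that most changes the model's prediction, then penalize that change. Here model is assumed to map embeddings directly to logits:

```python
import torch
import torch.nn.functional as F

def adversarial_regularizer(model, embeddings, clean_logits, epsilon=1e-3):
    # Start from a tiny random perturbation so the first gradient is nonzero.
    delta = 1e-5 * torch.randn_like(embeddings)
    delta.requires_grad_()
    div = F.kl_div(F.log_softmax(model(embeddings + delta), dim=-1),
                   F.softmax(clean_logits.detach(), dim=-1),
                   reduction="batchmean")
    grad, = torch.autograd.grad(div, delta)
    # One normalized ascent step toward the most damaging direction.
    delta = epsilon * grad / (grad.norm() + 1e-8)
    adv_logits = model(embeddings + delta.detach())
    # Penalize how much the prediction moves under the perturbation.
    return F.kl_div(F.log_softmax(adv_logits, dim=-1),
                    F.softmax(clean_logits.detach(), dim=-1),
                    reduction="batchmean")
```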
no code implementations • Findings (EMNLP) 2021 • Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from training data at the early training stage.
no code implementations • 23 Feb 2021 • Yuhui Wang, Qingyuan Wu, Pengcheng He, Xiaoyang Tan
Most policy evaluation algorithms are based on the Bellman expectation and optimality equations, which give rise to two popular approaches: Policy Iteration (PI) and Value Iteration (VI).
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
no code implementations • ACL 2021 • Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
To date, most recent work under the retriever-reader framework for open-domain QA focuses exclusively on either an extractive or a generative reader.
Ranked #1 on Open-Domain Question Answering on TriviaQA
1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.
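The pattern in a few lines, with retriever and reader as hypothetical callables:

```python
def open_domain_qa(question: str, retriever, reader, top_k: int = 5) -> str:
    passages = retriever(question, top_k)   # fetch candidate evidence passages
    return reader(question, passages)       # read the passages, produce an answer
```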
1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.
Ranked #9 on Passage Retrieval on Natural Questions
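A rough sketch of generation-augmented retrieval with sparse BM25 scoring; generator is a hypothetical callable and BM25Okapi comes from the third-party rank_bm25 package:

```python
from rank_bm25 import BM25Okapi

def gar_retrieve(query: str, corpus: list[str], generator, top_k: int = 5):
    expansion = generator(query)                    # e.g., a guessed answer or title
    augmented = (query + " " + expansion).split()   # semantically enriched sparse query
    bm25 = BM25Okapi([doc.split() for doc in corpus])
    scores = bm25.get_scores(augmented)
    ranked = sorted(range(len(corpus)), key=lambda i: scores[i], reverse=True)
    return [corpus[i] for i in ranked[:top_k]]
```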
9 code implementations • ICLR 2021 • Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen
Recent progress in pre-trained neural language models has significantly improved the performance of many natural language processing (NLP) tasks.
Ranked #1 on Natural Language Inference on MRPC Dev
no code implementations • EMNLP 2020 • Tao Shen, Yi Mao, Pengcheng He, Guodong Long, Adam Trischler, Weizhu Chen
In contrast to existing paradigms, our approach uses knowledge graphs implicitly, only during pre-training, to inject language models with structured knowledge via learning from raw text.
3 code implementations • 20 Apr 2020 • Xiaodong Liu, Hao Cheng, Pengcheng He, Weizhu Chen, Yu Wang, Hoifung Poon, Jianfeng Gao
In natural language processing (NLP), pre-training large neural language models such as BERT has demonstrated impressive gains in generalization for a variety of tasks, with further improvement from adversarial fine-tuning.
Ranked #4 on Natural Language Inference on ANLI test (using extra training data)
3 code implementations • ACL 2020 • Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, Emmanuel Awa, Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao
We present MT-DNN, an open-source natural language understanding (NLU) toolkit that makes it easy for researchers and developers to train customized deep learning models.
6 code implementations • ACL 2020 • Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Tuo Zhao
However, due to limited data resources from downstream tasks and the extremely large capacity of pre-trained models, aggressive fine-tuning often causes the adapted model to overfit the data of downstream tasks and forget the knowledge of the pre-trained model.
Ranked #1 on Semantic Textual Similarity on STS Benchmark
no code implementations • 21 Aug 2019 • Pengcheng He, Yi Mao, Kaushik Chakrabarti, Weizhu Chen
In this work, we present X-SQL, a new network architecture for the problem of parsing natural language to SQL query.
20 code implementations • ICLR 2020 • Liyuan Liu, Haoming Jiang, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao, Jiawei Han
The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.
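The heuristic itself is simple; a minimal linear warmup schedule is sketched below (RAdam's contribution is to rectify the optimizer's variance analytically rather than rely on this hand-tuned ramp):

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int = 1000) -> float:
    # Ramp the learning rate from ~0 to base_lr over the first warmup_steps
    # updates; this stabilizes adaptive optimizers (RMSprop, Adam) whose
    # second-moment estimates are noisy early in training.
    return base_lr * min(1.0, step / warmup_steps)
```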
3 code implementations • WS 2019 • Pengcheng He, Xiaodong Liu, Weizhu Chen, Jianfeng Gao
An HNN consists of two component models, a masked language model and a semantic similarity model, which share a BERT-based contextual encoder but use different model-specific input and output layers.
Ranked #1 on Natural Language Understanding on WNLI
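A sketch of the shared-encoder layout described above, with the encoder standing in for BERT and both output layers illustrative:

```python
import torch.nn as nn

class HybridHeads(nn.Module):
    """One shared contextual encoder feeds two model-specific output layers:
    a masked-LM head and a semantic-similarity head."""

    def __init__(self, encoder: nn.Module, hidden: int, vocab: int):
        super().__init__()
        self.encoder = encoder                    # shared BERT-style encoder
        self.mlm_head = nn.Linear(hidden, vocab)  # masked language model head
        self.sim_head = nn.Linear(hidden, 1)      # semantic similarity score

    def forward(self, inputs, task: str):
        h = self.encoder(inputs)                  # assumed (batch, seq, hidden)
        if task == "mlm":
            return self.mlm_head(h)               # per-token vocabulary logits
        return self.sim_head(h[:, 0])             # score from the [CLS] position
```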
3 code implementations • 20 Apr 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding tasks.
Ranked #1 on Semantic Textual Similarity on SentEval
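The soft-target distillation loss at the core of such approaches, sketched with a temperature parameter; the paper's exact objectives may differ:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    # The student matches the teacher's softened class distribution;
    # temperature > 1 exposes the "dark knowledge" in the teacher's
    # relative probabilities. The t*t factor keeps gradient scale stable.
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=-1)
    log_student = F.log_softmax(student_logits / t, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * (t * t)
```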
8 code implementations • ACL 2019 • Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
In this paper, we present a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks.
Ranked #2 on Natural Language Inference on SciTail
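A minimal sketch of the shared-encoder, per-task-head pattern (encoder, sizes, and head types are illustrative):

```python
import torch.nn as nn

class MultiTaskModel(nn.Module):
    """A single shared encoder with one lightweight classifier per NLU task,
    so representations are learned across all tasks jointly."""

    def __init__(self, encoder: nn.Module, hidden: int, task_classes: dict):
        super().__init__()
        self.encoder = encoder
        self.heads = nn.ModuleDict(
            {task: nn.Linear(hidden, n) for task, n in task_classes.items()}
        )

    def forward(self, inputs, task: str):
        h = self.encoder(inputs)          # shared representation, (batch, seq, hidden)
        return self.heads[task](h[:, 0])  # task-specific classifier on [CLS]
```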