no code implementations • Findings (ACL) 2022 • Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan
Empirically, we show that (a) the dominant winning ticket can achieve performance comparable to that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, and (c) the dominant winning ticket has a natural structure within each parameter matrix.
no code implementations • 4 Dec 2023 • Yiming Huang, Zhenghao Lin, Xiao Liu, Yeyun Gong, Shuai Lu, Fangyu Lei, Yaobo Liang, Yelong Shen, Chen Lin, Nan Duan, Weizhu Chen
Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and about the recently raised problem of data contamination.
no code implementations • 10 Nov 2023 • Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen
Despite their impressive performance, any parsing error inevitably causes the execution of the external logical solver to fail, leaving the logical questions unanswered.
no code implementations • 1 Oct 2023 • Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen
To help these agents adapt to new tasks without extensive human supervision, we propose the Learning through Communication (LTC) paradigm, a novel training approach enabling LLM agents to improve continuously through interactions with their environments and other agents.
1 code implementation • 29 Sep 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen
Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics.
Ranked #6 on Math Word Problem Solving on MATH (using extra training data)
1 code implementation • 18 Sep 2023 • Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen
We find that scaling the LMM consistently enhances model performance and improves language capabilities, and that LoRA/QLoRA tuning of the LMM performs comparably to full-model fine-tuning.
no code implementations • 1 Sep 2023 • Michael Santacroce, Yadong Lu, Han Yu, Yuanzhi Li, Yelong Shen
To address this issue, we present a comprehensive analysis of the memory usage, performance, and training time of memory-saving techniques for PPO.
no code implementations • 24 May 2023 • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen
In this paper, we show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
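The loop below is a minimal sketch of this retrieval-generation synergy, assuming hypothetical `retrieve` and `generate` helpers; the actual Iter-RetGen prompting strategy is described in the paper.

```python
# Minimal sketch of an iterative retrieval-augmented generation loop.
# `retrieve` and `generate` are hypothetical stand-ins for a retriever
# and an LLM call; the actual Iter-RetGen prompting is more involved.

def retrieve(query: str, k: int = 5) -> list[str]:
    """Placeholder: return the top-k passages for the query."""
    return [f"passage about: {query}"] * k

def generate(question: str, passages: list[str]) -> str:
    """Placeholder: return an LLM answer conditioned on the passages."""
    return f"draft answer to '{question}' using {len(passages)} passages"

def iter_retgen(question: str, iterations: int = 3) -> str:
    answer = ""
    for _ in range(iterations):
        # Use the previous generation (if any) to enrich the retrieval query.
        query = f"{question} {answer}".strip()
        passages = retrieve(query)
        # Regenerate the answer conditioned on the freshly retrieved evidence.
        answer = generate(question, passages)
    return answer

print(iter_retgen("Who proposed low-rank adaptation of LLMs?"))
```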
no code implementations • 24 May 2023 • Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren
Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.
1 code implementation • 19 May 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
Unlike these models, humans typically utilize external tools to cross-check and refine their initial content, like using a search engine for fact-checking, or a code interpreter for debugging.
1 code implementation • NeurIPS 2023 • Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen
Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance.
1 code implementation • 23 Apr 2023 • Jiashuo Sun, Yi Luo, Yeyun Gong, Chen Lin, Yelong Shen, Jian Guo, Nan Duan
By utilizing iterative bootstrapping, our approach enables LLMs to autonomously rectify errors, resulting in more precise and comprehensive reasoning chains.
1 code implementation • 7 Feb 2023 • Michael Santacroce, Zixin Wen, Yelong Shen, Yuanzhi Li
Auto-regressive large language models such as GPT-3 require enormous computational resources to use.
no code implementations • 1 Feb 2023 • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen
However, the quality of the prompts depends on the demonstrations given to the models, and creating many of them by hand is costly.
1 code implementation • 22 Dec 2022 • Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, Weizhu Chen
In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE.
no code implementations • 20 Dec 2022 • Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen
Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet.
2 code implementations • 18 Nov 2022 • Biyang Guo, Yeyun Gong, Yelong Shen, Songqiao Han, Hailiang Huang, Nan Duan, Weizhu Chen
We introduce GENIUS: a conditional text generation model using sketches as input, which can fill in the missing contexts for a given sketch (key information consisting of textual spans, phrases, or words, concatenated by mask tokens).
1 code implementation • 21 Oct 2022 • Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen
Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.
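As a rough illustration, the snippet below samples negatives whose retrieval scores are close to the positive's score; the Gaussian-like distribution and its parameters `a` and `b` are assumptions for illustration, not the exact form used by SimANS.

```python
# Sketch of ambiguous-negative sampling: negatives whose retrieval scores are
# close to the positive's score are sampled with higher probability. The exact
# distribution used by SimANS is defined in the paper; the Gaussian-like form
# below (with hypothetical temperature `a` and offset `b`) is an illustration.
import numpy as np

def sample_ambiguous_negatives(pos_score, neg_scores, n_samples=4, a=0.5, b=0.0, seed=0):
    rng = np.random.default_rng(seed)
    neg_scores = np.asarray(neg_scores, dtype=float)
    # Probability peaks where a negative scores about as high as the positive.
    logits = -a * (neg_scores - pos_score - b) ** 2
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return rng.choice(len(neg_scores), size=n_samples, replace=False, p=probs)

print(sample_ambiguous_negatives(pos_score=0.9, neg_scores=[0.1, 0.5, 0.85, 0.88, 0.95, 0.2]))
```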
1 code implementation • 18 Oct 2022 • Xiaonan Li, Daya Guo, Yeyun Gong, Yun Lin, Yelong Shen, Xipeng Qiu, Daxin Jiang, Weizhu Chen, Nan Duan
In this paper, we present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction methods to learn function-level Code Representation.
no code implementations • 13 Oct 2022 • Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan
Integrating free-text explanations into in-context learning with large language models (LLMs) has been shown to elicit strong reasoning capabilities along with reasonable explanations.
2 code implementations • 28 Jun 2022 • Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen
Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among the text candidates.
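A minimal sketch of this mechanism, with hypothetical `sample_candidate` and `score_candidate` placeholders standing in for a real generator and a trained ranker:

```python
# Sketch of the generate-then-rank mechanism: a generator proposes several
# candidates and a ranker scores them; the highest-scoring candidate is kept.

def sample_candidate(prompt: str, i: int) -> str:
    # A real generator would use sampling or beam search over a language model.
    return f"candidate {i} for: {prompt}"

def score_candidate(prompt: str, candidate: str) -> float:
    # A real ranker would be a trained model; here, a toy length-based score.
    return -abs(len(candidate) - len(prompt) - 20)

def generate_then_rank(prompt: str, num_candidates: int = 8) -> str:
    candidates = [sample_candidate(prompt, i) for i in range(num_candidates)]
    return max(candidates, key=lambda c: score_candidate(prompt, c))

print(generate_then_rank("Solve: 12 * 7 = ?"))
```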
no code implementations • 23 May 2022 • Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan
To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertising applications.
1 code implementation • ACL 2022 • Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao
To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.
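Below is a minimal sketch of consistency regularization across perturbed forward passes of a weight-shared model, assuming dropout as the perturbation; CAMERO's actual perturbation and layer-sharing scheme are described in the paper.

```python
# Minimal sketch of consistency regularization across perturbed copies of a
# weight-shared model. Here the "perturbation" is simply dropout noise, so two
# forward passes of the same network disagree slightly; CAMERO's actual
# perturbation and layer-sharing scheme differ.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 4))
x = torch.randn(8, 16)
y = torch.randint(0, 4, (8,))

model.train()
logits_a = model(x)  # perturbed pass 1 (dropout mask A)
logits_b = model(x)  # perturbed pass 2 (dropout mask B)

task_loss = 0.5 * (F.cross_entropy(logits_a, y) + F.cross_entropy(logits_b, y))
# Symmetric KL between the two perturbed predictions encourages consistency.
p_a, p_b = F.log_softmax(logits_a, dim=-1), F.log_softmax(logits_b, dim=-1)
consistency = 0.5 * (F.kl_div(p_a, p_b, log_target=True, reduction="batchmean")
                     + F.kl_div(p_b, p_a, log_target=True, reduction="batchmean"))
loss = task_loss + 1.0 * consistency  # 1.0 is a hypothetical regularization weight
loss.backward()
print(float(loss))
```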
no code implementations • Findings (ACL) 2022 • Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen
We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.
1 code implementation • 26 Jan 2022 • Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, Nan Duan
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build code-text pairs.
no code implementations • NAACL 2022 • Yu Li, Baolin Peng, Yelong Shen, Yi Mao, Lars Liden, Zhou Yu, Jianfeng Gao
To address these challenges, we present PLUG, a language model that homogenizes different knowledge sources to a unified knowledge representation for knowledge-grounded dialogue generation tasks.
1 code implementation • ACL 2022 • Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren
Large pre-trained vision-language (VL) models can learn a new task with a handful of examples and generalize to a new task without fine-tuning.
Ranked #3 on Image Captioning on Flickr30k Captions test (SPICE metric)
1 code implementation • ICLR 2022 • Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen
To address these challenges, we present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.
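The toy modules below sketch the two scoring interfaces, a dual-encoder (dot product of separately encoded query and passage) and a cross-encoder (joint encoding of the pair); the adversarial training objective that ties them together in AR2 is not shown.

```python
# Sketch of the two components in a retriever-ranker setup: a dual-encoder
# scores query/passage embeddings with a dot product (cheap, indexable), while
# a cross-encoder jointly encodes the concatenated pair (accurate, expensive).
# The encoders below are toy linear modules, not the Transformer encoders used in AR2.
import torch
import torch.nn as nn

class DualEncoder(nn.Module):
    def __init__(self, dim=16, emb=32):
        super().__init__()
        self.q_enc = nn.Linear(dim, emb)
        self.p_enc = nn.Linear(dim, emb)
    def forward(self, q, p):
        return (self.q_enc(q) * self.p_enc(p)).sum(-1)  # dot-product relevance

class CrossEncoder(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, q, p):
        return self.scorer(torch.cat([q, p], dim=-1)).squeeze(-1)  # joint score

q, p = torch.randn(4, 16), torch.randn(4, 16)
print(DualEncoder()(q, p).shape, CrossEncoder()(q, p).shape)
```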
33 code implementations • ICLR 2022 • Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
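A minimal sketch of a LoRA-style linear layer, where the frozen base weight is augmented with a trainable low-rank update; the rank `r`, scaling `alpha`, and initialization shown here are illustrative defaults.

```python
# Sketch of a LoRA-style linear layer: the pre-trained weight W is frozen and a
# trainable low-rank update B @ A is added to its output.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(128, 128))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the low-rank factors A and B are trainable
```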
no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen
To this end, we propose a multi-split reversible network and combine it with DARTS.
no code implementations • 10 May 2021 • Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen
We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA.
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
no code implementations • ACL 2021 • Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
To date, most recent work under the retrieval-reader framework for open-domain QA has focused exclusively on either extractive or generative readers.
Ranked #1 on Question Answering on EfficientQA test
1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.
no code implementations • NeurIPS 2020 • Zijie Zhang, Zeru Zhang, Yang Zhou, Yelong Shen, Ruoming Jin, Dejing Dou
Despite achieving remarkable performance, deep graph learning models, such as node classification and network embedding, are vulnerable to small adversarial perturbations.
no code implementations • ICLR 2021 • Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen
To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks.
no code implementations • 12 Oct 2020 • Mingzhi Zheng, Dinghan Shen, Yelong Shen, Weizhu Chen, Lin Xiao
We prove, from a theoretical perspective, that the gradients derived from this new masking schema have a smaller variance and can lead to more efficient self-supervised training.
Ranked #1 on Sentence Classification on ACL-ARC
2 code implementations • 29 Sep 2020 • Dinghan Shen, Mingzhi Zheng, Yelong Shen, Yanru Qu, Weizhu Chen
Adversarial training has been shown to be effective at endowing learned representations with stronger generalization ability.
Ranked #8 on Machine Translation on IWSLT2014 German-English
1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
We demonstrate that the generated contexts substantially enrich the semantics of the queries and that GAR with sparse representations (BM25) achieves performance comparable to or better than state-of-the-art dense retrieval methods such as DPR.
Ranked #9 on Passage Retrieval on Natural Questions
1 code implementation • ACL 2020 • Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu
In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer.
1 code implementation • ACL 2020 • Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal
Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.
Ranked #5 on Video Captioning on ActivityNet Captions
1 code implementation • 19 Apr 2019 • Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, Jingjing Liu
In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods.
no code implementations • NAACL 2019 • Shuohang Wang, Sheng Zhang, Yelong Shen, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Jing Jiang
Commonsense reasoning is fundamental to natural language understanding.
Ranked #3 on Natural Language Understanding on PDP60
1 code implementation • CVPR 2019 • Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson, Jianfeng Gao
We therefore propose a new story-to-image-sequence generation model, StoryGAN, based on the sequential conditional GAN framework.
5 code implementations • NAACL 2019 • Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao
We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains.
no code implementations • NeurIPS 2018 • Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, Jianfeng Gao
In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards.
Ranked #43 on Link Prediction on WN18RR (Hits@3 metric)
5 code implementations • ACL 2018 • Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao
We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension.
Ranked #24 on Question Answering on SQuAD1.1 dev
1 code implementation • CVPR 2018 • Jianbo Chen, Yelong Shen, Jianfeng Gao, Jingjing Liu, Xiaodong Liu
First, we introduce a synthetic dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE system.
3 code implementations • ICLR 2018 • Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, Weizhu Chen
This paper introduces a new neural structure called FusionNet, which extends existing attention approaches from three perspectives.
Ranked #26 on Question Answering on SQuAD1.1 dev
no code implementations • 14 Nov 2017 • Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu
This paper presents a novel neural model, the Dynamic Fusion Network (DFN), for machine reading comprehension (MRC).
no code implementations • IJCNLP 2017 • Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao
Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets.
no code implementations • WS 2017 • Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
However, due to the size of knowledge bases, learning multi-step relations directly on top of observed triplets could be costly.
no code implementations • 14 Nov 2016 • Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
Since large knowledge bases are typically incomplete, missing facts need to be inferred from observed facts in a task called knowledge base completion.
no code implementations • 17 Sep 2016 • Yelong Shen, Po-Sen Huang, Jianfeng Gao, Weizhu Chen
Teaching a computer to read and answer general questions pertaining to a document is a challenging yet unsolved problem.
Ranked #7 on Question Answering on CNN / Daily Mail
1 code implementation • NeurIPS 2015 • Jianshu Chen, Ji He, Yelong Shen, Lin Xiao, Xiaodong He, Jianfeng Gao, Xinying Song, Li Deng
We develop a fully discriminative learning approach for the supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document.
no code implementations • 11 Apr 2015 • Yelong Shen, Ruoming Jin, Jianshu Chen, Xiaodong He, Jianfeng Gao, Li Deng
Co-occurrence data is a common and important information source in many areas, such as word co-occurrence in sentences, friend co-occurrence in social networks, and product co-occurrence in commercial transaction data; it contains rich correlation and clustering information about the items.
no code implementations • 24 Feb 2015 • Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, Rabab Ward
The results show that the method proposed in this paper significantly outperforms it on the web document retrieval task.