no code implementations • Findings (ACL) 2022 • Zhuocheng Gong, Di He, Yelong Shen, Tie-Yan Liu, Weizhu Chen, Dongyan Zhao, Ji-Rong Wen, Rui Yan
Empirically, we show that (a) the dominant winning ticket can achieve performance that is comparable with that of the full-parameter model, (b) the dominant winning ticket is transferable across different tasks, and (c) the dominant winning ticket has a natural structure within each parameter matrix.
no code implementations • 23 Jan 2025 • Zhenghao Lin, Zihao Tang, Xiao Liu, Yeyun Gong, Yi Cheng, Qi Chen, Hang Li, Ying Xin, Ziyue Yang, Kailai Yang, Yu Yan, Xiao Liang, Shuai Lu, Yiming Huang, Zheheng Luo, Lei Qu, Xuan Feng, Yaoxiang Wang, Yuqing Xia, Feiyang Chen, Yuting Jiang, Yasen Hu, Hao Ni, Binyang Li, Guoshuai Zhao, Jui-Hao Chiang, Zhongxin Guo, Chen Lin, Kun Kuang, Wenjie Li, Yelong Shen, Jian Jiao, Peng Cheng, Mao Yang
We introduce Sigma, an efficient large language model specialized for the system domain, empowered by a novel architecture including DiffQKV attention, and pre-trained on our meticulously collected system domain data.
no code implementations • 17 Dec 2024 • Yiping Wang, Xuehai He, Kuan Wang, Luyao Ma, Jianwei Yang, Shuohang Wang, Simon Shaolei Du, Yelong Shen
However, they still struggle to coherently present multiple sequential events in the stories specified by the prompts, a capability that is foreseeably essential for future long video generation scenarios.
no code implementations • 12 Dec 2024 • Xuehai He, Shuohang Wang, Jianwei Yang, Xiaoxia Wu, Yiping Wang, Kuan Wang, Zheng Zhan, Olatunji Ruwase, Yelong Shen, Xin Eric Wang
Recent advancements in diffusion models have shown great promise in producing high-quality video content.
no code implementations • 14 Nov 2024 • Dilxat Muhtar, Yelong Shen, Yaming Yang, Xiaodong Liu, Yadong Lu, Jianfeng Liu, Yuefeng Zhan, Hao Sun, Weiwei Deng, Feng Sun, Xueliang Zhang, Jianfeng Gao, Weizhu Chen, Qi Zhang
The superior task adaptation and context encoding capabilities of StreamAdapter on both language understanding and generation tasks provide a new perspective for adapting LLMs at test time using context, allowing for more efficient adaptation across scenarios and more cost-effective inference.
no code implementations • 14 Oct 2024 • Siru Ouyang, Shuohang Wang, Minhao Jiang, Ming Zhong, Donghan Yu, Jiawei Han, Yelong Shen
This paper delves into the effects of decoding temperatures on speculative decoding's efficacy.
no code implementations • 12 Oct 2024 • Yaming Yang, Dilxat Muhtar, Yelong Shen, Yuefeng Zhan, Jianfeng Liu, Yujing Wang, Hao Sun, Denvy Deng, Feng Sun, Qi Zhang, Weizhu Chen, Yunhai Tong
Parameter-efficient fine-tuning (PEFT) has been widely employed for domain adaptation, with LoRA being one of the most prominent methods due to its simplicity and effectiveness.
no code implementations • 4 Oct 2024 • Rongzhi Zhang, Kuang Wang, Liyuan Liu, Shuohang Wang, Hao Cheng, Chao Zhang, Yelong Shen
Existing approaches to mitigate this issue include: (1) efficient attention variants integrated in upcycling stages, which require extensive parameter tuning and are thus unsuitable for pre-trained LLMs; (2) KV cache compression at test time, primarily through token eviction policies, which often overlook inter-layer dependencies and can be task-specific.
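To make the notion of a token-eviction policy concrete, here is a minimal, generic sketch (not this paper's method), assuming a PyTorch setting: key/value cache entries with low accumulated attention mass are dropped once a budget is exceeded.

```python
import torch

def evict_kv_cache(keys, values, attn_mass, budget):
    """Simple eviction policy sketch: keep only the `budget` cache entries
    with the largest accumulated attention mass.
    keys, values: (seq_len, d); attn_mass: (seq_len,)"""
    if keys.shape[0] <= budget:
        return keys, values
    keep = attn_mass.topk(budget).indices.sort().values  # preserve positional order
    return keys[keep], values[keep]
```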
no code implementations • 18 Sep 2024 • Liyuan Liu, Young Jin Kim, Shuohang Wang, Chen Liang, Yelong Shen, Hao Cheng, Xiaodong Liu, Masahiro Tanaka, Xiaoxia Wu, Wenxiang Hu, Vishrav Chaudhary, Zeqi Lin, Chenruidong Zhang, Jilong Xue, Hany Awadalla, Jianfeng Gao, Weizhu Chen
Mixture-of-Experts (MoE) models scale more effectively than dense models due to sparse computation through expert routing, selectively activating only a small subset of expert modules.
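To illustrate the sparse-computation idea in general terms, the sketch below implements generic top-k expert routing in PyTorch; it is an illustrative toy layer, not the architecture trained in this work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Toy top-k mixture-of-experts layer: a router scores experts per token
    and only the k highest-scoring experts are evaluated for that token."""

    def __init__(self, d_model=64, n_experts=8, k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(n_experts)])
        self.k = k

    def forward(self, x):                           # x: (tokens, d_model)
        gate = F.softmax(self.router(x), dim=-1)
        weights, idx = gate.topk(self.k, dim=-1)    # sparse expert selection
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e            # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```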
no code implementations • 11 Sep 2024 • Khiem Ton, Nhi Nguyen, Mahmoud Nazzal, Abdallah Khreishah, Cristian Borcea, NhatHai Phan, Ruoming Jin, Issa Khalil, Yelong Shen
This paper introduces SGCode, a flexible prompt-optimizing system to generate secure code with large language models (LLMs).
no code implementations • 2 Jul 2024 • Yifang Chen, Shuohang Wang, ZiYi Yang, Hiteshi Sharma, Nikos Karampatziakis, Donghan Yu, Kevin Jamieson, Simon Shaolei Du, Yelong Shen
Reinforcement learning with human feedback (RLHF), as a widely adopted approach in current large language model pipelines, is bottlenecked by the size of human preference data.
2 code implementations • 11 Jun 2024 • Liliang Ren, Yang Liu, Yadong Lu, Yelong Shen, Chen Liang, Weizhu Chen
When finetuned on 4K-length sequences, Samba efficiently extrapolates to a 256K context length with perfect memory recall on the Passkey Retrieval task, and exhibits superior retrieval extrapolation on the challenging Phonebook task compared to full-attention models.
no code implementations • 22 Apr 2024 • Marah Abdin, Jyoti Aneja, Hany Awadalla, Ahmed Awadallah, Ammar Ahmad Awan, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Qin Cai, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Weizhu Chen, Yen-Chun Chen, Yi-Ling Chen, Hao Cheng, Parul Chopra, Xiyang Dai, Matthew Dixon, Ronen Eldan, Victor Fragoso, Jianfeng Gao, Mei Gao, Min Gao, Amit Garg, Allie Del Giorno, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Wenxiang Hu, Jamie Huynh, Dan Iter, Sam Ade Jacobs, Mojan Javaheripi, Xin Jin, Nikos Karampatziakis, Piero Kauffmann, Mahoud Khademi, Dongwoo Kim, Young Jin Kim, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Yunsheng Li, Chen Liang, Lars Liden, Xihui Lin, Zeqi Lin, Ce Liu, Liyuan Liu, Mengchen Liu, Weishung Liu, Xiaodong Liu, Chong Luo, Piyush Madan, Ali Mahmoudzadeh, David Majercak, Matt Mazzola, Caio César Teodoro Mendes, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Liliang Ren, Gustavo de Rosa, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Yelong Shen, Swadheen Shukla, Xia Song, Masahiro Tanaka, Andrea Tupini, Praneetha Vaddamanu, Chunyu Wang, Guanhua Wang, Lijuan Wang, Shuohang Wang, Xin Wang, Yu Wang, Rachel Ward, Wen Wen, Philipp Witte, Haiping Wu, Xiaoxia Wu, Michael Wyatt, Bin Xiao, Can Xu, Jiahang Xu, Weijian Xu, Jilong Xue, Sonali Yadav, Fan Yang, Jianwei Yang, Yifan Yang, ZiYi Yang, Donghan Yu, Lu Yuan, Chenruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou
We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone.
Ranked #5 on MMR total on MRR-Benchmark (using extra training data)
3 code implementations • 11 Apr 2024 • Zhenghao Lin, Zhibin Gou, Yeyun Gong, Xiao Liu, Yelong Shen, Ruochen Xu, Chen Lin, Yujiu Yang, Jian Jiao, Nan Duan, Weizhu Chen
Unlike traditional LMs that learn to predict every next token in a corpus, Rho-1 employs Selective Language Modeling (SLM), which selectively trains on useful tokens that align with the desired distribution.
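A minimal sketch of the selective-training step, assuming PyTorch and assuming `keep_mask` has already been produced by some token-scoring procedure (the paper's actual scoring by a reference model is not reproduced here):

```python
import torch
import torch.nn.functional as F

def selective_lm_loss(logits, targets, keep_mask):
    """Next-token cross-entropy averaged only over tokens selected for training.
    logits: (seq_len, vocab); targets: (seq_len,); keep_mask: (seq_len,) bool."""
    per_token = F.cross_entropy(logits, targets, reduction="none")
    return per_token[keep_mask].mean()
```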
no code implementations • 1 Apr 2024 • Xinzhe Ni, Yeyun Gong, Zhibin Gou, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
Additionally, we showcase the use of QaDS in creating efficient fine-tuning mixtures with various selection ratios, and analyze the quality of a wide range of open-source datasets, which can serve as a reference for future work on mathematical reasoning tasks.
1 code implementation • 18 Mar 2024 • Yi Luo, Zhenghao Lin, Yuhao Zhang, Jiashuo Sun, Chen Lin, Chengjin Xu, Xiangdong Su, Yelong Shen, Jian Guo, Yeyun Gong
Subsequently, the retrieval model correlates new inputs with relevant guidelines, which guide LLMs in response generation to ensure safe and high-quality outputs, thereby aligning with human values.
no code implementations • 4 Mar 2024 • Yiming Huang, Xiao Liu, Yeyun Gong, Zhibin Gou, Yelong Shen, Nan Duan, Weizhu Chen
Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality and reasoning-focused training datasets.
Ranked #51 on Math Word Problem Solving on MATH
no code implementations • 26 Feb 2024 • Ming Zhong, Yelong Shen, Shuohang Wang, Yadong Lu, Yizhu Jiao, Siru Ouyang, Donghan Yu, Jiawei Han, Weizhu Chen
Low-Rank Adaptation (LoRA) is extensively utilized in text-to-image models for the accurate rendition of specific elements like distinct characters or unique styles in generated images.
no code implementations • 4 Dec 2023 • Yiming Huang, Zhenghao Lin, Xiao Liu, Yeyun Gong, Shuai Lu, Fangyu Lei, Yaobo Liang, Yelong Shen, Chen Lin, Nan Duan, Weizhu Chen
Large language models (LLMs) have demonstrated impressive reasoning capabilities, yet there is ongoing debate about these abilities and, more recently, about the potential data contamination problem.
no code implementations • 10 Nov 2023 • Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen
Despite their impressive performance, any parsing error inevitably causes the execution of the external logical solver to fail, leaving the logical question unanswered.
no code implementations • 1 Oct 2023 • Kuan Wang, Yadong Lu, Michael Santacroce, Yeyun Gong, Chao Zhang, Yelong Shen
To optimize agent interactions for task-specific learning with our universal buffer and pipeline, we introduce diverse communication patterns tailored for both single-agent and multi-agent environments.
1 code implementation • 29 Sep 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, Weizhu Chen
Large language models have made significant progress in various language tasks, yet they still struggle with complex mathematics.
Ranked #27 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 18 Sep 2023 • Yadong Lu, Chunyuan Li, Haotian Liu, Jianwei Yang, Jianfeng Gao, Yelong Shen
We find that scaling LMMs consistently enhances model performance and improves language capabilities, and that the performance of LoRA/QLoRA tuning of LMMs is comparable to that of full-model fine-tuning.
Ranked #141 on Visual Question Answering on MM-Vet
no code implementations • 1 Sep 2023 • Michael Santacroce, Yadong Lu, Han Yu, Yuanzhi Li, Yelong Shen
To address this issue, we present a comprehensive analysis of the memory usage, performance, and training time of memory-saving techniques for PPO.
no code implementations • 24 May 2023 • Woojeong Jin, Subhabrata Mukherjee, Yu Cheng, Yelong Shen, Weizhu Chen, Ahmed Hassan Awadallah, Damien Jose, Xiang Ren
Generalization to unseen tasks is an important ability for few-shot learners to achieve better zero-/few-shot performance on diverse tasks.
1 code implementation • 24 May 2023 • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen
In this paper, we show that strong performance can be achieved by a method we call Iter-RetGen, which synergizes retrieval and generation in an iterative manner.
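A minimal sketch of such an iterative retrieval-generation loop; `retrieve` and `generate` are hypothetical callables standing in for a retriever and an LLM, not the paper's actual interfaces.

```python
def iter_retgen(question, retrieve, generate, iterations=3):
    """Alternate retrieval and generation: each draft answer is folded back into
    the query so the next round retrieves better evidence (loop sketch only)."""
    answer = ""
    for _ in range(iterations):
        passages = retrieve(question + " " + answer)   # generation-augmented query
        answer = generate(question, passages)          # retrieval-augmented generation
    return answer
```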
1 code implementation • 19 May 2023 • Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Nan Duan, Weizhu Chen
Unlike these models, humans typically utilize external tools to cross-check and refine their initial content, like using a search engine for fact-checking, or a code interpreter for debugging.
2 code implementations • NeurIPS 2023 • Tong Wu, Zhihao Fan, Xiao Liu, Yeyun Gong, Yelong Shen, Jian Jiao, Hai-Tao Zheng, Juntao Li, Zhongyu Wei, Jian Guo, Nan Duan, Weizhu Chen
Diffusion models have gained significant attention in the realm of image generation due to their exceptional performance.
1 code implementation • NeurIPS 2023 • Zhendong Wang, Yifan Jiang, Yadong Lu, Yelong Shen, Pengcheng He, Weizhu Chen, Zhangyang Wang, Mingyuan Zhou
We present Prompt Diffusion, a framework for enabling in-context learning in diffusion-based generative models.
1 code implementation • 23 Apr 2023 • Jiashuo Sun, Yi Luo, Yeyun Gong, Chen Lin, Yelong Shen, Jian Guo, Nan Duan
By utilizing iterative bootstrapping, our approach enables LLMs to autonomously rectify errors, resulting in more precise and comprehensive reasoning chains.
1 code implementation • 7 Feb 2023 • Michael Santacroce, Zixin Wen, Yelong Shen, Yuanzhi Li
Auto-regressive large language models such as GPT-3 require enormous computational resources to use.
no code implementations • 1 Feb 2023 • Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, Weizhu Chen
However, the quality of the prompts depends on the demonstrations given to the models, and creating many of them by hand is costly.
1 code implementation • 22 Dec 2022 • Zhenghao Lin, Yeyun Gong, Yelong Shen, Tong Wu, Zhihao Fan, Chen Lin, Nan Duan, Weizhu Chen
In this paper, we introduce a novel dIffusion language modEl pre-training framework for text generation, which we call GENIE.
no code implementations • 20 Dec 2022 • Dong Li, Yelong Shen, Ruoming Jin, Yi Mao, Kuan Wang, Weizhu Chen
Pre-trained language models have achieved promising success in code retrieval tasks, where a natural language documentation query is given to find the most relevant existing code snippet.
2 code implementations • 18 Nov 2022 • Biyang Guo, Yeyun Gong, Yelong Shen, Songqiao Han, Hailiang Huang, Nan Duan, Weizhu Chen
We introduce GENIUS: a conditional text generation model using sketches as input, which can fill in the missing contexts for a given sketch (key information consisting of textual spans, phrases, or words, concatenated by mask tokens).
1 code implementation • 21 Oct 2022 • Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen
Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.
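As a rough, simplified sketch of the idea (the weighting below is illustrative and not the paper's exact sampling distribution), negatives whose retrieval scores fall close to the positive passage's score are sampled more often:

```python
import math
import random

def sample_ambiguous_negatives(pos_score, neg_scores, k=4, temperature=1.0):
    """Sample k negative indices, favoring negatives scored close to the positive
    (a simplified stand-in for the SimANS sampling distribution)."""
    weights = [math.exp(-abs(s - pos_score) / temperature) for s in neg_scores]
    return random.choices(range(len(neg_scores)), weights=weights, k=k)
```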
1 code implementation • 18 Oct 2022 • Xiaonan Li, Daya Guo, Yeyun Gong, Yun Lin, Yelong Shen, Xipeng Qiu, Daxin Jiang, Weizhu Chen, Nan Duan
In this paper, we present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction methods to learn functional-level Code Representation.
no code implementations • 13 Oct 2022 • Shiyang Li, Jianshu Chen, Yelong Shen, Zhiyu Chen, Xinlu Zhang, Zekun Li, Hong Wang, Jing Qian, Baolin Peng, Yi Mao, Wenhu Chen, Xifeng Yan
Integrating free-text explanations into the in-context learning of large language models (LLMs) has been shown to elicit strong reasoning capabilities along with reasonable explanations.
2 code implementations • 28 Jun 2022 • Weizhou Shen, Yeyun Gong, Yelong Shen, Song Wang, Xiaojun Quan, Nan Duan, Weizhu Chen
Generate-then-rank is a widely used mechanism for text generation, where a generator produces multiple text candidates and a ranker chooses the best one among them.
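A minimal sketch of the generate-then-rank pipeline; `generator` and `ranker` are hypothetical callables, not this paper's specific models.

```python
def generate_then_rank(prompt, generator, ranker, n_candidates=8):
    """Sample several candidates from the generator, score each with the ranker,
    and return the highest-scoring candidate."""
    candidates = [generator(prompt) for _ in range(n_candidates)]
    scores = [ranker(prompt, c) for c in candidates]
    best = max(zip(scores, candidates), key=lambda pair: pair[0])
    return best[1]
```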
no code implementations • 23 May 2022 • Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan
To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertising applications.
1 code implementation • ACL 2022 • Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao
To retain ensemble benefits while maintaining a low memory cost, we propose a consistency-regularized ensemble learning approach based on perturbed models, named CAMERO.
no code implementations • Findings (ACL) 2022 • Jing Qian, Li Dong, Yelong Shen, Furu Wei, Weizhu Chen
We propose a novel supervised method and also an unsupervised method to train the prefixes for single-aspect control while the combination of these two methods can achieve multi-aspect control.
1 code implementation • 26 Jan 2022 • Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, Nan Duan
For bimodal contrastive learning, we leverage the documentation and in-line comments of code to build code-text pairs.
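One simple way to harvest such code-text pairs from Python source, shown here only as an illustrative sketch (not the paper's actual extraction pipeline), is to pair each documented function with its docstring:

```python
import ast

def build_code_text_pairs(source: str):
    """Pair each documented function body with its docstring to form (code, text)
    examples for bimodal contrastive pre-training (illustrative only)."""
    pairs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and ast.get_docstring(node):
            pairs.append((ast.unparse(node), ast.get_docstring(node)))
    return pairs
```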
no code implementations • NAACL 2022 • Yu Li, Baolin Peng, Yelong Shen, Yi Mao, Lars Liden, Zhou Yu, Jianfeng Gao
To address these challenges, we present PLUG, a language model that homogenizes different knowledge sources to a unified knowledge representation for knowledge-grounded dialogue generation tasks.
1 code implementation • ACL 2022 • Woojeong Jin, Yu Cheng, Yelong Shen, Weizhu Chen, Xiang Ren
Large pre-trained vision-language (VL) models can learn a new task with a handful of examples and generalize to a new task without fine-tuning.
Ranked #4 on Image Captioning on Flickr30k Captions test (SPICE metric)
1 code implementation • ICLR 2022 • Hang Zhang, Yeyun Gong, Yelong Shen, Jiancheng Lv, Nan Duan, Weizhu Chen
To address these challenges, we present Adversarial Retriever-Ranker (AR2), which consists of a dual-encoder retriever plus a cross-encoder ranker.
68 code implementations • ICLR 2022 • Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen
We propose Low-Rank Adaptation, or LoRA, which freezes the pre-trained model weights and injects trainable rank decomposition matrices into each layer of the Transformer architecture, greatly reducing the number of trainable parameters for downstream tasks.
Ranked #2 on Mathematical Reasoning on AMC23
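To make the low-rank update described above concrete, here is a minimal PyTorch sketch of a LoRA-augmented linear layer; the rank, scaling, and initialization values are illustrative defaults rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal LoRA sketch: the pre-trained weight W is frozen and a trainable
    low-rank update B @ A is added to the layer's output."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)              # freeze W
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = x W^T + scaling * x A^T B^T; only A and B receive gradients
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# usage: wrap an attention projection of a pre-trained Transformer
layer = LoRALinear(nn.Linear(768, 768), rank=8)
```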
no code implementations • Findings (ACL) 2021 • Yuekai Zhao, Li Dong, Yelong Shen, Zhihua Zhang, Furu Wei, Weizhu Chen
To this end, we propose a multi-split reversible network and combine it with DARTS.
no code implementations • 10 May 2021 • Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen
We first evaluate Poolingformer on two long sequence QA tasks: the monolingual NQ and the multilingual TyDi QA.
no code implementations • ACL 2021 • Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
To date, most recent work under the retrieval-reader framework for open-domain QA focuses exclusively on either extractive or generative readers.
Ranked #1 on Open-Domain Question Answering on TriviaQA
1 code implementation • 1 Jan 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
Current open-domain question answering systems often follow a Retriever-Reader architecture, where the retriever first retrieves relevant passages and the reader then reads the retrieved passages to form an answer.
no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih
We review the EfficientQA competition from NeurIPS 2020.
no code implementations • NeurIPS 2020 • Zijie Zhang, Zeru Zhang, Yang Zhou, Yelong Shen, Ruoming Jin, Dejing Dou
Despite achieving remarkable performance, deep graph learning models, such as node classification and network embedding, remain vulnerable to small adversarial perturbations.
no code implementations • ICLR 2021 • Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen
To verify the effectiveness of the proposed framework, we apply CoDA to Transformer-based models on a wide range of natural language understanding tasks.
no code implementations • 12 Oct 2020 • Mingzhi Zheng, Dinghan Shen, Yelong Shen, Weizhu Chen, Lin Xiao
We prove, from a theoretical perspective, that the gradients derived from this new masking schema have a smaller variance and can lead to more efficient self-supervised training.
Ranked #1 on Sentence Classification on ACL-ARC
2 code implementations • 29 Sep 2020 • Dinghan Shen, Mingzhi Zheng, Yelong Shen, Yanru Qu, Weizhu Chen
Adversarial training has been shown effective at endowing the learned representations with stronger generalization ability.
Ranked #9 on Machine Translation on IWSLT2014 German-English
1 code implementation • ACL 2021 • Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, Weizhu Chen
We demonstrate that the generated contexts substantially enrich the semantics of the queries and GAR with sparse representations (BM25) achieves comparable or better performance than state-of-the-art dense retrieval methods such as DPR.
Ranked #9 on Passage Retrieval on Natural Questions
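A minimal sketch of generation-augmented retrieval with a sparse index, as described above; `generate_contexts` and the `bm25_index.search` API are hypothetical placeholders rather than the paper's exact pipeline.

```python
def gar_search(query, generate_contexts, bm25_index, top_k=10):
    """Generation-augmented retrieval sketch: expand the query with generated
    contexts (e.g., a predicted answer or passage), then run sparse BM25 retrieval."""
    expansions = generate_contexts(query)            # list of generated strings
    expanded_query = " ".join([query] + expansions)
    return bm25_index.search(expanded_query, top_k)  # placeholder index API
```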
1 code implementation • ACL 2020 • Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu
In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer.
1 code implementation • ACL 2020 • Jie Lei, Li-Wei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Mohit Bansal
Generating multi-sentence descriptions for videos is one of the most challenging captioning tasks due to its high requirements for not only visual relevance but also discourse-based coherence across the sentences in the paragraph.
Ranked #5 on Video Captioning on ActivityNet Captions
1 code implementation • 19 Apr 2019 • Liu Yang, Junjie Hu, Minghui Qiu, Chen Qu, Jianfeng Gao, W. Bruce Croft, Xiaodong Liu, Yelong Shen, Jingjing Liu
In this paper, we propose a hybrid neural conversation model that combines the merits of both response retrieval and generation methods.
no code implementations • NAACL 2019 • Shuohang Wang, Sheng Zhang, Yelong Shen, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Jing Jiang
Commonsense reasoning is fundamental to natural language understanding.
Ranked #2 on Natural Language Understanding on PDP60
1 code implementation • CVPR 2019 • Yitong Li, Zhe Gan, Yelong Shen, Jingjing Liu, Yu Cheng, Yuexin Wu, Lawrence Carin, David Carlson, Jianfeng Gao
We therefore propose a new story-to-image-sequence generation model, StoryGAN, based on the sequential conditional GAN framework.
5 code implementations • NAACL 2019 • Yichong Xu, Xiaodong Liu, Yelong Shen, Jingjing Liu, Jianfeng Gao
We propose a multi-task learning framework to learn a joint Machine Reading Comprehension (MRC) model that can be applied to a wide range of MRC tasks in different domains.
no code implementations • NeurIPS 2018 • Yelong Shen, Jianshu Chen, Po-Sen Huang, Yuqing Guo, Jianfeng Gao
In order to effectively train the agent from sparse rewards, we combine MCTS with the neural policy to generate trajectories yielding more positive rewards.
Ranked #45 on Link Prediction on WN18RR (Hits@3 metric)
6 code implementations • ACL 2018 • Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao
We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension.
Ranked #24 on Question Answering on SQuAD1.1 dev
3 code implementations • ICLR 2018 • Hsin-Yuan Huang, Chenguang Zhu, Yelong Shen, Weizhu Chen
This paper introduces a new neural structure called FusionNet, which extends existing attention approaches from three perspectives.
Ranked #26 on Question Answering on SQuAD1.1 dev
1 code implementation • CVPR 2018 • Jianbo Chen, Yelong Shen, Jianfeng Gao, Jingjing Liu, Xiaodong Liu
First, we introduce a synthetic dataset, called CoSaL, to evaluate the end-to-end performance of our LBIE system.
no code implementations • 14 Nov 2017 • Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu
This paper presents a novel neural model, the Dynamic Fusion Network (DFN), for machine reading comprehension (MRC).
no code implementations • IJCNLP 2017 • Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao
Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets.
no code implementations • WS 2017 • Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
However, due to the size of knowledge bases, learning multi-step relations directly on top of observed triplets could be costly.
no code implementations • 14 Nov 2016 • Yelong Shen, Po-Sen Huang, Ming-Wei Chang, Jianfeng Gao
Since large knowledge bases are typically incomplete, missing facts need to be inferred from observed facts in a task called knowledge base completion.
no code implementations • 17 Sep 2016 • Yelong Shen, Po-Sen Huang, Jianfeng Gao, Weizhu Chen
Teaching a computer to read and answer general questions pertaining to a document is a challenging yet unsolved problem.
Ranked #7 on Question Answering on CNN / Daily Mail
1 code implementation • NeurIPS 2015 • Jianshu Chen, Ji He, Yelong Shen, Lin Xiao, Xiaodong He, Jianfeng Gao, Xinying Song, Li Deng
We develop a fully discriminative learning approach for supervised Latent Dirichlet Allocation (LDA) model using Back Propagation (i.e., BP-sLDA), which maximizes the posterior probability of the prediction variable given the input document.
no code implementations • 11 Apr 2015 • Yelong Shen, Ruoming Jin, Jianshu Chen, Xiaodong He, Jianfeng Gao, Li Deng
Co-occurrence data is a common and important information source in many areas, such as word co-occurrence in sentences, friend co-occurrence in social networks, and product co-occurrence in commercial transaction data; such data contains rich correlation and clustering information about the items.
no code implementations • 24 Feb 2015 • Hamid Palangi, Li Deng, Yelong Shen, Jianfeng Gao, Xiaodong He, Jianshu Chen, Xinying Song, Rabab Ward
The results show that the proposed method significantly outperforms it on the web document retrieval task.