no code implementations • Findings (EMNLP) 2021 • Yu Feng, Jing Zhang, Gaole He, Wayne Xin Zhao, Lemao Liu, Quan Liu, Cuiping Li, Hong Chen
Knowledge Base Question Answering (KBQA) aims to answer natural language questions posed over knowledge bases (KBs).
no code implementations • 11 Feb 2025 • Zican Dong, Junyi Li, Jinhao Jiang, Mingyu Xu, Wayne Xin Zhao, Bingning Wang, WeiPeng Chen
To address these challenges, we propose Long Context Pre-training with Restoration Distillation (LongReD), a novel approach designed to mitigate short-text performance degradation through minimizing the distribution discrepancy between the extended and original models.
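A minimal sketch of the restoration-distillation idea in Python (the loss form and all names here are illustrative assumptions, not the paper's actual implementation): on short texts, the extended model is pulled back toward the original model's output distribution.

```python
import torch.nn.functional as F

def restoration_distillation_loss(extended_logits, original_logits, temperature=1.0):
    """KL term pushing the extended model's short-text distribution back
    toward the original model's (a hedged sketch, not the paper's code)."""
    teacher = F.softmax(original_logits / temperature, dim=-1)
    student_log = F.log_softmax(extended_logits / temperature, dim=-1)
    return F.kl_div(student_log, teacher, reduction="batchmean") * temperature ** 2

# Hypothetical combined objective:
# total = lm_loss_on_long_texts + lam * restoration_distillation_loss(
#     extended_model(short_batch).logits, original_model(short_batch).logits)
```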
no code implementations • 7 Feb 2025 • Ruiyang Ren, Yuhao Wang, Junyi Li, Jinhao Jiang, Wayne Xin Zhao, Wenjie Wang, Tat-Seng Chua
We reformulate the task as a progressive information collection process with a knowledge memory and unite an adaptive checklist with multi-perspective reward modeling in MCTS.
no code implementations • 22 Jan 2025 • Zhen Tian, Wayne Xin Zhao, Ji-Rong Wen
In this paper, we propose a novel optimizer state compression algorithm, namely π-Quant, which leverages the properties of irrational numbers (e.g., π) for memory-efficient training.
2 code implementations • 3 Jan 2025 • Yifan Du, Zikang Liu, YiFan Li, Wayne Xin Zhao, Yuqi Huo, Bingning Wang, WeiPeng Chen, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen
Moreover, it seems that such textual reasoning data can be even more effective than visual reasoning data in eliciting the slow-thinking capacities of MLLMs.
1 code implementation • 2 Jan 2025 • Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
Large language models (LLMs) demonstrate exceptional capabilities, yet still face the hallucination issue.
2 code implementations • 23 Dec 2024 • Yiwen Hu, Huatong Song, Jia Deng, Jiapeng Wang, Jie Chen, Kun Zhou, Yutao Zhu, Jinhao Jiang, Zican Dong, Wayne Xin Zhao, Ji-Rong Wen
Effective pre-training of large language models (LLMs) has been challenging due to the immense resource demands and the complexity of the technical processes involved.
no code implementations • 17 Dec 2024 • Jinhao Jiang, Jiayi Chen, Junyi Li, Ruiyang Ren, Shijie Wang, Wayne Xin Zhao, Yang Song, Tao Zhang
Existing large language models (LLMs) show exceptional problem-solving capabilities but might struggle with complex reasoning tasks.
3 code implementations • 12 Dec 2024 • Yingqian Min, Zhipeng Chen, Jinhao Jiang, Jie Chen, Jia Deng, Yiwen Hu, Yiru Tang, Jiapeng Wang, Xiaoxue Cheng, Huatong Song, Wayne Xin Zhao, Zheng Liu, Zhongyuan Wang, Ji-Rong Wen
We introduce an "imitate, explore, and self-improve" framework, denoted as STILL-2, as our primary technical approach to train the reasoning model.
no code implementations • 29 Nov 2024 • Daixuan Cheng, Shaohan Huang, Ziyu Zhu, Xintong Zhang, Wayne Xin Zhao, Zhongzhi Luan, Bo Dai, Zhenliang Zhang
This paper systematically investigates domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation.
2 code implementations • 18 Nov 2024 • Jinhao Jiang, Zhipeng Chen, Yingqian Min, Jie Chen, Xiaoxue Cheng, Jiapeng Wang, Yiru Tang, Haoxiang Sun, Jia Deng, Wayne Xin Zhao, Zheng Liu, Dong Yan, Jian Xie, Zhongyuan Wang, Ji-Rong Wen
This framework is implemented by integrating the policy model, reward model, and search algorithm.
no code implementations • 7 Nov 2024 • Ruiyang Ren, Yuhao Wang, Kun Zhou, Wayne Xin Zhao, Wenjie Wang, Jing Liu, Ji-Rong Wen, Tat-Seng Chua
Large language models (LLMs), with advanced linguistic capabilities, have been employed in reranking tasks through a sequence-to-sequence approach.
1 code implementation • 26 Oct 2024 • Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Ji-Rong Wen
They assume that problems are from the same task and traverse them in a random order.
1 code implementation • 17 Oct 2024 • Yifan Du, Yuqi Huo, Kun Zhou, Zijia Zhao, Haoyu Lu, Han Huang, Wayne Xin Zhao, Bingning Wang, WeiPeng Chen, Ji-Rong Wen
Then, we explore the scaling effects in frame selection and token selection respectively, and fit the corresponding function curve by conducting extensive empirical experiments.
1 code implementation • 16 Oct 2024 • Jia Deng, Tianyi Tang, Yanbin Yin, Wenhao Yang, Wayne Xin Zhao, Ji-Rong Wen
Second, by leveraging PersonalityBench, we propose an efficient method for identifying personality-related neurons within LLMs by examining the opposite aspects of a given trait.
no code implementations • 10 Oct 2024 • Zhipeng Chen, Liang Song, Kun Zhou, Wayne Xin Zhao, Bingning Wang, WeiPeng Chen, Ji-Rong Wen
In the extraction stage, we first locate key neurons that are highly related to specific abilities, and then employ them to extract the transferable ability-specific weights.
no code implementations • 9 Sep 2024 • Enze Liu, Bowen Zheng, Cheng Ling, Lantao Hu, Han Li, Wayne Xin Zhao
In order to achieve mutual enhancement between the two components, we propose a recommendation-oriented alignment approach by devising two specific optimization objectives: sequence-item alignment and preference-semantic alignment.
1 code implementation • 9 Sep 2024 • Bowen Zheng, Junjie Zhang, Hongyu Lu, Yu Chen, Ming Chen, Wayne Xin Zhao, Ji-Rong Wen
Based on these discrete codes, we enhance the collaborative information of contrastive views by considering neighborhood structure and semantic relevance respectively.
1 code implementation • 19 Aug 2024 • Chen Yang, Sunhao Dai, Yupeng Hou, Wayne Xin Zhao, Jun Xu, Yang Song, HengShu Zhu
By utilizing the potential outcome framework, we further develop a model-agnostic causal reciprocal recommendation method that considers the causal effects of recommendations.
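For reference, the potential outcome framework defines a recommendation's causal effect as the expected difference between the outcome with and without the recommendation (standard notation, assumed here):

```latex
\tau \;=\; \mathbb{E}\big[\, Y(1) - Y(0) \,\big],
```

where Y(1) and Y(0) denote the potential outcomes when the item is and is not recommended, respectively.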
no code implementations • 26 Jul 2024 • Jie Chen, Zhipeng Chen, Jiapeng Wang, Kun Zhou, Yutao Zhu, Jinhao Jiang, Yingqian Min, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ji-Rong Wen
To make the CPT approach more traceable, this paper presents a technical report for continually pre-training Llama-3 (8B), which significantly enhances the Chinese language ability and scientific reasoning ability of the backbone model.
no code implementations • 15 Jul 2024 • Jinhao Jiang, Junyi Li, Wayne Xin Zhao, Yang Song, Tao Zhang, Ji-Rong Wen
However, this method may result in inefficient knowledge memorization due to a lack of awareness of knowledge utilization, and it imposes substantial demands on LLMs to simultaneously learn knowledge utilization and format alignment from limited training samples.
1 code implementation • 8 Jul 2024 • Tianyi Tang, Yiwen Hu, Bingqian Li, Wenyang Luo, Zijing Qin, Haoxiang Sun, Jiapeng Wang, Shiyi Xu, Xiaoxue Cheng, Geyang Guo, Han Peng, Bowen Zheng, Yiru Tang, Yingqian Min, Yushuo Chen, Jie Chen, Yuanqian Zhao, Luran Ding, Yuhao Wang, Zican Dong, Chunxuan Xia, Junyi Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
To facilitate the research on large language models (LLMs), this paper presents a comprehensive and unified library, LLMBox, to ease the development, use, and evaluation of LLMs.
1 code implementation • 28 Jun 2024 • Yutao Zhu, Kun Zhou, Kelong Mao, Wentong Chen, Yiding Sun, Zhipeng Chen, Qian Cao, Yihan Wu, Yushuo Chen, Feng Wang, Lei Zhang, Junyi Li, Xiaolei Wang, Lei Wang, Beichen Zhang, Zican Dong, Xiaoxue Cheng, Yuhan Chen, Xinyu Tang, Yupeng Hou, Qiangqiang Ren, Xincheng Pang, Shufang Xie, Wayne Xin Zhao, Zhicheng Dou, Jiaxin Mao, Yankai Lin, Ruihua Song, Jun Xu, Xu Chen, Rui Yan, Zhewei Wei, Di Hu, Wenbing Huang, Ze-Feng Gao, Yueguo Chen, Weizheng Lu, Ji-Rong Wen
This paper presents the development of YuLan, a series of open-source LLMs with 12 billion parameters.
1 code implementation • 20 Jun 2024 • Yifan Du, Kun Zhou, Yuqi Huo, YiFan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, WeiPeng Chen, Ji-Rong Wen
Leveraging an effective instruction synthesis method and an adaptive model architecture, VIM surpasses both state-of-the-art open-source models and GPT-4V on the Event-Bench.
1 code implementation • 20 Jun 2024 • Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Ji-Rong Wen
The emergence of in-context learning (ICL) is potentially attributed to two major abilities: task recognition (TR) for recognizing the task from demonstrations and utilizing pre-trained priors, and task learning (TL) for learning from demonstrations.
1 code implementation • 19 Jun 2024 • Xinming Hou, Mingming Yang, Wenxiang Jiao, Xing Wang, Zhaopeng Tu, Wayne Xin Zhao
Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even equipped with advanced strategies like CoT and ReAct.
no code implementations • 18 Jun 2024 • Jie Chen, Yupeng Zhang, Bingning Wang, Wayne Xin Zhao, Ji-Rong Wen, WeiPeng Chen
Synthetic data has been proposed as a solution to address the issue of high-quality data scarcity in the training of large language models (LLMs).
1 code implementation • 18 Jun 2024 • Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen
Concretely, we first identify the neurons related to the human preference data using a gradient-based strategy, and then identify the alignment-related key tokens with reward models for computing the loss.
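A hedged sketch of the gradient-based neuron identification step (the gradient-times-activation score is a common attribution heuristic assumed here, not necessarily the paper's exact criterion):

```python
import torch

def neuron_importance(loss, activations):
    """Score each hidden unit by |gradient x activation|, aggregated over
    the batch and sequence dimensions of a (B, T, H) activation tensor."""
    grads = torch.autograd.grad(loss, activations, retain_graph=True)[0]
    return (grads * activations).abs().sum(dim=(0, 1))

# Hypothetical usage: keep the top-k units as preference-related neurons.
# scores = neuron_importance(preference_loss, ffn_activations)
# key_neurons = torch.topk(scores, k=1024).indices
```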
1 code implementation • 17 Jun 2024 • Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Hongzhi Zhang, Fuzheng Zhang, Di Zhang, Kun Gai, Ji-Rong Wen
Hallucination detection is a challenging task for large language models (LLMs), and existing studies heavily rely on powerful closed-source LLMs such as GPT-4.
1 code implementation • 30 May 2024 • Jinxia Yang, Bing Su, Wayne Xin Zhao, Ji-Rong Wen
In this paper, we introduce the Med-ST framework for fine-grained spatial and temporal modeling to exploit information from multiple spatial views of chest radiographs and temporal historical records.
no code implementations • 28 May 2024 • Zican Dong, Junyi Li, Xin Men, Wayne Xin Zhao, Bingbing Wang, Zhen Tian, WeiPeng Chen, Ji-Rong Wen
Based on our findings, we design two training-free context window extension methods, positional vector replacement and attention window extension.
1 code implementation • 23 May 2024 • Kun Zhou, Beichen Zhang, Jiapeng Wang, Zhipeng Chen, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang, Ji-Rong Wen
We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke the GPT-4 API 9.3k times and pre-train on 4.6B tokens of data.
1 code implementation • 21 May 2024 • Peiyu Liu, Ze-Feng Gao, Wayne Xin Zhao, Yipeng Ma, Tao Wang, Ji-Rong Wen
In this paper, we introduce DecoQuant, a novel data-free low-bit quantization technique based on tensor decomposition methods, to effectively compress the KV cache.
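The abstract names tensor decomposition as the basis; a minimal sketch of that general idea (a truncated SVD stands in for whatever decomposition DecoQuant actually uses, and all parameters are illustrative):

```python
import torch

def decompose_and_quantize(kv_matrix, rank=32, bits=4):
    """Factor a cached matrix into two small factors and uniformly quantize
    them at low bit-width -- a sketch of decomposition-based compression."""
    U, S, Vh = torch.linalg.svd(kv_matrix, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (n, rank) left factor
    B = Vh[:rank, :]             # (rank, d) right factor
    qmax = 2 ** (bits - 1) - 1

    def quantize(x):
        scale = x.abs().max() / qmax
        return torch.round(x / scale).clamp(-qmax - 1, qmax), scale

    (qA, sA), (qB, sB) = quantize(A), quantize(B)
    return qA, sA, qB, sB  # dequantize via (qA * sA) @ (qB * sB)
```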
1 code implementation • 17 Apr 2024 • Yushuo Chen, Tianyi Tang, Erge Xiang, Linjiang Li, Wayne Xin Zhao, Jing Wang, Yunpeng Chai, Ji-Rong Wen
In the real world, large language models (LLMs) can serve as assistants that help users accomplish their jobs and also support the development of advanced applications.
1 code implementation • 26 Mar 2024 • Zhen Tian, Wayne Xin Zhao, Changwang Zhang, Xin Zhao, Zhongrui Ma, Ji-Rong Wen
The core of the Transformer architecture lies in the self-attention mechanism, which computes pairwise attention scores over all positions in a sequence.
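For reference, the pairwise scores referred to here are those of standard scaled dot-product self-attention, which costs O(n²) in the sequence length n; a minimal NumPy sketch:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Scaled dot-product self-attention over an (n, d) input sequence X:
    every position attends to every other via an (n, n) score matrix."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```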
1 code implementation • 21 Mar 2024 • Xiaoxue Cheng, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
In response to this challenge, we present an empirical investigation of CoT prompting and introduce CoTGenius, a novel framework designed for the automatic generation of superior CoT prompts.
1 code implementation • 20 Mar 2024 • Bowen Zheng, Zihan Lin, Enze Liu, Chen Yang, Enyang Bai, Cheng Ling, Wayne Xin Zhao, Ji-Rong Wen
Meanwhile, we leverage the LLM recommender as a supplemental component (discarded in deployment) to better capture underlying user preferences from heterogeneous interaction behaviors.
no code implementations • 14 Mar 2024 • Zikang Liu, Kun Zhou, Wayne Xin Zhao, Dawei Gao, Yaliang Li, Ji-Rong Wen
To investigate this issue, we conduct a series of empirical studies, which reveal significant redundancy within visual instruction datasets and show that greatly reducing the number of instructions from several tasks does not affect the performance.
2 code implementations • 14 Mar 2024 • YiFan Li, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
In this paper, we study the harmlessness alignment problem of multimodal large language models (MLLMs).
no code implementations • 7 Mar 2024 • Wenjie Wang, Yang Zhang, Xinyu Lin, Fuli Feng, Weiwen Liu, Yong Liu, Xiangyu Zhao, Wayne Xin Zhao, Yang Song, Xiangnan He
The rise of generative models has driven significant advancements in recommender systems, opening unique opportunities for enhancing users' personalized recommendations.
1 code implementation • 28 Feb 2024 • Lanling Xu, Zhen Tian, Bingqian Li, Junjie Zhang, Jinpeng Wang, Mingchen Cai, Wayne Xin Zhao
The core idea of our approach is sequence-level semantic fusion, which better integrates global contexts.
1 code implementation • 27 Feb 2024 • Xinyu Tang, Xiaolei Wang, Wayne Xin Zhao, Siyuan Lu, Yaliang Li, Ji-Rong Wen
By systematically analyzing a rich set of improvement strategies on the two aspects, we further develop a capable Gradient-inspired LLM-based Prompt Optimizer called GPO.
no code implementations • 27 Feb 2024 • Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang
Due to the excellent capacities of large language models (LLMs), it becomes feasible to develop LLM-based agents for reliable user simulation.
1 code implementation • 27 Feb 2024 • Yuhao Wang, Ruiyang Ren, Junyi Li, Wayne Xin Zhao, Jing Liu, Ji-Rong Wen
By combining the improvements in both architecture and training, our proposed REAR can better utilize external knowledge by effectively perceiving the relevance of retrieved documents.
2 code implementations • 26 Feb 2024 • Yiding Sun, Feng Wang, Yutao Zhu, Wayne Xin Zhao, Jiaxin Mao
The ability of foundation models heavily relies on large-scale, diverse, and high-quality pretraining data.
no code implementations • 17 Feb 2024 • Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, Yang Song, Chen Zhu, HengShu Zhu, Ji-Rong Wen
To guarantee effectiveness, we leverage a programming language to formulate the multi-hop reasoning process over the KG, and synthesize a code-based instruction dataset to fine-tune the base LLM.
1 code implementation • 11 Jan 2024 • Zhipeng Chen, Kun Zhou, Wayne Xin Zhao, Junchen Wan, Fuzheng Zhang, Di Zhang, Ji-Rong Wen
To address it, we propose a new RL method named RLMEC that incorporates a generative model as the reward model, which is trained by the erroneous solution rewriting task under the minimum editing constraint, and can produce token-level rewards for RL training.
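A hedged sketch of how token-level rewards could be derived under a minimum-editing constraint (a longest-matching alignment stands in for the paper's rewriting model, and the 0/1 reward scheme is an assumption):

```python
import difflib

def token_level_rewards(solution_tokens, rewritten_tokens):
    """Reward tokens of the erroneous solution that survive a minimal edit
    into the rewritten (corrected) solution; edited tokens get no reward."""
    rewards = [0.0] * len(solution_tokens)
    matcher = difflib.SequenceMatcher(a=solution_tokens, b=rewritten_tokens)
    for block in matcher.get_matching_blocks():
        for i in range(block.a, block.a + block.size):
            rewards[i] = 1.0
    return rewards

# token_level_rewards(["2", "+", "2", "=", "5"], ["2", "+", "2", "=", "4"])
# -> [1.0, 1.0, 1.0, 1.0, 0.0]
```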
no code implementations • 10 Jan 2024 • Lanling Xu, Junjie Zhang, Bingqian Li, Jinpeng Wang, Sheng Chen, Wayne Xin Zhao, Ji-Rong Wen
As for the use of LLMs as recommenders, we analyze the impact of public availability, tuning strategies, model architecture, parameter scale, and context length on recommendation results based on the classification of LLMs.
no code implementations • 7 Jan 2024 • Yingqian Min, Kun Zhou, Dawei Gao, Wayne Xin Zhao, He Hu, Yaliang Li
Recently, multi-task instruction tuning has been applied to sentence representation learning, which endows the capability of generating specific representations under the guidance of task instructions, exhibiting strong generalization ability on new tasks.
1 code implementation • 6 Jan 2024 • Junyi Li, Jie Chen, Ruiyang Ren, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
To tackle LLM hallucination, three key questions should be well studied: how to detect hallucinations (detection), why LLMs hallucinate (source), and what can be done to mitigate them (mitigation).
1 code implementation • 1 Jan 2024 • Wenqi Sun, Ruobing Xie, Junjie Zhang, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen
Pre-trained recommendation models (PRMs) have received increasing interest recently.
no code implementations • 30 Dec 2023 • Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, Yaliang Li, Ji-Rong Wen
To better perform reasoning on KGs, recent work typically adopts a pre-trained language model (PLM) to model the question and a graph neural network (GNN) based module to perform multi-hop reasoning on the KG.
1 code implementation • 27 Nov 2023 • Zhen Tian, Changwang Zhang, Wayne Xin Zhao, Xin Zhao, Ji-Rong Wen, Zhao Cao
To address the above issue, we propose the Universal Feature Interaction Network (UFIN) approach for CTR prediction.
no code implementations • 19 Nov 2023 • Gaowei Zhang, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ji-Rong Wen
We find that scaling up the model size can greatly boost the performance on these challenging tasks, which again verifies the benefits of large recommendation models.
1 code implementation • 15 Nov 2023 • Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, Ji-Rong Wen
To address this challenge, in this paper, we propose a new LLM-based recommendation model called LC-Rec, which can better integrate language and collaborative semantics for recommender systems.
1 code implementation • 7 Nov 2023 • Geyang Guo, Ranchi Zhao, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
Alignment with human preference is a desired property of large language models (LLMs).
no code implementations • 3 Nov 2023 • Kun Zhou, Yutao Zhu, Zhipeng Chen, Wentong Chen, Wayne Xin Zhao, Xu Chen, Yankai Lin, Ji-Rong Wen, Jiawei Han
Large language models (LLMs) have greatly advanced the frontiers of artificial intelligence, attaining remarkable improvement in model capacity.
no code implementations • 3 Nov 2023 • Wenqi Sun, Ruobing Xie, Shuqing Bian, Wayne Xin Zhao, Jie Zhou
There is a rapidly-growing research interest in modeling user preferences via pre-training multi-domain interactions for recommender systems.
1 code implementation • 2 Nov 2023 • Yifan Du, Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Jinpeng Wang, Chuyuan Wang, Mingchen Cai, Ruihua Song, Ji-Rong Wen
By conducting a comprehensive empirical study, we find that instructions focused on complex visual reasoning tasks are particularly effective in improving the performance of MLLMs on evaluation benchmarks.
no code implementations • 13 Oct 2023 • Junjie Zhang, Yupeng Hou, Ruobing Xie, Wenqi Sun, Julian McAuley, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen
The optimized agents can also propagate their preferences to other agents in subsequent interactions, implicitly capturing the collaborative filtering idea.
no code implementations • 11 Oct 2023 • Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Wayne Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai
In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LLMs.
1 code implementation • 23 Sep 2023 • Zican Dong, Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
Recently, multiple studies have committed to extending the context length and enhancing the long text modeling capabilities of LLMs.
1 code implementation • 24 Aug 2023 • Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, Jingyuan Wang
The field of urban spatial-temporal prediction is advancing rapidly with the development of deep learning techniques and the availability of large-scale datasets.
2 code implementations • 22 Aug 2023 • Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, ZhiYuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, Ji-Rong Wen
In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of LLM-based autonomous agents from a holistic perspective.
1 code implementation • 1 Aug 2023 • Geyang Guo, Jiarong Yang, Fengyuan LU, Jiaxin Qin, Tianyi Tang, Wayne Xin Zhao
From an evaluation perspective, we build a benchmark to judge ancient Chinese translation quality in different scenarios and evaluate the ancient Chinese translation capacities of various existing models.
1 code implementation • 21 Jul 2023 • Zhipeng Zhao, Kun Zhou, Xiaolei Wang, Wayne Xin Zhao, Fan Pan, Zhao Cao, Ji-Rong Wen
Conversational recommender systems (CRS) aim to provide the recommendation service via natural language conversations.
1 code implementation • 20 Jul 2023 • Ruiyang Ren, Yuhao Wang, Yingqi Qu, Wayne Xin Zhao, Jing Liu, Hao Tian, Hua Wu, Ji-Rong Wen, Haifeng Wang
In this study, we present the first analysis of the factual knowledge boundaries of LLMs and how retrieval augmentation affects LLMs on open-domain question answering (QA), yielding a number of important findings.
1 code implementation • 16 Jul 2023 • Peiyu Liu, Zikang Liu, Ze-Feng Gao, Dawei Gao, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen
Different from previous studies focused on overall performance, this work aims to investigate the impact of quantization on emergent abilities, which are important characteristics that distinguish LLMs from small language models.
1 code implementation • 26 Jun 2023 • Bowen Zheng, Yupeng Hou, Wayne Xin Zhao, Yang Song, HengShu Zhu
Existing RRS models mainly capture static user preferences, which have neglected the evolving user tastes and the dynamic matching relation between the two parties.
no code implementations • 19 Jun 2023 • Wayne Xin Zhao, Kun Zhou, Beichen Zhang, Zheng Gong, Zhipeng Chen, Yuanhang Zhou, Ji-Rong Wen, Jing Sha, Shijin Wang, Cong Liu, Guoping Hu
Specifically, we construct a Mixture-of-Experts (MoE) architecture for modeling mathematical text, so as to capture the common mathematical knowledge across tasks.
1 code implementation • 5 Jun 2023 • Lei Wang, Jingsen Zhang, Hao Yang, ZhiYuan Chen, Jiakai Tang, Zeyu Zhang, Xu Chen, Yankai Lin, Ruihua Song, Wayne Xin Zhao, Jun Xu, Zhicheng Dou, Jun Wang, Ji-Rong Wen
Simulating high-quality user behavior data has always been a fundamental problem in human-centered applications, where the major difficulty originates from the intricate mechanism of the human decision process.
1 code implementation • 5 Jun 2023 • Xiaolei Wang, Kun Zhou, Xinyu Tang, Wayne Xin Zhao, Fan Pan, Zhao Cao, Ji-Rong Wen
To develop our approach, we characterize user preference and organize the conversation flow by the entities involved in the dialogue, and design a multi-stage recommendation dialogue simulator based on a conversation flow language model.
1 code implementation • NeurIPS 2023 • Beichen Zhang, Kun Zhou, Xilin Wei, Wayne Xin Zhao, Jing Sha, Shijin Wang, Ji-Rong Wen
Based on this finding, we propose a new approach that can deliberate the reasoning steps with tool interfaces, namely DELI.
1 code implementation • 26 May 2023 • Yifan Du, Junyi Li, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
In this paper, we propose a novel language model guided captioning approach, LAMOC, for knowledge-based visual question answering (VQA).
1 code implementation • 26 May 2023 • Tianyi Tang, Yushuo Chen, Yifan Du, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
People often imagine relevant scenes to aid in the writing process.
2 code implementations • 24 May 2023 • Tianyi Tang, Hongyuan Lu, Yuchen Eleanor Jiang, Haoyang Huang, Dongdong Zhang, Wayne Xin Zhao, Tom Kocmi, Furu Wei
Most research about natural language generation (NLG) relies on evaluation benchmarks with limited references for a sample, which may result in poor correlations with human judgements.
1 code implementation • 23 May 2023 • Zhipeng Chen, Kun Zhou, Beichen Zhang, Zheng Gong, Wayne Xin Zhao, Ji-Rong Wen
Although large language models (LLMs) have achieved excellent performance in a variety of evaluation benchmarks, they still struggle in complex reasoning tasks which require specific knowledge and multi-hop reasoning.
1 code implementation • 22 May 2023 • Xiaolei Wang, Xinyu Tang, Wayne Xin Zhao, Jingyuan Wang, Ji-Rong Wen
The recent success of large language models (LLMs) has shown great potential to develop more powerful conversational recommender systems (CRSs), which rely on natural language conversations to satisfy user needs.
3 code implementations • 19 May 2023 • Junyi Li, Xiaoxue Cheng, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
Large language models (LLMs), such as ChatGPT, are prone to generate hallucinations, i.e., content that conflicts with the source or cannot be verified by factual knowledge.
1 code implementation • 18 May 2023 • Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jingyuan Wang, Jian-Yun Nie, Ji-Rong Wen
In order to further improve the capacity of LLMs for knowledge-intensive tasks, we consider augmenting LLMs with the large-scale web using a search engine.
no code implementations • 18 May 2023 • Ruiyang Ren, Wayne Xin Zhao, Jing Liu, Hua Wu, Ji-Rong Wen, Haifeng Wang
Recently, model-based retrieval has emerged as a new paradigm in text retrieval that discards the index in the traditional retrieval model and instead memorizes the candidate corpora using model parameters.
6 code implementations • 17 May 2023 • YiFan Li, Yifan Du, Kun Zhou, Jinpeng Wang, Wayne Xin Zhao, Ji-Rong Wen
Despite the promising progress on LVLMs, we find that LVLMs suffer from the hallucination problem, i.e., they tend to generate objects that are inconsistent with the target images in the descriptions.
1 code implementation • 16 May 2023 • Jinhao Jiang, Kun Zhou, Zican Dong, Keming Ye, Wayne Xin Zhao, Ji-Rong Wen
Specifically, we propose an invoking-linearization-generation procedure to support LLMs in reasoning on structured data with the help of external interfaces.
2 code implementations • 15 May 2023 • Yupeng Hou, Junjie Zhang, Zihan Lin, Hongyu Lu, Ruobing Xie, Julian McAuley, Wayne Xin Zhao
Recently, large language models (LLMs) (e.g., GPT-4) have demonstrated impressive general-purpose task-solving abilities, including the potential to approach recommendation tasks.
1 code implementation • 11 May 2023 • Haoyang Huang, Tianyi Tang, Dongdong Zhang, Wayne Xin Zhao, Ting Song, Yan Xia, Furu Wei
Large language models (LLMs) demonstrate impressive multilingual capability, but their performance varies substantially across different languages.
no code implementations • 11 May 2023 • Junjie Zhang, Ruobing Xie, Yupeng Hou, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen
Inspired by the recent progress on large language models (LLMs), we take a different approach to developing the recommendation models, considering recommendation as instruction following by LLMs.
no code implementations • 6 May 2023 • Kun Zhou, YiFan Li, Wayne Xin Zhao, Ji-Rong Wen
To solve it, we propose Diffusion-NAT, which introduces discrete diffusion models (DDM) into NAR text-to-text generation and integrates BART to improve the performance.
1 code implementation • 4 May 2023 • Chenzhan Shang, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Jing Zhang
In our approach, we first employ the hypergraph structure to model users' historical dialogue sessions and form a session-based hypergraph, which captures coarse-grained, session-level relations.
2 code implementations • 27 Apr 2023 • Jiawei Jiang, Chengkai Han, Wenjun Jiang, Wayne Xin Zhao, Jingyuan Wang
As deep learning technology advances and more urban spatial-temporal data accumulates, an increasing number of deep learning models are being proposed to solve urban spatial-temporal prediction problems.
no code implementations • 25 Apr 2023 • Junyi Li, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
In this way, conditional text generation can be cast as a glyph image generation task, and it is then natural to apply continuous diffusion models to discrete texts.
2 code implementations • 21 Apr 2023 • Zhen Tian, Ting Bai, Wayne Xin Zhao, Ji-Rong Wen, Zhao Cao
EulerNet converts the exponential powers of feature interactions into simple linear combinations of the modulus and phase of the complex features, making it possible to adaptively learn the high-order feature interactions in an efficient way.
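The identity behind this is Euler's formula: writing each feature in polar form turns an exponential-power interaction into linear combinations of log-modulus and phase (notation assumed for illustration):

```latex
x_j = r_j e^{i\theta_j}
\quad\Longrightarrow\quad
\prod_j x_j^{\alpha_j}
  = \exp\Big(\sum_j \alpha_j \ln r_j\Big)\,
    \exp\Big(i \sum_j \alpha_j \theta_j\Big),
```

so the interaction orders α_j appear only as linear coefficients over the log-moduli and phases, and can be learned adaptively.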
6 code implementations • 31 Mar 2023 • Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, YiFan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, Ji-Rong Wen
To discriminate the difference in parameter scale, the research community has coined the term large language models (LLMs) for PLMs of significant size.
no code implementations • 27 Mar 2023 • Peiyu Liu, Ze-Feng Gao, Yushuo Chen, Wayne Xin Zhao, Ji-Rong Wen
Based on such a decomposition, our architecture shares the central tensor across all layers for reducing the model size and meanwhile keeps layer-specific auxiliary tensors (also using adapters) for enhancing the adaptation flexibility.
1 code implementation • 12 Mar 2023 • YiFan Li, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
In this survey, we review the recent progress in diffusion models for NAR text generation.
no code implementations • 28 Feb 2023 • Zican Dong, Tianyi Tang, Lunyi Li, Wayne Xin Zhao
In this paper, we provide an overview of the recent advances in long text modeling based on Transformer models.
no code implementations • 6 Feb 2023 • Shanlei Mu, Penghui Wei, Wayne Xin Zhao, Shaoguo Liu, Liang Wang, Bo Zheng
In this paper, we propose a Hybrid Contrastive Constrained approach (HC²) for multi-scenario ad ranking.
1 code implementation • 19 Jan 2023 • Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, Jingyuan Wang
However, GNN-based models have three major limitations for traffic prediction: i) Most methods model spatial dependencies in a static manner, which limits the ability to learn dynamic urban traffic patterns; ii) Most methods only consider short-range spatial information and are unable to capture long-range spatial dependencies; iii) These methods ignore the fact that the propagation of traffic conditions between locations has a time delay in traffic systems.
Ranked #5 on Traffic Prediction on PeMSD4
no code implementations • 16 Jan 2023 • Wenjun Jiang, Wayne Xin Zhao, Jingyuan Wang, Jiawei Jiang
Simulating human mobility and generating large-scale trajectories are of great use in many real-world applications, such as urban planning, epidemic spreading analysis, and geographic privacy protection.
1 code implementation • 14 Jan 2023 • Hongpeng Lin, Ludan Ruan, Wenke Xia, Peiyu Liu, Jingyuan Wen, Yixin Xu, Di Hu, Ruihua Song, Wayne Xin Zhao, Qin Jin, Zhiwu Lu
Experimental results indicate that the models incorporating large language models (LLM) can generate more diverse responses, while the model utilizing knowledge graphs to introduce external knowledge performs the best overall.
1 code implementation • 26 Dec 2022 • Tianyi Tang, Junyi Li, Zhipeng Chen, Yiwen Hu, Zhuohao Yu, Wenxun Dai, Zican Dong, Xiaoxue Cheng, Yuhao Wang, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
To facilitate research on text generation, this paper presents a comprehensive and unified library, TextBox 2.0, focusing on the use of pre-trained language models (PLMs).
Ranked #1 on Abstractive Text Summarization on CNN/Daily Mail
1 code implementation • 15 Dec 2022 • Hangyu Guo, Kun Zhou, Wayne Xin Zhao, Qinyu Zhang, Ji-Rong Wen
Although pre-trained language models (PLMs) have shown impressive performance through text-only self-supervised training, they are found to lack visual semantics or commonsense.
1 code implementation • 15 Dec 2022 • Kun Zhou, Xiao Liu, Yeyun Gong, Wayne Xin Zhao, Daxin Jiang, Nan Duan, Ji-Rong Wen
Pre-trained Transformers (e.g., BERT) have been commonly used in existing dense retrieval methods for parameter initialization, and recent studies are exploring more effective pre-training tasks for further improving the quality of dense vectors.
1 code implementation • 2 Dec 2022 • Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
Multi-hop Question Answering over Knowledge Graph (KGQA) aims to find the answer entities that are multiple hops away from the topic entities mentioned in a natural language question on a large-scale Knowledge Graph (KG).
1 code implementation • 28 Nov 2022 • Lanling Xu, Zhen Tian, Gaowei Zhang, Lei Wang, Junjie Zhang, Bowen Zheng, YiFan Li, Yupeng Hou, Xingyu Pan, Yushuo Chen, Wayne Xin Zhao, Xu Chen, Ji-Rong Wen
In order to present the recent updates in RecBole, we write this technical report to introduce our latest improvements.
2 code implementations • 27 Nov 2022 • Wayne Xin Zhao, Jing Liu, Ruiyang Ren, Ji-Rong Wen
With powerful PLMs, we can effectively learn the representations of queries and texts in the latent representation space, and further construct the semantic matching function between the dense vectors for relevance modeling.
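Concretely, the matching function between dense vectors is usually just an inner product (or cosine similarity); a minimal sketch:

```python
import numpy as np

def relevance_scores(query_vec, doc_vecs):
    """Dense-retrieval relevance: inner product between one encoded query
    and a (num_docs, dim) matrix of encoded documents."""
    return doc_vecs @ query_vec

# top10 = np.argsort(-relevance_scores(q, D))[:10]  # best candidates first
```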
1 code implementation • 21 Nov 2022 • Zhen Tian, Ting Bai, Zibin Zhang, Zhiyuan Xu, Kangyi Lin, Ji-Rong Wen, Wayne Xin Zhao
Some recent knowledge distillation based methods transfer knowledge from complex teacher models to shallow student models for accelerating the online model inference.
1 code implementation • 24 Oct 2022 • Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
However, NAR models usually generate texts of lower quality due to the absence of token dependency in the output text.
1 code implementation • 22 Oct 2022 • Yupeng Hou, Zhankui He, Julian McAuley, Wayne Xin Zhao
Based on this representation scheme, we further propose an enhanced contrastive pre-training approach, using semi-synthetic and mixed-domain code representations as hard negatives.
1 code implementation • 21 Oct 2022 • Kun Zhou, Yeyun Gong, Xiao Liu, Wayne Xin Zhao, Yelong Shen, Anlei Dong, Jingwen Lu, Rangan Majumder, Ji-Rong Wen, Nan Duan, Weizhu Chen
Thus, we propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives.
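A sketch of ambiguity-oriented negative sampling under stated assumptions (the Gaussian-shaped weight centered near the positive's score is one plausible instantiation of such a distribution, not a verified copy of SimANS):

```python
import numpy as np

def sample_ambiguous_negatives(neg_scores, pos_score, a=1.0, b=0.0, k=8, seed=0):
    """Prefer negatives whose retrieval score is close to the positive's
    (ambiguous), over trivially easy or likely-false-negative extremes."""
    rng = np.random.default_rng(seed)
    weights = np.exp(-a * (np.asarray(neg_scores) - pos_score - b) ** 2)
    probs = weights / weights.sum()
    return rng.choice(len(neg_scores), size=k, replace=False, p=probs)
```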
1 code implementation • 21 Oct 2022 • Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Ji-Rong Wen
To develop effective and efficient graph similarity learning (GSL) models, a series of data-driven neural algorithms have been proposed in recent years.
no code implementations • 29 Aug 2022 • Zihan Lin, Xuanhua Yang, Xiaoyu Peng, Wayne Xin Zhao, Shaoguo Liu, Liang Wang, Bo Zheng
For this purpose, we build a relatedness prediction network, so that it can predict the contrast strength for inter-task representations of an instance.
1 code implementation • 18 Aug 2022 • Chen Yang, Yupeng Hou, Yang Song, Tao Zhang, Ji-Rong Wen, Wayne Xin Zhao
To model the two-way selection preference from the dual-perspective of job seekers and employers, we incorporate two different nodes for each candidate (or job) and characterize both successful matching and failed matching via a unified dual-perspective interaction graph.
4 code implementations • 24 Jun 2022 • Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
Motivated by the success of supervised pre-training, we propose Multi-task superVised Pre-training (MVP) for natural language generation.
1 code implementation • 19 Jun 2022 • Xiaolei Wang, Kun Zhou, Ji-Rong Wen, Wayne Xin Zhao
Our approach unifies the recommendation and conversation subtasks into the prompt learning paradigm, and utilizes knowledge-enhanced prompts based on a fixed pre-trained language model (PLM) to fulfill both subtasks in a unified approach.
Ranked #1 on Text Generation on ReDial
2 code implementations • 15 Jun 2022 • Wayne Xin Zhao, Yupeng Hou, Xingyu Pan, Chen Yang, Zeyu Zhang, Zihan Lin, Jingsen Zhang, Shuqing Bian, Jiakai Tang, Wenqi Sun, Yushuo Chen, Lanling Xu, Gaowei Zhang, Zhen Tian, Changxin Tian, Shanlei Mu, Xinyan Fan, Xu Chen, Ji-Rong Wen
In order to support the study of recent advances in recommender systems, this paper presents an extended recommendation library consisting of eight packages for up-to-date topics and architectures.
1 code implementation • 13 Jun 2022 • Wayne Xin Zhao, Kun Zhou, Zheng Gong, Beichen Zhang, Yuanhang Zhou, Jing Sha, Zhigang Chen, Shijin Wang, Cong Liu, Ji-Rong Wen
Considering the complex nature of mathematical texts, we design a novel curriculum pre-training approach for improving the learning of mathematical PLMs, consisting of both basic and advanced courses.
2 code implementations • 13 Jun 2022 • Yupeng Hou, Shanlei Mu, Wayne Xin Zhao, Yaliang Li, Bolin Ding, Ji-Rong Wen
In order to develop effective sequential recommenders, a series of sequence representation learning (SRL) methods are proposed to model historical user behaviors.
no code implementations • 10 Jun 2022 • Zihan Lin, Hui Wang, Jingshu Mao, Wayne Xin Zhao, Cheng Wang, Peng Jiang, Ji-Rong Wen
Relevant recommendation is a special recommendation scenario which provides relevant items when users express interest in one target item (e.g., click, like, and purchase).
no code implementations • 6 Jun 2022 • Shanlei Mu, Yupeng Hou, Wayne Xin Zhao, Yaliang Li, Bolin Ding
Instead of explicitly learning representations for item IDs, IDA-SR directly learns item representations from rich text information.
no code implementations • 1 Jun 2022 • Lanling Xu, Jianxun Lian, Wayne Xin Zhao, Ming Gong, Linjun Shou, Daxin Jiang, Xing Xie, Ji-Rong Wen
The learn-to-compare paradigm of contrastive representation learning (CRL), which compares positive samples with negative ones for representation learning, has achieved great success in a wide range of domains, including natural language processing, computer vision, information retrieval and graph learning.
1 code implementation • 22 May 2022 • Xinyan Fan, Jianxun Lian, Wayne Xin Zhao, Zheng Liu, Chaozhuo Li, Xing Xie
We first extract distribution patterns from the item candidates.
1 code implementation • 4 May 2022 • Jinhao Jiang, Kun Zhou, Wayne Xin Zhao, Ji-Rong Wen
Commonsense reasoning in natural language is a desired ability of artificial intelligent systems.
1 code implementation • NAACL 2022 • Junyi Li, Tianyi Tang, Jian-Yun Nie, Ji-Rong Wen, Wayne Xin Zhao
First, PTG learns a set of source prompts for various source generation tasks and then transfers these prompts as target prompts to perform target generation tasks.
1 code implementation • NAACL 2022 • Junyi Li, Tianyi Tang, Zheng Gong, Lixin Yang, Zhuohao Yu, Zhipeng Chen, Jingyuan Wang, Wayne Xin Zhao, Ji-Rong Wen
In this paper, we present a large-scale empirical study on general language ability evaluation of PLMs (ElitePLM).
1 code implementation • ACL 2022 • Kun Zhou, Beichen Zhang, Wayne Xin Zhao, Ji-Rong Wen
In DCLR, we design an instance weighting method to punish false negatives and generate noise-based negatives to guarantee the uniformity of the representation space.
no code implementations • 27 Apr 2022 • Ruiyang Ren, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Qifei Wu, Yuchen Ding, Hua Wu, Haifeng Wang, Ji-Rong Wen
Recent years have witnessed the significant advance in dense retrieval (DR) based on powerful pre-trained language models (PLM).
1 code implementation • 23 Apr 2022 • Yupeng Hou, Binbin Hu, Zhiqiang Zhang, Wayne Xin Zhao
Session-based Recommendation (SBR) refers to the task of predicting the next item based on short-term user behaviors within an anonymous session.
no code implementations • 27 Mar 2022 • Yupeng Hou, Xingyu Pan, Wayne Xin Zhao, Shuqing Bian, Yang Song, Tao Zhang, Ji-Rong Wen
As the core technique of online recruitment platforms, person-job fit can improve hiring efficiency by accurately matching job positions with qualified candidates.
no code implementations • 22 Mar 2022 • Sha Yuan, Shuai Zhao, Jiahong Leng, Zhao Xue, Hanyu Zhao, Peiyu Liu, Zheng Gong, Wayne Xin Zhao, Junyi Li, Jie Tang
The results show that WuDaoMM can be applied as an efficient dataset for VLPMs, especially for models on the text-to-image generation task.
1 code implementation • 3 Mar 2022 • Yupeng Hou, Binbin Hu, Wayne Xin Zhao, Zhiqiang Zhang, Jun Zhou, Ji-Rong Wen
In this way, we can learn adaptive representations for a given graph when paired with different graphs, and both node- and graph-level characteristics are naturally considered in a single pre-training task.
2 code implementations • COLING 2022 • Ze-Feng Gao, Peiyu Liu, Wayne Xin Zhao, Zhong-Yi Lu, Ji-Rong Wen
Recently, the Mixture-of-Experts (MoE) architecture has achieved remarkable success in increasing the model capacity of large-scale language models.
2 code implementations • 28 Feb 2022 • Kun Zhou, Hui Yu, Wayne Xin Zhao, Ji-Rong Wen
Recently, deep neural networks such as RNN, CNN and Transformer have been applied in the task of sequential recommendation, which aims to capture the dynamic preference characteristics from logged user behavior data for accurate recommendation.
no code implementations • 18 Feb 2022 • Yifan Du, Zikang Liu, Junyi Li, Wayne Xin Zhao
In this paper, we review the recent progress in Vision-Language Pre-Trained Models (VL-PTMs).
1 code implementation • 13 Feb 2022 • Zihan Lin, Changxin Tian, Yupeng Hou, Wayne Xin Zhao
For the structural neighbors on the interaction graph, we develop a novel structure-contrastive objective that regards users (or items) and their structural neighbors as positive contrastive pairs.
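A minimal sketch of such a structure-contrastive objective, assuming an InfoNCE form with in-batch negatives (the exact loss in the paper may differ):

```python
import torch
import torch.nn.functional as F

def structure_contrastive_loss(node_emb, neighbor_emb, temperature=0.2):
    """InfoNCE treating each user/item and one of its structural neighbors
    as a positive pair; other in-batch rows serve as negatives."""
    u = F.normalize(node_emb, dim=-1)
    v = F.normalize(neighbor_emb, dim=-1)
    logits = u @ v.T / temperature              # (B, B) similarities
    labels = torch.arange(u.size(0))            # diagonal = positive pairs
    return F.cross_entropy(logits, labels)
```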
1 code implementation • COLING 2022 • Tianyi Tang, Junyi Li, Wayne Xin Zhao, Ji-Rong Wen
Secondly, we use continuous inverse prompting to improve the process of natural language generation by modeling an inverse generation process from output to input, making the generated text more relevant to the inputs.
no code implementations • 14 Jan 2022 • Junyi Li, Tianyi Tang, Wayne Xin Zhao, Jian-Yun Nie, Ji-Rong Wen
We begin with introducing three key aspects of applying PLMs to text generation: 1) how to encode the input into representations preserving input semantics which can be fused into PLMs; 2) how to design an effective PLM to serve as the generation model; and 3) how to effectively optimize PLMs given the reference text and to ensure that the generated texts satisfy special text properties.
1 code implementation • 4 Jan 2022 • Yuanhang Zhou, Kun Zhou, Wayne Xin Zhao, Cheng Wang, Peng Jiang, He Hu
To implement this framework, we design both coarse-grained and fine-grained procedures for modeling user preference, where the former focuses on more general, coarse-grained semantic fusion and the latter focuses on more specific, fine-grained semantic fusion.
Ranked #2 on Recommendation Systems on ReDial