no code implementations • 6 Oct 2023 • Zhenghai Xue, Qingpeng Cai, Tianyou Zuo, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, Bo An
One challenge in large-scale online recommendation systems is the constant and complicated changes in users' behavior patterns, such as interaction rates and retention tendencies.
1 code implementation • NeurIPS 2023 • Kesen Zhao, Shuchang Liu, Qingpeng Cai, Xiangyu Zhao, Ziru Liu, Dong Zheng, Peng Jiang, Kun Gai
For each task, KuaiSim also provides evaluation protocols and baseline recommendation algorithms that further serve as benchmarks for future research.
no code implementations • 11 Aug 2023 • Yue Feng, Shuchang Liu, Zhenghai Xue, Qingpeng Cai, Lantao Hu, Peng Jiang, Kun Gai, Fei Sun
For response generation, we utilize the generation ability of LLM as a language interface to better interact with users.
1 code implementation • 4 Jun 2023 • Shuchang Liu, Qingpeng Cai, Zhankui He, Bowen Sun, Julian McAuley, Dong Zheng, Peng Jiang, Kun Gai
In this work, we aim to learn a policy that can generate sufficiently diverse item lists for users while maintaining high recommendation quality.
1 code implementation • 7 Feb 2023 • Ziru Liu, Jiejie Tian, Qingpeng Cai, Xiangyu Zhao, Jingtong Gao, Shuchang Liu, Dayou Chen, Tonghao He, Dong Zheng, Peng Jiang, Kun Gai
To be specific, the RMTL structure can address the two aforementioned issues by (i) constructing an MTL environment from session-wise interactions and (ii) training multi-task actor-critic network structure, which is compatible with most existing MTL-based recommendation models, and (iii) optimizing and fine-tuning the MTL loss function using the weights generated by critic networks.
1 code implementation • 7 Feb 2023 • Shuchang Liu, Qingpeng Cai, Bowen Sun, Yuhao Wang, Ji Jiang, Dong Zheng, Kun Gai, Peng Jiang, Xiangyu Zhao, Yongfeng Zhang
To overcome this challenge, we propose a hyper-actor and critic learning framework where the policy decomposes the item list generation process into a hyper-action inference step and an effect-action selection step.
no code implementations • 3 Feb 2023 • Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie, Bin Yang, Dong Zheng, Peng Jiang, Kun Gai
In this paper, we choose reinforcement learning methods to optimize the retention as they are designed to maximize the long-term performance.
no code implementations • 3 Feb 2023 • Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang, Kun Gai
One the one hand, the platforms aims at optimizing the users' cumulative watch time (main goal) in long term, which can be effectively optimized by Reinforcement Learning.
no code implementations • 6 Dec 2022 • Wanqi Xue, Qingpeng Cai, Zhenghai Xue, Shuo Sun, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Though promising, the application of RL heavily relies on well-designed rewards, but designing rewards related to long-term user engagement is quite difficult.
no code implementations • 1 Jun 2022 • Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An
Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation.
no code implementations • 26 May 2022 • Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, Pinghua Gong, Dong Zheng, Peng Jiang
In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in long term, with the constraint of accommodating the auxiliary responses of user interactions such as sharing/downloading videos.
1 code implementation • CVPR 2022 • Wenqiao Zhang, Lei Zhu, James Hallinan, Andrew Makmur, Shengyu Zhang, Qingpeng Cai, Beng Chin Ooi
In this paper, we propose a novel semi-supervised learning (SSL) framework named BoostMIS that combines adaptive pseudo labeling and informative active annotation to unleash the potential of medical image SSL models: (1) BoostMIS can adaptively leverage the cluster assumption and consistency regularization of the unlabeled data according to the current learning status.
no code implementations • 13 Dec 2021 • Wenqiao Zhang, Haochen Shi, Jiannan Guo, Shengyu Zhang, Qingpeng Cai, Juncheng Li, Sihui Luo, Yueting Zhuang
We propose the Multimodal relAtional Graph adversarIal inferenCe (MAGIC) framework for diverse and unpaired TextCap.
no code implementations • 10 Aug 2021 • Qingpeng Cai, Can Cui, Yiyuan Xiong, Wei Wang, Zhongle Xie, Meihui Zhang
Data processing and analytics are fundamental and pervasive.
1 code implementation • NeurIPS 2020 • Ling Pan, Qingpeng Cai, Longbo Huang
A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.
no code implementations • 25 May 2020 • Jianxiong Wei, An-Xiang Zeng, Yueqiu Wu, Peng Guo, Qingsong Hua, Qingpeng Cai
In this paper, we present a novel Generator and Critic slate re-ranking approach, where the Critic evaluates the slate and the Generator ranks the items by the reinforcement learning approach.
no code implementations • 11 Nov 2019 • Ling Pan, Qingpeng Cai, Longbo Huang
Recent years have witnessed a tremendous improvement of deep reinforcement learning.
no code implementations • 9 Sep 2019 • Qingpeng Cai, Ling Pan, Pingzhong Tang
Based on this theoretical guarantee, we propose a class of the deterministic value gradient algorithm (DVG) with infinite horizon, and different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias.
no code implementations • 16 Jun 2019 • Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei
In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using reinforcement learning (RL), named RLHO.
1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu
In this paper, we propose to update the value function with dynamic Boltzmann softmax (DBS) operator, which has good convergence property in the setting of planning and learning.
no code implementations • 18 Nov 2018 • Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hua-Lin He, Qing He, Pingzhong Tang
Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have successfully applied in complex decision-making problems such as Atari games.
no code implementations • 27 Sep 2018 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Tie-Yan Liu
We then propose the dynamic Boltzmann softmax(DBS) operator to enable the convergence to the optimal value function in value iteration.
no code implementations • 10 Jul 2018 • Qingpeng Cai, Ling Pan, Pingzhong Tang
Such a setting generalizes the widely-studied stochastic state transition setting, namely the setting of deterministic policy gradient (DPG).
no code implementations • 13 Feb 2018 • Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang
Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.
no code implementations • 12 Feb 2018 • Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, Qing He
We evaluate PGCR on toy datasets as well as a real-world dataset of personalized music recommendations.
no code implementations • 25 Aug 2017 • Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang
We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform.