Search Results for author: Qingpeng Cai

Found 36 papers, 14 papers with code

LDACP: Long-Delayed Ad Conversions Prediction Model for Bidding Strategy

no code implementations • 25 Nov 2024 • Peng Cui, Yiming Yang, Fusheng Jin, Siyuan Tang, Yunli Wang, Fukang Yang, Yalong Jia, Qingpeng Cai, Fei Pan, Changcheng Li, Peng Jiang

To alleviate the issue of discontinuity in one-hot hard labels, the Bucket Classification Module with label Smoothing method (BCMS) converts one-hot hard labels into non-normalized soft labels, then fits these soft labels by minimizing classification loss and regression loss.

regression

DLCRec: A Novel Approach for Managing Diversity in LLM-Based Recommender Systems

no code implementations • 22 Aug 2024 • Jiaju Chen, Chongming Gao, Shuai Yuan, Shuchang Liu, Qingpeng Cai, Peng Jiang

These sub-tasks are trained independently and inferred sequentially according to user-defined control numbers, ensuring more precise control over diversity.

Data Augmentation, Diversity +1

Rectifying Reinforcement Learning for Reward Matching

no code implementations • 4 Jun 2024 • Haoran He, Emmanuel Bengio, Qingpeng Cai, Ling Pan

In this paper, we establish a new connection between GFlowNets and policy evaluation for a uniform policy.

Decision Making, reinforcement-learning +3

Bifurcated Generative Flow Networks

no code implementations • 4 Jun 2024 • Chunhui Li, Cheng-Hao Liu, Dianbo Liu, Qingpeng Cai, Ling Pan

Generative Flow Networks (GFlowNets), a new family of probabilistic samplers, have recently emerged as a promising framework for learning stochastic policies that generate high-quality and diverse objects proportionally to their rewards.

M3oE: Multi-Domain Multi-Task Mixture-of Experts Recommendation Framework

1 code implementation • 29 Apr 2024 • Zijian Zhang, Shuchang Liu, Jiaao Yu, Qingpeng Cai, Xiangyu Zhao, Chunxu Zhang, Ziru Liu, Qidong Liu, Hongwei Zhao, Lantao Hu, Peng Jiang, Kun Gai

M3oE integrates multi-domain information, maps knowledge across domains and tasks, and optimizes multiple objectives.

AutoML

Sequential Recommendation for Optimizing Both Immediate Feedback and Long-term Retention

1 code implementation • 4 Apr 2024 • Ziru Liu, Shuchang Liu, Zijian Zhang, Qingpeng Cai, Xiangyu Zhao, Kesen Zhao, Lantao Hu, Peng Jiang, Kun Gai

In the landscape of Recommender System (RS) applications, reinforcement learning (RL) has recently emerged as a powerful tool, primarily due to its proficiency in optimizing long-term rewards.

Contrastive Learning, Multi-Task Learning +2

Future Impact Decomposition in Request-level Recommendations

1 code implementation • 29 Jan 2024 • Xiaobei Wang, Shuchang Liu, Xueliang Wang, Qingpeng Cai, Lantao Hu, Han Li, Peng Jiang, Kun Gai, Guangming Xie

Furthermore, we show that a reward-based future decomposition strategy can better express the item-wise future impact and improve the recommendation accuracy in the long term.

Recommendation Systems

AdaRec: Adaptive Sequential Recommendation for Reinforcing Long-term User Engagement

no code implementations • 6 Oct 2023 • Zhenghai Xue, Qingpeng Cai, Tianyou Zuo, Bin Yang, Lantao Hu, Peng Jiang, Kun Gai, Bo An

One challenge in large-scale online recommendation systems is the constant and complicated changes in users' behavior patterns, such as interaction rates and retention tendencies.

Reinforcement Learning (RL), Sequential Recommendation

KuaiSim: A Comprehensive Simulator for Recommender Systems

1 code implementation • NeurIPS 2023 • Kesen Zhao, Shuchang Liu, Qingpeng Cai, Xiangyu Zhao, Ziru Liu, Dong Zheng, Peng Jiang, Kun Gai

For each task, KuaiSim also provides evaluation protocols and baseline recommendation algorithms that further serve as benchmarks for future research.

Reinforcement Learning (RL), Sequential Recommendation

A Large Language Model Enhanced Conversational Recommender System

no code implementations • 11 Aug 2023 • Yue Feng, Shuchang Liu, Zhenghai Xue, Qingpeng Cai, Lantao Hu, Peng Jiang, Kun Gai, Fei Sun

For response generation, we utilize the generation ability of LLM as a language interface to better interact with users.

Language Modelling, Large Language Model +2

Generative Flow Network for Listwise Recommendation

1 code implementation • 4 Jun 2023 • Shuchang Liu, Qingpeng Cai, Zhankui He, Bowen Sun, Julian McAuley, Dong Zheng, Peng Jiang, Kun Gai

In this work, we aim to learn a policy that can generate sufficiently diverse item lists for users while maintaining high recommendation quality.

Diversity, Recommendation Systems

Multi-Task Recommendations with Reinforcement Learning

1 code implementation • 7 Feb 2023 • Ziru Liu, Jiejie Tian, Qingpeng Cai, Xiangyu Zhao, Jingtong Gao, Shuchang Liu, Dayou Chen, Tonghao He, Dong Zheng, Peng Jiang, Kun Gai

To be specific, the RMTL structure can address the two aforementioned issues by (i) constructing an MTL environment from session-wise interactions, (ii) training a multi-task actor-critic network structure, which is compatible with most existing MTL-based recommendation models, and (iii) optimizing and fine-tuning the MTL loss function using the weights generated by critic networks.

Multi-Task Learning, Recommendation Systems +3

Exploration and Regularization of the Latent Action Space in Recommendation

1 code implementation • 7 Feb 2023 • Shuchang Liu, Qingpeng Cai, Bowen Sun, Yuhao Wang, Ji Jiang, Dong Zheng, Kun Gai, Peng Jiang, Xiangyu Zhao, Yongfeng Zhang

To overcome this challenge, we propose a hyper-actor and critic learning framework where the policy decomposes the item list generation process into a hyper-action inference step and an effect-action selection step.

Recommendation Systems

Two-Stage Constrained Actor-Critic for Short Video Recommendation

1 code implementation • 3 Feb 2023 • Qingpeng Cai, Zhenghai Xue, Chi Zhang, Wanqi Xue, Shuchang Liu, Ruohan Zhan, Xueliang Wang, Tianyou Zuo, Wentao Xie, Dong Zheng, Peng Jiang, Kun Gai

On the one hand, the platform aims to optimize users' cumulative watch time (the main goal) in the long term, which can be effectively optimized by reinforcement learning.

Recommendation Systems, reinforcement-learning +2

Reinforcing User Retention in a Billion Scale Short Video Recommender System

no code implementations • 3 Feb 2023 • Qingpeng Cai, Shuchang Liu, Xueliang Wang, Tianyou Zuo, Wentao Xie, Bin Yang, Dong Zheng, Peng Jiang, Kun Gai

In this paper, we choose reinforcement learning methods to optimize retention, as they are designed to maximize long-term performance.

Recommendation Systems, reinforcement-learning +2

PrefRec: Recommender Systems with Human Preferences for Reinforcing Long-term User Engagement

1 code implementation • 6 Dec 2022 • Wanqi Xue, Qingpeng Cai, Zhenghai Xue, Shuo Sun, Shuchang Liu, Dong Zheng, Peng Jiang, Kun Gai, Bo An

Though promising, the application of RL heavily relies on well-designed rewards, but designing rewards related to long-term user engagement is quite difficult.

Recommendation Systems, Reinforcement Learning (RL)

ResAct: Reinforcing Long-term Engagement in Sequential Recommendation with Residual Actor

1 code implementation • 1 Jun 2022 • Wanqi Xue, Qingpeng Cai, Ruohan Zhan, Dong Zheng, Peng Jiang, Kun Gai, Bo An

Meanwhile, reinforcement learning (RL) is widely regarded as a promising framework for optimizing long-term engagement in sequential recommendation.

Reinforcement Learning (RL), Sequential Recommendation

Constrained Reinforcement Learning for Short Video Recommendation

no code implementations • 26 May 2022 • Qingpeng Cai, Ruohan Zhan, Chi Zhang, Jie Zheng, Guangwei Ding, Pinghua Gong, Dong Zheng, Peng Jiang

In this paper, we formulate the problem of short video recommendation as a constrained Markov Decision Process (MDP), where platforms want to optimize the main goal of user watch time in the long term, subject to the constraint of accommodating auxiliary user responses such as sharing or downloading videos.

Recommendation Systems, reinforcement-learning +2

BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation

1 code implementation • CVPR 2022 • Wenqiao Zhang, Lei Zhu, James Hallinan, Andrew Makmur, Shengyu Zhang, Qingpeng Cai, Beng Chin Ooi

In this paper, we propose a novel semi-supervised learning (SSL) framework named BoostMIS that combines adaptive pseudo labeling and informative active annotation to unleash the potential of medical image SSL models: (1) BoostMIS can adaptively leverage the cluster assumption and consistency regularization of the unlabeled data according to the current learning status.

Active Learning

Softmax Deep Double Deterministic Policy Gradients

1 code implementation • NeurIPS 2020 • Ling Pan, Qingpeng Cai, Longbo Huang

A widely-used actor-critic reinforcement learning algorithm for continuous control, Deep Deterministic Policy Gradients (DDPG), suffers from the overestimation problem, which can negatively affect the performance.

continuous-control, Continuous Control

Generator and Critic: A Deep Reinforcement Learning Approach for Slate Re-ranking in E-commerce

no code implementations • 25 May 2020 • Jianxiong Wei, An-Xiang Zeng, Yueqiu Wu, Peng Guo, Qingsong Hua, Qingpeng Cai

In this paper, we present a novel Generator and Critic slate re-ranking approach, where the Critic evaluates the slate and the Generator ranks the items by the reinforcement learning approach.

Deep Reinforcement Learning, Diversity +3

Multi-Path Policy Optimization

no code implementations • 11 Nov 2019 • Ling Pan, Qingpeng Cai, Longbo Huang

Recent years have witnessed tremendous improvements in deep reinforcement learning.

Deep Reinforcement Learning, Efficient Exploration

Deterministic Value-Policy Gradients

no code implementations • 9 Sep 2019 • Qingpeng Cai, Ling Pan, Pingzhong Tang

Based on this theoretical guarantee, we propose a class of deterministic value gradient (DVG) algorithms with infinite horizon, in which the number of rollout steps of the analytical gradients through the learned model trades off the variance of the value gradients against the model bias.

continuous-control, Continuous Control +3

Reinforcement Learning Driven Heuristic Optimization

no code implementations • 16 Jun 2019 • Qingpeng Cai, Will Hang, Azalia Mirhoseini, George Tucker, Jingtao Wang, Wei Wei

In this paper, we introduce a novel framework to generate better initial solutions for heuristic algorithms using reinforcement learning (RL), named RLHO.

Combinatorial Optimization, reinforcement-learning +2

Reinforcement Learning with Dynamic Boltzmann Softmax Updates

1 code implementation • 14 Mar 2019 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Longbo Huang, Tie-Yan Liu

In this paper, we propose to update the value function with the dynamic Boltzmann softmax (DBS) operator, which has good convergence properties in both the planning and learning settings.

Atari Games, Q-Learning +3

Policy Optimization with Model-based Explorations

no code implementations • 18 Nov 2018 • Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hua-Lin He, Qing He, Pingzhong Tang

Model-free reinforcement learning methods such as the Proximal Policy Optimization algorithm (PPO) have been successfully applied to complex decision-making problems such as Atari games.

Atari Games, Decision Making +4

A Convergent Variant of the Boltzmann Softmax Operator in Reinforcement Learning

no code implementations • 27 Sep 2018 • Ling Pan, Qingpeng Cai, Qi Meng, Wei Chen, Tie-Yan Liu

We then propose the dynamic Boltzmann softmax (DBS) operator to enable convergence to the optimal value function in value iteration.

Atari Games, Q-Learning +3

Deterministic Policy Gradients With General State Transitions

no code implementations • 10 Jul 2018 • Qingpeng Cai, Ling Pan, Pingzhong Tang

Such a setting generalizes the widely-studied stochastic state transition setting, namely the setting of deterministic policy gradient (DPG).

continuous-control, Continuous Control +1

A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems

no code implementations • 13 Feb 2018 • Ling Pan, Qingpeng Cai, Zhixuan Fang, Pingzhong Tang, Longbo Huang

Different from existing methods that often ignore spatial information and rely heavily on accurate prediction, HRP captures both spatial and temporal dependencies using a divide-and-conquer structure with an embedded localized module.

Deep Reinforcement Learning, reinforcement-learning +1

Policy Gradients for Contextual Recommendations

no code implementations • 12 Feb 2018 • Feiyang Pan, Qingpeng Cai, Pingzhong Tang, Fuzhen Zhuang, Qing He

We evaluate PGCR on toy datasets as well as a real-world dataset of personalized music recommendations.

Decision Making, Multi-Armed Bandits +2

Reinforcement Mechanism Design for e-commerce

no code implementations • 25 Aug 2017 • Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang

We study the problem of allocating impressions to sellers in e-commerce websites, such as Amazon, eBay or Taobao, aiming to maximize the total revenue generated by the platform.

Deep Reinforcement Learning
