Safe Coupled Deep Q-Learning for Recommendation Systems

8 Jan 2021  ·  Runsheng Yu, Yu Gong, Rundong Wang, Bo An, Qingwen Liu, Wenwu Ou

Reinforcement Learning (RL) is one of the prevailing approaches to optimizing long-term user engagement in Recommendation Systems (RS). However, the well-known exploration strategies of RL (e.g., the $\epsilon$-greedy strategy) encourage the agent to interact with and explore the environment freely, which may cause it to frequently recommend unpleasant items, violating users' preferences and eroding their confidence in the RS platform. To avoid such irrelevant and unpleasant recommendations, we propose a novel safe RL approach that maximizes the accumulated long-term reward under a safety guarantee. Our contributions are three-fold. First, we introduce a novel training scheme with two value functions to maximize the accumulated long-term reward under a safety constraint. Second, we theoretically show that our methods converge and maintain safety with high probability during training. Third, we implement two practical methods for large-scale environments: a SimHash-based method and a relaxation method. Experiments on immediate recommendation, sequential recommendation, and a safe gridworld task show that our methods dramatically outperform state-of-the-art baselines.
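The abstract does not spell out how the two value functions interact at decision time, so the snippet below is only a minimal sketch of one plausible reading: a reward value function and a separate safety value function, where action selection (and $\epsilon$-greedy exploration) is restricted to actions whose safety estimate clears a threshold. All names here (`q_reward`, `q_safety`, `SAFETY_THRESHOLD`) are hypothetical placeholders, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 10          # hypothetical size of the candidate item set
SAFETY_THRESHOLD = 0.5  # hypothetical minimum acceptable safety value
EPSILON = 0.1           # exploration rate, applied only within the safe subset


def q_reward(state, action):
    """Placeholder for the long-term reward value function Q_r(s, a)."""
    return float(rng.normal())


def q_safety(state, action):
    """Placeholder for the safety value function Q_c(s, a), e.g. an estimate
    of how acceptable the recommended item is to the user."""
    return float(rng.uniform())


def select_action(state):
    """Pick the reward-maximizing action among those deemed safe; if no
    action clears the safety threshold, fall back to the safest one."""
    safety = np.array([q_safety(state, a) for a in range(N_ACTIONS)])
    reward = np.array([q_reward(state, a) for a in range(N_ACTIONS)])

    safe_actions = np.flatnonzero(safety >= SAFETY_THRESHOLD)
    if safe_actions.size == 0:
        return int(np.argmax(safety))

    if rng.random() < EPSILON:
        # Unlike plain epsilon-greedy, exploration never leaves the safe subset.
        return int(rng.choice(safe_actions))
    return int(safe_actions[np.argmax(reward[safe_actions])])


if __name__ == "__main__":
    print(select_action(state=None))
```

The key design choice illustrated here is that safety acts as a hard filter on the candidate set rather than a penalty folded into the reward, which is what distinguishes this style of constrained action selection from ordinary $\epsilon$-greedy exploration.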
