no code implementations • 21 Aug 2020 • Xu He, Bo An, Yanghua Li, Haikai Chen, Qingyu Guo, Xin Li, Zhirong Wang
First, since we are concerned with the reward of a set of recommended items, we model online recommendation as a contextual combinatorial bandit problem and define the reward of a recommended set.
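The snippet does not say how the set-level reward is defined. The following is a minimal Python sketch of a generic contextual combinatorial bandit, for illustration only. It assumes a linear UCB scorer over item features, top-k selection of the recommended set, per-item click feedback, and a set reward equal to the number of clicked items. `CombLinUCB`, `set_reward`, `alpha` and `reg` are hypothetical names, and none of this is the paper's method.

```python
import numpy as np

class CombLinUCB:
    """Hypothetical sketch: linear UCB over item features, recommending a top-k set."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha
        self.A = reg * np.eye(dim)   # regularized Gram matrix of observed features
        self.b = np.zeros(dim)       # click-weighted sum of observed features

    def select(self, contexts, k):
        # contexts: (n_items, dim) feature matrix for the current request
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        means = contexts @ theta
        # Confidence width sqrt(x^T A^{-1} x) for every item row x
        widths = np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
        ucb = means + self.alpha * widths
        return np.argsort(-ucb)[:k]  # indices of the k highest-UCB items

    def update(self, contexts, chosen, clicks):
        # Assumed semi-bandit feedback: one click/no-click per recommended item
        for i, c in zip(chosen, clicks):
            x = contexts[i]
            self.A += np.outer(x, x)
            self.b += c * x

def set_reward(clicks):
    # Assumed set-level reward: total clicks on the recommended set
    return float(np.sum(clicks))
```

Scoring items individually and taking the top k keeps selection cheap. A set reward that depends on interactions between items, such as position effects or diversity, would need a different selection step.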
no code implementations • 21 Aug 2020 • Xu He, Bo An, Yanghua Li, Haikai Chen, Rundong Wang, Xinrun Wang, Runsheng Yu, Xin Li, Zhirong Wang
Thus, the global policy of the whole page could be sub-optimal.
Multi-agent Reinforcement Learning • Reinforcement Learning (RL)
no code implementations • 7 Jun 2020 • Yue Xu, Hao Chen, Zengde Deng, Junxiong Zhu, Yanghua Li, Peng He, Wenyao Gao, Wenjun Xu
The results verify that the proposed model considerably outperforms existing GCN models in recommendation performance and yields up to a few orders of magnitude speedup in training.
no code implementations • 29 Feb 2020 • Xiao Xu, Fang Dong, Yanghua Li, Shaojian He, Xin Li
A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users.
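A common way to cope with time-varying user interests is to discount old observations so that recent feedback dominates the estimate. The Python sketch below is a simplified discounted LinUCB that uses exponentially discounted ridge regression with a single design matrix. It is not the exact confidence bound of published discounted-LinUCB variants and not the approach of the listed paper. `DiscountedLinUCB`, `gamma`, `alpha` and `reg` are hypothetical names.

```python
import numpy as np

class DiscountedLinUCB:
    """Hypothetical sketch: LinUCB with exponential forgetting for drifting interests."""

    def __init__(self, dim, gamma=0.95, alpha=1.0, reg=1.0):
        self.gamma, self.alpha, self.reg = gamma, alpha, reg
        self.A = reg * np.eye(dim)   # discounted, regularized Gram matrix
        self.b = np.zeros(dim)       # discounted reward-weighted feature sum

    def select(self, contexts):
        # contexts: (n_arms, dim); return the arm with the highest UCB score
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        widths = np.sqrt(np.einsum("ij,jk,ik->i", contexts, A_inv, contexts))
        return int(np.argmax(contexts @ theta + self.alpha * widths))

    def update(self, x, reward):
        # Shrink old statistics toward the prior, then add the new observation
        d = len(x)
        self.A = self.gamma * self.A + (1 - self.gamma) * self.reg * np.eye(d) + np.outer(x, x)
        self.b = self.gamma * self.b + reward * x
```

Setting `gamma=1.0` recovers the ordinary, non-forgetting LinUCB statistics. Smaller values adapt faster to interest drift, at the cost of noisier estimates.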