no code implementations • 2 Jul 2018 • Hua-Lin He, Chun-Xiang Pan, Qing Da, An-Xiang Zeng
On a large e-commerce platform, all participants compete for impressions under the platform's allocation mechanism.
no code implementations • 18 Nov 2018 • Feiyang Pan, Qingpeng Cai, An-Xiang Zeng, Chun-Xiang Pan, Qing Da, Hua-Lin He, Qing He, Pingzhong Tang
Model-free reinforcement learning methods such as the Proximal Policy Optimization (PPO) algorithm have been successfully applied to complex decision-making problems such as Atari games.