1 code implementation • 18 Jan 2019 • Tianbing Xu, Andrew Zhang, Liang Zhao
There are two halves to RL systems: the time spent collecting experience and the time spent learning the policy.
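The two halves can be pictured as an alternating loop. The sketch below is a hypothetical toy example (a two-armed Gaussian bandit; `collect_experience` and `learn` are illustrative names, not from the paper):

```python
import random

def collect_experience(policy, env_step, n=100):
    """Experience-collection half: roll out the current policy."""
    return [(a, env_step(a)) for a in (policy() for _ in range(n))]

def learn(values, batch, lr=0.1):
    """Policy-learning half: update value estimates from the batch."""
    for a, r in batch:
        values[a] += lr * (r - values[a])
    return values

# Toy two-armed bandit: arm 1 pays more on average (assumed setup).
random.seed(0)
env_step = lambda a: random.gauss(0.2 if a == 0 else 0.8, 0.1)
policy = lambda: random.randrange(2)  # uniform exploration policy
values = [0.0, 0.0]
for _ in range(20):                   # alternate the two halves
    batch = collect_experience(policy, env_step, n=50)
    values = learn(values, batch)
```

After enough alternations the value estimates approach the true arm means, illustrating how the two phases feed each other.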
no code implementations • ICML 2018 • Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng
The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy.
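As a rough illustration of why the exploration policy matters in off-policy learning, here is a minimal tabular Q-learning sketch on an assumed two-state toy MDP, with an epsilon-greedy exploration policy feeding a replay buffer (the environment and all names are hypothetical, not the paper's method):

```python
import random
from collections import deque

def epsilon_greedy(q, state, eps):
    """Exploration policy: random action with prob eps, else greedy."""
    if random.random() < eps:
        return random.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])

def step(s, a):
    """Toy MDP: action a moves to state a; state 1 pays reward 1."""
    return a, (1.0 if a == 1 else 0.0)

random.seed(1)
q = [[0.0, 0.0], [0.0, 0.0]]          # Q-values for 2 states x 2 actions
buffer = deque(maxlen=1000)           # replay buffer of transitions
s = 0
for _ in range(2000):
    a = epsilon_greedy(q, s, eps=0.2)  # behavior policy explores
    s2, r = step(s, a)
    buffer.append((s, a, r, s2))
    # Off-policy update from a replayed transition (greedy target policy).
    bs, ba, br, bs2 = random.choice(buffer)
    q[bs][ba] += 0.1 * (br + 0.9 * max(q[bs2]) - q[bs][ba])
    s = s2
```

Without the random exploration steps the agent could stay in state 0 forever and never discover the rewarding action, which is the sensitivity the abstract refers to.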
no code implementations • 21 Feb 2018 • Tianbing Xu
Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derive a method to generate samples from the posterior distribution over variational parameters by explicitly minimizing the KL divergence to the target distribution in an amortized fashion.
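As an illustration of explicit KL minimization (not the paper's amortized Stein sampler), the sketch below descends the closed-form KL between a Gaussian variational distribution and an assumed Gaussian target by gradient descent on the variational parameters:

```python
import math

# Assumed target: N(2.0, 0.5^2). Variational family: N(mu, exp(log_sigma)^2).
MU_T, SIGMA_T = 2.0, 0.5

def kl_gauss(mu, log_sigma):
    """Closed-form KL( N(mu, sigma^2) || N(MU_T, SIGMA_T^2) )."""
    sigma = math.exp(log_sigma)
    return (math.log(SIGMA_T / sigma)
            + (sigma ** 2 + (mu - MU_T) ** 2) / (2 * SIGMA_T ** 2) - 0.5)

def grads(mu, log_sigma):
    """Analytic gradients of the KL w.r.t. mu and log_sigma."""
    sigma = math.exp(log_sigma)
    return (mu - MU_T) / SIGMA_T ** 2, -1.0 + sigma ** 2 / SIGMA_T ** 2

mu, log_sigma = 0.0, 0.0      # start at N(0, 1)
for _ in range(500):          # explicitly descend the KL
    d_mu, d_ls = grads(mu, log_sigma)
    mu -= 0.05 * d_mu
    log_sigma -= 0.05 * d_ls
```

The variational parameters converge to the target's, driving the KL to zero; the paper's contribution is doing this kind of minimization in an amortized way over a distribution of parameters rather than a single point.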
no code implementations • 17 Oct 2017 • Tianbing Xu, Qiang Liu, Jian Peng
Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems.
no code implementations • 17 Oct 2013 • Tianbing Xu, Yaming Yu, John Turner, Amelia Regan
For contextual bandit problems, Thompson Sampling is adopted, sampling actions from the underlying posterior distributions of the parameters.
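A minimal sketch of Thompson Sampling, using the simpler non-contextual Beta-Bernoulli case for illustration (the paper's setting is contextual, with posteriors over model parameters; the arm rates and helper name here are hypothetical):

```python
import random

def thompson_step(params, true_rates):
    """Sample each arm's Beta posterior, pull the argmax, update it."""
    draws = [random.betavariate(a, b) for a, b in params]
    arm = max(range(len(draws)), key=draws.__getitem__)
    r = 1 if random.random() < true_rates[arm] else 0  # Bernoulli reward
    a, b = params[arm]
    params[arm] = (a + r, b + 1 - r)  # conjugate posterior update
    return arm

random.seed(0)
true_rates = [0.3, 0.7]       # unknown success rates (assumed)
params = [(1, 1), (1, 1)]     # uniform Beta(1, 1) priors
pulls = [0, 0]
for _ in range(2000):
    pulls[thompson_step(params, true_rates)] += 1
```

Because actions are drawn in proportion to their posterior probability of being best, play concentrates on the better arm while still exploring the worse one early on.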
no code implementations • 17 Oct 2013 • Tianbing Xu, Jianfeng Gao, Lin Xiao, Amelia Regan
We propose a voted dual averaging method for online classification problems with explicit regularization.
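A hedged sketch of l1-regularized dual averaging (RDA) for online logistic classification, with iterate averaging used as a simple stand-in for the voting step; the toy data and function name are illustrative, not the paper's algorithm:

```python
import math

def rda_l1(data, lam=0.01, gamma=1.0, epochs=1):
    """Online l1-regularized dual averaging for logistic loss.
    data: list of (x, y) with x a feature list and y in {-1, +1}."""
    d = len(data[0][0])
    gbar = [0.0] * d       # running average of subgradients
    w = [0.0] * d
    w_sum = [0.0] * d      # accumulate iterates ("voted"/averaged model)
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            coef = -y / (1.0 + math.exp(margin))  # d/dm of log(1+e^-m)
            for i in range(d):
                gbar[i] += (coef * x[i] - gbar[i]) / t
                # Closed-form RDA step: soft-threshold the averaged
                # gradient, with a sqrt(t) step-size schedule.
                shrunk = max(0.0, abs(gbar[i]) - lam)
                w[i] = -math.copysign(shrunk, gbar[i]) * math.sqrt(t) / gamma
                w_sum[i] += w[i]
    return [s / t for s in w_sum]

# Linearly separable toy data: label is the sign of the first feature.
data = [([1.0, 0.0], 1), ([0.8, 0.1], 1),
        ([-1.0, 0.0], -1), ([-0.9, -0.1], -1)]
w = rda_l1(data, epochs=50)
```

The soft-threshold makes the explicit l1 regularization produce exactly sparse weights, which is the main appeal of dual averaging over plain subgradient methods here.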