1 code implementation • 18 Jan 2019 • Tianbing Xu, Andrew Zhang, Liang Zhao
There are two halves to RL systems: the time spent collecting experience and the time spent learning the policy.
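The two halves can be pictured as an alternating loop. The sketch below is a hypothetical toy example (a two-armed Gaussian bandit; `collect_experience` and `learn` are illustrative names, not from the paper):

```python
import random

def collect_experience(policy, env_step, n=100):
    """Experience-collection half: roll out the current policy."""
    return [(a, env_step(a)) for a in (policy() for _ in range(n))]

def learn(values, batch, lr=0.1):
    """Policy-learning half: update value estimates from the batch."""
    for a, r in batch:
        values[a] += lr * (r - values[a])
    return values

# Toy two-armed bandit: arm 1 pays more on average (assumed setup).
random.seed(0)
env_step = lambda a: random.gauss(0.2 if a == 0 else 0.8, 0.1)
policy = lambda: random.randrange(2)  # uniform exploration policy
values = [0.0, 0.0]
for _ in range(20):                   # alternate the two halves
    batch = collect_experience(policy, env_step, n=50)
    values = learn(values, batch)
```

After enough alternations the value estimates approach the true arm means, illustrating how the two phases feed each other.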
no code implementations • ICML 2018 • Tianbing Xu, Qiang Liu, Liang Zhao, Jian Peng
The performance of off-policy learning, including deep Q-learning and deep deterministic policy gradient (DDPG), critically depends on the choice of the exploration policy.
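As a rough illustration of why the exploration policy matters in off-policy learning, here is a minimal tabular Q-learning sketch on an assumed two-state toy MDP, with an epsilon-greedy exploration policy feeding a replay buffer (the environment and all names are hypothetical, not the paper's method):

```python
import random
from collections import deque

def epsilon_greedy(q, state, eps):
    """Exploration policy: random action with prob eps, else greedy."""
    if random.random() < eps:
        return random.randrange(len(q[state]))
    return max(range(len(q[state])), key=lambda a: q[state][a])

def step(s, a):
    """Toy MDP: action a moves to state a; state 1 pays reward 1."""
    return a, (1.0 if a == 1 else 0.0)

random.seed(1)
q = [[0.0, 0.0], [0.0, 0.0]]          # Q-values for 2 states x 2 actions
buffer = deque(maxlen=1000)           # replay buffer of transitions
s = 0
for _ in range(2000):
    a = epsilon_greedy(q, s, eps=0.2)  # behavior policy explores
    s2, r = step(s, a)
    buffer.append((s, a, r, s2))
    # Off-policy update from a replayed transition (greedy target policy).
    bs, ba, br, bs2 = random.choice(buffer)
    q[bs][ba] += 0.1 * (br + 0.9 * max(q[bs2]) - q[bs][ba])
    s = s2
```

Without the random exploration steps the agent could stay in state 0 forever and never discover the rewarding action, which is the sensitivity the abstract refers to.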
no code implementations • 21 Feb 2018 • Tianbing Xu
Inspired by the seminal work on Stein Variational Inference and Stein Variational Policy Gradient, we derive a method to generate samples from the posterior distribution over variational parameters by explicitly minimizing the KL divergence to the target distribution in an amortized fashion.
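As an illustration of explicit KL minimization (not the paper's amortized Stein sampler), the sketch below descends the closed-form KL between a Gaussian variational distribution and an assumed Gaussian target by gradient descent on the variational parameters:

```python
import math

# Assumed target: N(2.0, 0.5^2). Variational family: N(mu, exp(log_sigma)^2).
MU_T, SIGMA_T = 2.0, 0.5

def kl_gauss(mu, log_sigma):
    """Closed-form KL( N(mu, sigma^2) || N(MU_T, SIGMA_T^2) )."""
    sigma = math.exp(log_sigma)
    return (math.log(SIGMA_T / sigma)
            + (sigma ** 2 + (mu - MU_T) ** 2) / (2 * SIGMA_T ** 2) - 0.5)

def grads(mu, log_sigma):
    """Analytic gradients of the KL w.r.t. mu and log_sigma."""
    sigma = math.exp(log_sigma)
    return (mu - MU_T) / SIGMA_T ** 2, -1.0 + sigma ** 2 / SIGMA_T ** 2

mu, log_sigma = 0.0, 0.0      # start at N(0, 1)
for _ in range(500):          # explicitly descend the KL
    d_mu, d_ls = grads(mu, log_sigma)
    mu -= 0.05 * d_mu
    log_sigma -= 0.05 * d_ls
```

The variational parameters converge to the target's, driving the KL to zero; the paper's contribution is doing this kind of minimization in an amortized way over a distribution of parameters rather than a single point.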
no code implementations • 17 Oct 2017 • Tianbing Xu, Qiang Liu, Jian Peng
Recent advances in policy gradient methods and deep learning have demonstrated their applicability for complex reinforcement learning problems.
no code implementations • 17 Oct 2013 • Tianbing Xu, Yaming Yu, John Turner, Amelia Regan
For contextual bandit problems, Thompson Sampling is adopted, sampling actions from the underlying posterior distributions of the parameters.
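A minimal sketch of Thompson Sampling, using the simpler non-contextual Beta-Bernoulli case for illustration (the paper's setting is contextual, with posteriors over model parameters; the arm rates and helper name here are hypothetical):

```python
import random

def thompson_step(params, true_rates):
    """Sample each arm's Beta posterior, pull the argmax, update it."""
    draws = [random.betavariate(a, b) for a, b in params]
    arm = max(range(len(draws)), key=draws.__getitem__)
    r = 1 if random.random() < true_rates[arm] else 0  # Bernoulli reward
    a, b = params[arm]
    params[arm] = (a + r, b + 1 - r)  # conjugate posterior update
    return arm

random.seed(0)
true_rates = [0.3, 0.7]       # unknown success rates (assumed)
params = [(1, 1), (1, 1)]     # uniform Beta(1, 1) priors
pulls = [0, 0]
for _ in range(2000):
    pulls[thompson_step(params, true_rates)] += 1
```

Because actions are drawn in proportion to their posterior probability of being best, play concentrates on the better arm while still exploring the worse one early on.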
no code implementations • 17 Oct 2013 • Tianbing Xu, Jianfeng Gao, Lin Xiao, Amelia Regan
We propose a voted dual averaging method for online classification problems with explicit regularization.
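A hedged sketch of l1-regularized dual averaging (RDA) for online logistic classification, with iterate averaging used as a simple stand-in for the voting step; the toy data and function name are illustrative, not the paper's algorithm:

```python
import math

def rda_l1(data, lam=0.01, gamma=1.0, epochs=1):
    """Online l1-regularized dual averaging for logistic loss.
    data: list of (x, y) with x a feature list and y in {-1, +1}."""
    d = len(data[0][0])
    gbar = [0.0] * d       # running average of subgradients
    w = [0.0] * d
    w_sum = [0.0] * d      # accumulate iterates ("voted"/averaged model)
    t = 0
    for _ in range(epochs):
        for x, y in data:
            t += 1
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            coef = -y / (1.0 + math.exp(margin))  # d/dm of log(1+e^-m)
            for i in range(d):
                gbar[i] += (coef * x[i] - gbar[i]) / t
                # Closed-form RDA step: soft-threshold the averaged
                # gradient, with a sqrt(t) step-size schedule.
                shrunk = max(0.0, abs(gbar[i]) - lam)
                w[i] = -math.copysign(shrunk, gbar[i]) * math.sqrt(t) / gamma
                w_sum[i] += w[i]
    return [s / t for s in w_sum]

# Linearly separable toy data: label is the sign of the first feature.
data = [([1.0, 0.0], 1), ([0.8, 0.1], 1),
        ([-1.0, 0.0], -1), ([-0.9, -0.1], -1)]
w = rda_l1(data, epochs=50)
```

The soft-threshold makes the explicit l1 regularization produce exactly sparse weights, which is the main appeal of dual averaging over plain subgradient methods here.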