PAC-Bayesian Randomized Value Function with Informative Prior

1 Jan 2021  ·  Yuankun Jiang, Chenglin Li, Junni Zou, Wenrui Dai, Hongkai Xiong

Randomized value functions have been shown to be an effective exploration strategy for reinforcement learning (RL): the agent samples from a learned distribution over Q-value functions and then selects the action that is optimal under the sampled function. However, value-function methods are known to suffer from value estimation error, and overfitting of the value function is one of its main causes. To address this, we propose a Bayesian linear regression with informative prior (IP-BLR) operator that incorporates a data-dependent prior, built from the statistics of training results in previous iterations, into the learning of the randomized value function. Based on PAC-Bayesian theory, we derive a generalization error bound for the proposed IP-BLR operation at each learning iteration, which exhibits a trade-off between the posterior distribution obtained by IP-BLR and the informative prior. Since the optimal posterior that minimizes this generalization error bound is intractable, we instead develop an adaptive noise parameter update algorithm to balance this trade-off. The performance of the proposed IP-BLR deep Q-network (DQN) with adaptive noise parameter update is validated on classical control tasks. Compared to existing methods using a non-informative prior, IP-BLR DQN achieves higher accumulated rewards in fewer interactions with the environment, owing to more accurate value function approximation and better generalization.
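To make the core mechanism concrete, the sketch below shows a plain Bayesian linear regression update for linear-in-features Q-value weights in which the prior at each iteration is informative, i.e. taken from the previous iteration's posterior, and a weight sample yields one randomized Q-value function. This is a minimal illustration under assumed names (`ip_blr_update`, `phi`, `noise_var`) and an assumed linear feature model; it is not the authors' implementation, and the paper's adaptive noise-parameter update is not reproduced here.

```python
import numpy as np

def ip_blr_update(phi, targets, prior_mean, prior_cov, noise_var):
    """One Bayesian linear regression update with an informative prior.

    phi        : (N, d) feature matrix of sampled state-action pairs
    targets    : (N,)   bootstrapped Q-value targets
    prior_mean : (d,)   prior mean (e.g. previous iteration's posterior mean)
    prior_cov  : (d, d) prior covariance (previous iteration's posterior covariance)
    noise_var  : float  observation-noise variance (the noise parameter)
    """
    prior_prec = np.linalg.inv(prior_cov)
    post_prec = prior_prec + phi.T @ phi / noise_var           # posterior precision
    post_cov = np.linalg.inv(post_prec)                        # posterior covariance
    post_mean = post_cov @ (prior_prec @ prior_mean
                            + phi.T @ targets / noise_var)     # posterior mean
    return post_mean, post_cov

def sample_randomized_q(post_mean, post_cov, rng):
    """Sample a weight vector, giving one randomized Q-value function."""
    return rng.multivariate_normal(post_mean, post_cov)

# Illustrative usage with random data; dimensions and values are arbitrary.
rng = np.random.default_rng(0)
d, n = 8, 64
phi = rng.normal(size=(n, d))                  # features of sampled transitions
targets = rng.normal(size=n)                   # bootstrapped targets
mean, cov = np.zeros(d), np.eye(d)             # non-informative prior at iteration 0
for _ in range(3):                             # later iterations reuse the posterior as prior
    mean, cov = ip_blr_update(phi, targets, mean, cov, noise_var=1.0)
w = sample_randomized_q(mean, cov, rng)        # weights of one sampled Q-function
```

In this sketch the informative prior simply carries the previous posterior forward, which is the sense in which the update leverages statistics from earlier training iterations; choosing `noise_var` then controls how strongly new data overrides that prior, the trade-off the paper's adaptive noise parameter update is designed to balance.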
