Convergent Reinforcement Learning with Function Approximation: A Bilevel Optimization Perspective

27 Sep 2018  ·  Zhuoran Yang, Zuyue Fu, Kaiqing Zhang, Zhaoran Wang ·

We study reinforcement learning algorithms with nonlinear function approximation in the online setting. By formulating both the problems of value function estimation and policy learning as bilevel optimization problems, we propose online Q-learning and actor-critic algorithms for these two problems respectively. Our algorithms are gradient-based methods and thus are computationally efficient. Moreover, by approximating the iterates using differential equations, we establish convergence guarantees for the proposed algorithms. Thorough numerical experiments are conducted to back up our theory.

PDF Abstract


  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.


No methods listed for this paper. Add relevant methods here