Order-Optimal Global Convergence for Average Reward Reinforcement Learning via Actor-Critic Approach

26 Jul 2024 · Swetha Ganesh, Washim Uddin Mondal, Vaneet Aggarwal

This work analyzes average-reward reinforcement learning with general parametrization. Current state-of-the-art (SOTA) guarantees for this problem are either suboptimal or require prior knowledge of the mixing time of the underlying Markov process, which is unavailable in most practical scenarios. We introduce a Multi-level Monte Carlo-based Natural Actor-Critic (MLMC-NAC) algorithm to address these issues. Our approach is the first to achieve a global convergence rate of $\tilde{\mathcal{O}}(1/\sqrt{T})$ without requiring knowledge of the mixing time. This significantly improves on the SOTA bound of $\tilde{\mathcal{O}}(T^{-1/4})$, where $T$ is the horizon length.
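To give a rough sense of the gap between the two rates quoted above, the following sketch computes the horizon length at which each bound falls below a target value. It is only a back-of-the-envelope illustration: it assumes the bounds take the bare forms $T^{-1/2}$ and $T^{-1/4}$, dropping all constants and the logarithmic factors hidden by $\tilde{\mathcal{O}}$. The function name `horizon_for_gap` and the target values are hypothetical and do not come from the paper.

```python
def horizon_for_gap(eps: float, exponent: float) -> float:
    """Return the T at which T**(-exponent) equals eps.

    Assumption: the bound is exactly T**(-exponent); constants and
    log factors hidden by the O-tilde notation are ignored.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    if exponent <= 0.0:
        raise ValueError("exponent must be positive")
    return eps ** (-1.0 / exponent)


if __name__ == "__main__":
    for eps in (1e-1, 1e-2):
        t_sqrt = horizon_for_gap(eps, 0.5)      # rate T^{-1/2}
        t_quarter = horizon_for_gap(eps, 0.25)  # rate T^{-1/4}
        print(f"eps={eps:g}: T^(-1/2) needs T ~ {t_sqrt:.3g}, "
              f"T^(-1/4) needs T ~ {t_quarter:.3g}")
```

Under these simplifying assumptions, reaching a target of $\epsilon$ takes a horizon on the order of $\epsilon^{-2}$ at the $T^{-1/2}$ rate, compared with $\epsilon^{-4}$ at the $T^{-1/4}$ rate.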
