On the function approximation error for risk-sensitive reinforcement learning

22 Dec 2016 · Prasenjit Karmakar, Shalabh Bhatnagar

In this paper we obtain several informative error bounds on function approximation for the policy evaluation algorithm proposed by Basu et al. when the aim is to find the risk-sensitive cost represented using the exponential utility. The main idea is to use the classical Bapat inequality together with the Perron-Frobenius eigenvectors (which exist if the underlying Markov chain is irreducible) to derive the new bounds. The novelty of our approach is that we exploit the irreducibility of the Markov chain, whereas the earlier work by Basu et al. used a spectral variation bound that holds for any matrix. We also give examples where all our bounds attain the "actual error", whereas the earlier bound given by Basu et al. is much weaker in comparison. We show that this happens because the earlier bound lacks a difference term that is always present in all our bounds when the state space is large. Additionally, we discuss how our bounds compare with one another.
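
The abstract refers to the risk-sensitive cost under the exponential utility and its connection to Perron-Frobenius eigenvectors of an irreducible chain. Below is a minimal numerical sketch of the standard characterization this setting rests on (the Howard-Matheson formulation), not the paper's approximation algorithm or its bounds: for an irreducible transition matrix P with per-stage cost c, the risk-sensitive average cost equals the log of the Perron-Frobenius eigenvalue of the matrix Q(i, j) = exp(c(i)) P(i, j). The particular P and c used here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Illustrative irreducible Markov chain and per-stage costs
# (assumed values, not taken from the paper).
P = np.array([[0.1, 0.9],
              [0.6, 0.4]])          # transition matrix (irreducible)
c = np.array([1.0, 0.5])            # one-stage costs

# Q(i, j) = exp(c(i)) * P(i, j); for an irreducible chain the
# risk-sensitive average cost is log of Q's Perron-Frobenius eigenvalue.
Q = np.diag(np.exp(c)) @ P

eigvals, eigvecs = np.linalg.eig(Q)
k = np.argmax(eigvals.real)         # Perron-Frobenius eigenvalue is the
lam = np.log(eigvals.real[k])       # spectral radius: real and positive
v = np.abs(eigvecs[:, k].real)      # Perron eigenvector has positive entries

print(f"risk-sensitive average cost: {lam:.4f}")
print(f"normalized Perron eigenvector: {v / v.sum()}")
```

The Perron eigenvector computed this way is the object whose approximation error the paper's bounds quantify; the code above only recovers the exact quantities for a small chain.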
