Biasing Approximate Dynamic Programming with a Lower Discount Factor

NeurIPS 2008  ·  Marek Petrik, Bruno Scherrer ·

Most algorithms for solving Markov decision processes rely on a discount factor, which ensures their convergence. In fact, it is often used in problems with is no intrinsic motivation. In this paper, we show that when used in approximate dynamic programming, an artificially low discount factor may significantly improve the performance on some problems, such as Tetris. We propose two explanations for this phenomenon. Our first justification follows directly from the standard approximation error bounds: using a lower discount factor may decrease the approximation error bounds. However, we also show that these bounds are loose, a thus their decrease does not entirely justify a better practical performance. We thus propose another justification: when the rewards are received only sporadically (as it is the case in Tetris), we can derive tighter bounds, which support a significant performance increase with a decrease in the discount factor.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here