Quantile Markov Decision Process

15 Nov 2017 · Xiaocheng Li, Huaiyang Zhong, Margaret L. Brandeau

The goal of a traditional Markov decision process (MDP) is to maximize expected cumulative reward over a defined horizon (possibly infinite). In many applications, however, a decision maker may be interested in optimizing a specific quantile of the cumulative reward instead of its expectation. In this paper we consider the problem of optimizing the quantiles of the cumulative rewards of a Markov decision process (MDP), which we refer to as a quantile Markov decision process (QMDP). We provide analytical results characterizing the optimal QMDP value function and present a dynamic programming-based algorithm to solve for the optimal policy. The algorithm also extends to the MDP problem with a conditional value-at-risk (CVaR) objective. We illustrate the practical relevance of our model by evaluating it on an HIV treatment initiation problem, where patients aim to balance the potential benefits and risks of the treatment.
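To make the difference between an expectation objective and a quantile (or CVaR) objective concrete, here is a minimal Python sketch on a hypothetical toy problem that does not come from the paper: a single state, a "safe" action that pays 1 per step, and a "risky" action that pays 3 or 0 with equal probability, over a horizon of 2. It enumerates the exact distribution of cumulative reward under each stationary policy and scores it three ways. The names ACTIONS, reward_distribution, lower_quantile and cvar are illustrative assumptions. The quantile uses the lower definition q_tau = inf{x : P(X <= x) >= tau}, and CVaR is taken as the mean of the worst alpha-fraction of rewards; the paper's exact conventions may differ. This only evaluates fixed policies by brute force and is not the authors' dynamic programming algorithm.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy MDP (not from the paper): one state, two actions.
# Each action maps to a list of (reward, probability) outcomes per step.
ACTIONS = {
    "safe": [(1, Fraction(1))],
    "risky": [(3, Fraction(1, 2)), (0, Fraction(1, 2))],
}


def reward_distribution(action, horizon):
    """Exact distribution {total_reward: probability} when `action` is taken at every step."""
    dist = {}
    for outcomes in product(ACTIONS[action], repeat=horizon):
        total = sum(r for r, _ in outcomes)
        prob = Fraction(1)
        for _, p in outcomes:
            prob *= p
        dist[total] = dist.get(total, Fraction(0)) + prob
    return dist


def expectation(dist):
    """Expected cumulative reward (the traditional MDP objective)."""
    return sum(x * p for x, p in dist.items())


def lower_quantile(dist, tau):
    """Lower tau-quantile: smallest x with P(X <= x) >= tau."""
    cum = Fraction(0)
    for x in sorted(dist):
        cum += dist[x]
        if cum >= tau:
            return x
    return max(dist)


def cvar(dist, alpha):
    """Assumed convention: mean reward over the worst alpha-fraction of outcomes."""
    remaining = alpha
    total = Fraction(0)
    for x in sorted(dist):
        take = min(dist[x], remaining)  # probability mass taken from this outcome
        total += x * take
        remaining -= take
        if remaining == 0:
            break
    return total / alpha


if __name__ == "__main__":
    horizon, tau = 2, Fraction(1, 4)
    for action in ACTIONS:
        d = reward_distribution(action, horizon)
        print(action, expectation(d), lower_quantile(d, tau), cvar(d, tau))
    # prints:
    # safe 2 2 2
    # risky 3 0 0
```

With tau = 0.25 the expectation favors the risky action while the quantile and CVaR favor the safe one. At tau = 0.5 the median cumulative reward of the risky action is 3, so the quantile objective switches back to it; the chosen quantile level therefore changes which policy is preferred.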
