We consider qualitative strategy synthesis for the formalism called consumption Markov decision processes.
We show that, given a horizon $n$ in binary and an MDP, computing an optimal policy is EXP-complete, thus resolving an open problem that goes back to the seminal 1987 paper on the complexity of MDPs by Papadimitriou and Tsitsiklis.
We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expected payoff while ensuring that the payoff is above a given threshold with at least a specified probability.
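One way to write this objective down, as a sketch under assumed notation: the payoff function $\mathsf{Pay}$, the threshold $\tau$, and the probability bound $\alpha$ are symbols introduced here for illustration, and "optimize" is read as "maximize".

\[
\sup_{\sigma} \; \mathbb{E}^{\sigma}[\mathsf{Pay}] \quad \text{subject to} \quad \mathbb{P}^{\sigma}(\mathsf{Pay} \geq \tau) \geq \alpha .
\]

Whether the threshold inequality is strict is not fixed by the description above; the non-strict form is an assumption.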
In this work we go beyond both the "expectation" and "threshold" approaches and consider a "guaranteed payoff optimization (GPO)" problem for POMDPs, where we are given a threshold $t$ and the objective is to find a policy $\sigma$ such that a) each possible outcome of $\sigma$ yields a discounted-sum payoff of at least $t$, and b) the expected discounted-sum payoff of $\sigma$ is optimal (or near-optimal) among all policies satisfying a).
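The two conditions can be sketched formally as follows, assuming a discount factor $\gamma \in (0,1)$ and rewards $r_0, r_1, \dots$ along a run $\rho$. This notation is introduced here for illustration, and optimality is read as maximization.

\[
\mathsf{Disc}_{\gamma}(\rho) = \sum_{i=0}^{\infty} \gamma^{i} r_i, \qquad
\sup_{\sigma} \; \mathbb{E}^{\sigma}[\mathsf{Disc}_{\gamma}] \quad \text{subject to} \quad \mathsf{Disc}_{\gamma}(\rho) \geq t \ \text{ for every run } \rho \text{ consistent with } \sigma .
\]

Condition a) is a worst-case requirement over outcomes, whereas condition b) is an expectation over them.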
Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.