# Constrained Upper Confidence Reinforcement Learning

26 Jan 2020  ·  , ·

Constrained Markov Decision Processes are a class of stochastic decision problems in which the decision maker must select a policy that satisfies auxiliary cost constraints. This paper extends upper confidence reinforcement learning for settings in which the reward function and the constraints, described by cost functions, are unknown a priori but the transition kernel is known... Such a setting is well-motivated by a number of applications including exploration of unknown, potentially unsafe, environments. We present an algorithm C-UCRL and show that it achieves sub-linear regret ($O(T^{\frac{3}{4}}\sqrt{\log(T/\delta)})$) with respect to the reward while satisfying the constraints even while learning with probability $1-\delta$. Illustrative examples are provided. read more

PDF Abstract

## Code Add Remove Mark official

No code implementations yet. Submit your code now

## Datasets

Add Datasets introduced or used in this paper

## Results from the Paper Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

## Methods Add Remove

No methods listed for this paper. Add relevant methods here