No-regret Exploration in Contextual Reinforcement Learning

14 Mar 2019 · Aditya Modi, Ambuj Tewari

We consider the recently proposed reinforcement learning (RL) framework of Contextual Markov Decision Processes (CMDP), where the agent interacts with an adversarial sequence of episodic tabular MDPs. In addition, a context vector determining the MDP parameters is available to the agent at the start of each episode, thereby allowing it to learn a context-dependent near-optimal policy...
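
The interaction protocol described in the abstract can be sketched roughly as follows. This is an illustration only, not the authors' method: the softmax-linear map from context to transition probabilities (`make_mdp`), the random contexts standing in for an adversarial sequence, the fixed placeholder policy, and all sizes are assumptions made for the example. Rewards are omitted for brevity.

```python
import numpy as np

def make_mdp(context, weights):
    """Hypothetical context-to-MDP map (an assumption for this sketch):
    a softmax over next states of a score that is linear in the context."""
    logits = weights @ context                      # shape (S, A, S)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def run_episode(transitions, policy, horizon, rng, start_state=0):
    """Roll out one episode of a tabular MDP; policy has shape (H, S)."""
    state, trajectory = start_state, []
    for h in range(horizon):
        action = int(policy[h, state])
        next_state = int(rng.choice(transitions.shape[-1],
                                    p=transitions[state, action]))
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory

rng = np.random.default_rng(0)
S, A, H, d = 3, 2, 4, 5                   # states, actions, horizon, context dim
weights = rng.normal(size=(S, A, S, d))   # unknown to the agent in this sketch
policy = np.zeros((H, S), dtype=int)      # placeholder: always take action 0

episodes = []
for k in range(3):
    context = rng.normal(size=d)               # revealed at the start of the episode
    transitions = make_mdp(context, weights)   # this episode's MDP, set by the context
    episodes.append(run_episode(transitions, policy, H, rng))
```

A learning agent would replace the placeholder policy with one computed from the observed context and the trajectories collected in earlier episodes.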
