no code implementations • NeurIPS 2008 • Peter Auer, Thomas Jaksch, Ronald Ortner
For undiscounted reinforcement learning in Markov decision processes (MDPs), we consider the total regret of a learning algorithm with respect to an optimal policy.
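For context, the total regret in this undiscounted setting is conventionally measured against the optimal average reward; the LaTeX sketch below gives the standard definition (the symbols $\Delta(T)$, $\rho^*$, and $r_t$ follow the usual convention and are assumptions here, not notation quoted from the paper).

```latex
% Total regret after T steps, measured against the optimal average
% reward \rho^* (standard undiscounted-RL convention; a sketch, not
% the paper's verbatim definition):
\Delta(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_t ,
\qquad
\rho^{*} \;=\; \max_{\pi}\ \lim_{T\to\infty} \frac{1}{T}\,
\mathbb{E}\!\left[\sum_{t=1}^{T} r_t^{\pi}\right].
```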
no code implementations • NeurIPS 2011 • Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, François Laviolette
Our regret bound scales with the number of states (contexts) $N$ as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$.
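To make the $I_{\rho_t}(S;A)$ term concrete, here is a minimal Python sketch computing the mutual information of a joint state-action distribution; the function name and the toy distribution are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mutual_information(joint):
    """I(S;A) in nats for a joint distribution over (state, action) pairs.

    joint: 2-D array with joint[s, a] = P(S=s, A=a), entries summing to 1.
    Hypothetical helper for illustration; the paper's I_{rho_t}(S;A) is
    taken under the distribution rho_t played by the algorithm at round t.
    """
    p_s = joint.sum(axis=1, keepdims=True)   # marginal over states,  shape (S, 1)
    p_a = joint.sum(axis=0, keepdims=True)   # marginal over actions, shape (1, A)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, joint / (p_s * p_a), 1.0)
    return float(np.sum(joint * np.log(ratio)))

# Toy example: two states, two actions, a mildly state-dependent policy.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))  # ~0.193 nats
```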
no code implementations • NeurIPS 2012 • Ronald Ortner, Daniil Ryabko
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space.
no code implementations • 12 May 2014 • Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko
We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP).
1 code implementation • ICML 2018 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner
We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.
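A brief sketch of the quantity SCAL constrains: the span semi-norm of a bias/value vector, together with a simplified truncation that caps the span at $c$. This is an illustrative simplification (SCAL's actual operator, ScOpt, additionally works on an optimistic extended MDP), and the helper names are hypothetical.

```python
import numpy as np

def span(h):
    """Span semi-norm of a bias/value vector: sp(h) = max_s h(s) - min_s h(s)."""
    return float(np.max(h) - np.min(h))

def truncate_to_span(h, c):
    """Cap a value vector so its span does not exceed c.

    Simplified stand-in for span-constrained value iteration: values above
    min(h) + c are clipped. The optimistic-MDP machinery of SCAL's ScOpt
    operator is omitted here.
    """
    return np.minimum(h, np.min(h) + c)

h = np.array([0.0, 3.0, 7.0])
print(span(h))                   # 7.0
print(truncate_to_span(h, 5.0))  # [0. 3. 5.] -> span is now 5.0
```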
no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer
We consider reinforcement learning in changing Markov decision processes, where both the state-transition probabilities and the reward functions may vary over time.
no code implementations • 6 Aug 2018 • Ronald Ortner
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$.
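For reference, a common formalization of the mixing time parameter for uniformly ergodic MDPs is sketched below in LaTeX; this follows the usual total-variation convention and is an assumption about the paper's exact definition.

```latex
% Uniform mixing time over all policies (standard convention; an
% assumption, not the paper's verbatim definition). \mu_\pi denotes
% the stationary distribution of the chain induced by policy \pi.
t_{\rm mix} \;=\; \max_{\pi}\ \min\Big\{\, t \ge 1 \;:\;
\max_{s}\,\big\lVert P_\pi^{t}(\cdot \mid s) - \mu_\pi \big\rVert_{\mathrm{TV}}
\le \tfrac{1}{4} \,\Big\}.
```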
no code implementations • 14 May 2019 • Pratik Gajane, Ronald Ortner, Peter Auer
This is the first variational regret bound, i.e., a bound on the regret in terms of the total variation of the MDP over time, for the general reinforcement learning setting.
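One common way to formalize the variation that such a bound depends on is sketched below; the measure $V_T$ and its exact form are an assumption based on standard conventions for changing MDPs, not quoted from the paper.

```latex
% Total variation of a changing MDP over T steps (an assumed
% formalization following standard conventions in this literature):
V_T \;=\; \sum_{t=1}^{T-1} \Big(
\max_{s,a}\,\big\lVert p_{t+1}(\cdot \mid s,a) - p_t(\cdot \mid s,a) \big\rVert_1
\;+\; \max_{s,a}\,\big\lvert r_{t+1}(s,a) - r_t(s,a) \big\rvert \Big).
```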
no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.
no code implementations • NeurIPS 2019 • Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.
no code implementations • 2 Feb 2022 • Adrienne Tuynman, Ronald Ortner
We present an approach for quantifying the usefulness of transfer in reinforcement learning via regret bounds in a multi-agent setting.