no code implementations • 2 Feb 2022 • Adrienne Tuynman, Ronald Ortner
We present an approach to quantifying the usefulness of transfer in reinforcement learning via regret bounds in a multi-agent setting.
no code implementations • NeurIPS 2019 • Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.
no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvári
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.
no code implementations • 14 May 2019 • Pratik Gajane, Ronald Ortner, Peter Auer
This is the first variational regret bound for the general reinforcement learning setting.
no code implementations • 6 Aug 2018 • Ronald Ortner
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$.
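To illustrate the optimism principle behind such algorithms, here is a minimal sketch (not the paper's exact method): value iteration on an empirical model whose rewards are inflated by a confidence bonus. The MDP arrays, the `bonus` term, and the sweep count are all illustrative assumptions.

```python
import numpy as np

def optimistic_value_iteration(p_hat, r_hat, bonus, sweeps=100):
    """Relative value iteration on an optimistic model: empirical mean
    rewards r_hat[s, a] are inflated by a confidence bonus[s, a].
    A generic optimism-in-the-face-of-uncertainty sketch."""
    v = np.zeros(r_hat.shape[0])
    for _ in range(sweeps):
        # Optimistic Bellman backup with bonus-inflated rewards.
        q = r_hat + bonus + p_hat @ v   # q[s, a]
        v = q.max(axis=1)
        v -= v.min()                    # recenter (undiscounted / relative VI)
    return q.argmax(axis=1)             # greedy policy in the optimistic model

# Toy usage: 3 states, 2 actions, bonus ~ sqrt(1/n) from visit counts n.
rng = np.random.default_rng(0)
S, A = 3, 2
p_hat = rng.dirichlet(np.ones(S), size=(S, A))  # empirical transitions
r_hat = rng.uniform(size=(S, A))                # empirical mean rewards
n = np.ones((S, A))                             # placeholder visit counts
policy = optimistic_value_iteration(p_hat, r_hat, np.sqrt(1.0 / n))
```

The bonus shrinks as visit counts grow, so the optimistic model converges to the empirical one; this is the mechanism that typically yields $\sqrt{T}$-type regret bounds.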
no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer
We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time.
1 code implementation • ICML 2018 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner
We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.
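The distinctive ingredient here is the span constraint. Below is a minimal sketch of that idea alone: value iteration in which the value vector is truncated after each sweep so its span never exceeds $c$. The transition and reward arrays and the sweep count are illustrative; SCAL itself additionally combines this with optimistic model estimates.

```python
import numpy as np

def span_truncated_vi(p, r, c, sweeps=200):
    """Relative value iteration with span truncation: after each Bellman
    backup the value vector is clipped so that max(v) - min(v) <= c.
    A sketch of the span-constraint idea only, not SCAL in full."""
    v = np.zeros(r.shape[0])
    for _ in range(sweeps):
        v = (r + p @ v).max(axis=1)      # standard Bellman backup
        v = np.minimum(v, v.min() + c)   # enforce span(v) <= c
        v -= v.min()                     # recenter for numerical stability
    return v
```

Clipping from above at `v.min() + c` is what keeps the bias estimate within the known span bound even in weakly-communicating MDPs, where unconstrained value iteration can diverge.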
no code implementations • 12 May 2014 • Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko
We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP).
no code implementations • NeurIPS 2012 • Ronald Ortner, Daniil Ryabko
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space.
no code implementations • NeurIPS 2011 • Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, François Laviolette
The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$.
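For reference, the mutual information term in this bound can be computed from any joint state-action distribution; a minimal sketch, with the joint distribution `rho` as an assumed input:

```python
import numpy as np

def mutual_information(rho):
    """I(S;A) in nats for a joint distribution rho[s, a] over states
    and actions, i.e. the quantity appearing in the regret bound."""
    rho = rho / rho.sum()
    p_s = rho.sum(axis=1, keepdims=True)   # marginal over states
    p_a = rho.sum(axis=0, keepdims=True)   # marginal over actions
    mask = rho > 0
    return float((rho[mask] * np.log(rho[mask] / (p_s @ p_a)[mask])).sum())

# Toy usage: action determined by state, uniform over 4 states.
rho = np.eye(4) / 4.0
print(mutual_information(rho))  # log(4) ~ 1.386
```

When the algorithm plays the same action distribution in every state, the joint factorizes and $I(S;A) = 0$, so the state-dependent part of the bound vanishes.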
no code implementations • NeurIPS 2008 • Peter Auer, Thomas Jaksch, Ronald Ortner
For undiscounted reinforcement learning in Markov decision processes (MDPs), we consider the total regret of a learning algorithm with respect to an optimal policy.
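The total regret compared against here has a simple form; a definition-level sketch, assuming the optimal average reward `rho_star` of the MDP is known:

```python
def total_regret(rewards, rho_star):
    """Total regret after T steps: T * rho_star minus the reward actually
    collected, where rho_star is the optimal average reward of the MDP.
    (Definition sketch; rho_star must be computed or known separately.)"""
    return len(rewards) * rho_star - sum(rewards)
```

Sublinear total regret in $T$ then means the algorithm's average reward converges to the optimal average reward.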