Search Results for author: Ronald Ortner

Found 10 papers, 1 paper with code

Regret Bounds for Learning State Representations in Reinforcement Learning

no code implementations NeurIPS 2019 Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.

Autonomous exploration for navigating in non-stationary CMPs

no code implementations 18 Oct 2019 Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.

Variational Regret Bounds for Reinforcement Learning

no code implementations 14 May 2019 Pratik Gajane, Ronald Ortner, Peter Auer

This is the first variational regret bound for the general reinforcement learning setting.

Task: General Reinforcement Learning

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

no code implementations 6 Aug 2018 Ronald Ortner

We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$.

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

no code implementations 25 May 2018 Pratik Gajane, Ronald Ortner, Peter Auer

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time.
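As a minimal illustration of the sliding-window idea (a generic sketch, not the paper's algorithm), transition probabilities can be estimated from only the most recent observations, so data gathered before an abrupt change is eventually discarded:

```python
from collections import deque

def make_sw_estimator(window_size):
    """Sliding-window empirical estimate of P(s' | s, a).

    Keeps only the last `window_size` transitions, so the estimate
    can track an MDP whose dynamics change over time.
    """
    window = deque(maxlen=window_size)  # oldest transitions drop out

    def observe(s, a, s_next):
        window.append((s, a, s_next))

    def estimate(s, a, s_next):
        visits = [(x, y, z) for (x, y, z) in window if x == s and y == a]
        if not visits:
            return None  # (s, a) not observed within the window
        return sum(1 for (_, _, z) in visits if z == s_next) / len(visits)

    return observe, estimate
```

With a window of size 2, older observations are forgotten: after observing `(0, 0) -> 1` three times and then `(0, 0) -> 2` twice, the estimate for `s' = 1` drops to 0, since only the last two transitions remain in the window.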

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

1 code implementation ICML 2018 Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.
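For reference, the span appearing in the abstract is the standard seminorm of a function $h$ over the state space $\mathcal{S}$:

```latex
\mathrm{sp}(h) \;=\; \max_{s \in \mathcal{S}} h(s) \;-\; \min_{s' \in \mathcal{S}} h(s'),
```

so the constraint in SCAL is the assumption that $\mathrm{sp}(h^*) \le c$ for the optimal bias function $h^*$.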

Task: Efficient Exploration

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning

no code implementations 12 May 2014 Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko

We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP).

Online Regret Bounds for Undiscounted Continuous Reinforcement Learning

no code implementations NeurIPS 2012 Ronald Ortner, Daniil Ryabko

We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space.

PAC-Bayesian Analysis of Contextual Bandits

no code implementations NeurIPS 2011 Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, François Laviolette

The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$.

Task: Multi-Armed Bandits

Near-optimal Regret Bounds for Reinforcement Learning

no code implementations NeurIPS 2008 Peter Auer, Thomas Jaksch, Ronald Ortner

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy.
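In this undiscounted setting, the total regret after $T$ steps is typically defined against the optimal average reward $\rho^*$ of the MDP:

```latex
\mathrm{Regret}(T) \;=\; T \rho^* \;-\; \sum_{t=1}^{T} r_t,
```

where $r_t$ is the reward collected by the learning algorithm at step $t$.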
