no code implementations • NeurIPS 2008 • Peter Auer, Thomas Jaksch, Ronald Ortner
For undiscounted reinforcement learning in Markov decision processes (MDPs), we consider the total regret of a learning algorithm with respect to an optimal policy.
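For context, the total regret in this undiscounted setting is conventionally measured against the optimal average reward; the LaTeX sketch below gives the standard definition (the symbols $\Delta(T)$, $\rho^*$, and $r_t$ follow the usual convention and are assumptions here, not notation quoted from the paper).

```latex
% Total regret after T steps, measured against the optimal average
% reward \rho^* (standard undiscounted-RL convention; a sketch, not
% the paper's verbatim definition):
\Delta(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_t ,
\qquad
\rho^{*} \;=\; \max_{\pi}\ \lim_{T\to\infty} \frac{1}{T}\,
\mathbb{E}\!\left[\sum_{t=1}^{T} r_t^{\pi}\right].
```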
no code implementations • NeurIPS 2011 • Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, François Laviolette
Our regret bound scales with the number of states (contexts) $N$ as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$.
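To make the $I_{\rho_t}(S;A)$ term concrete, here is a minimal Python sketch computing the mutual information of a joint state-action distribution; the function name and the toy distribution are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def mutual_information(joint):
    """I(S;A) in nats for a joint distribution over (state, action) pairs.

    joint: 2-D array with joint[s, a] = P(S=s, A=a), entries summing to 1.
    Hypothetical helper for illustration; the paper's I_{rho_t}(S;A) is
    taken under the distribution rho_t played by the algorithm at round t.
    """
    p_s = joint.sum(axis=1, keepdims=True)   # marginal over states,  shape (S, 1)
    p_a = joint.sum(axis=0, keepdims=True)   # marginal over actions, shape (1, A)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, joint / (p_s * p_a), 1.0)
    return float(np.sum(joint * np.log(ratio)))

# Toy example: two states, two actions, a mildly state-dependent policy.
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(mutual_information(joint))  # ~0.193 nats
```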
no code implementations • NeurIPS 2012 • Ronald Ortner, Daniil Ryabko
We derive sublinear regret bounds for undiscounted reinforcement learning in continuous state space.
no code implementations • 12 May 2014 • Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko
We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP).
1 code implementation • ICML 2018 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner
We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.
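A brief sketch of the quantity SCAL constrains: the span semi-norm of a bias/value vector, together with a simplified truncation that caps the span at $c$. This is an illustrative simplification (SCAL's actual operator, ScOpt, additionally works on an optimistic extended MDP), and the helper names are hypothetical.

```python
import numpy as np

def span(h):
    """Span semi-norm of a bias/value vector: sp(h) = max_s h(s) - min_s h(s)."""
    return float(np.max(h) - np.min(h))

def truncate_to_span(h, c):
    """Cap a value vector so its span does not exceed c.

    Simplified stand-in for span-constrained value iteration: values above
    min(h) + c are clipped. The optimistic-MDP machinery of SCAL's ScOpt
    operator is omitted here.
    """
    return np.minimum(h, np.min(h) + c)

h = np.array([0.0, 3.0, 7.0])
print(span(h))                   # 7.0
print(truncate_to_span(h, 5.0))  # [0. 3. 5.] -> span is now 5.0
```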
no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer
We consider reinforcement learning in changing Markov decision processes, where both the state-transition probabilities and the reward functions may vary over time.
no code implementations • 6 Aug 2018 • Ronald Ortner
We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$.
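For reference, a common formalization of the mixing time parameter for uniformly ergodic MDPs is sketched below in LaTeX; this follows the usual total-variation convention and is an assumption about the paper's exact definition.

```latex
% Uniform mixing time over all policies (standard convention; an
% assumption, not the paper's verbatim definition). \mu_\pi denotes
% the stationary distribution of the chain induced by policy \pi.
t_{\rm mix} \;=\; \max_{\pi}\ \min\Big\{\, t \ge 1 \;:\;
\max_{s}\,\big\lVert P_\pi^{t}(\cdot \mid s) - \mu_\pi \big\rVert_{\mathrm{TV}}
\le \tfrac{1}{4} \,\Big\}.
```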
no code implementations • 14 May 2019 • Pratik Gajane, Ronald Ortner, Peter Auer
This is the first variational regret bound, i.e., a bound on the regret in terms of the total variation of the MDP over time, for the general reinforcement learning setting.
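One common way to formalize the variation that such a bound depends on is sketched below; the measure $V_T$ and its exact form are an assumption based on standard conventions for changing MDPs, not quoted from the paper.

```latex
% Total variation of a changing MDP over T steps (an assumed
% formalization following standard conventions in this literature):
V_T \;=\; \sum_{t=1}^{T-1} \Big(
\max_{s,a}\,\big\lVert p_{t+1}(\cdot \mid s,a) - p_t(\cdot \mid s,a) \big\rVert_1
\;+\; \max_{s,a}\,\big\lvert r_{t+1}(s,a) - r_t(s,a) \big\rvert \Big).
```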
no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari
We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.
no code implementations • NeurIPS 2019 • Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard
We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.
no code implementations • 2 Feb 2022 • Adrienne Tuynman, Ronald Ortner
We present an approach for quantifying the usefulness of transfer in reinforcement learning via regret bounds in a multi-agent setting.