Search Results for author: Ronald Ortner

Found 11 papers, 1 paper with code

Transfer in Reinforcement Learning via Regret Bounds for Learning Agents

no code implementations • 2 Feb 2022 • Adrienne Tuynman, Ronald Ortner

We present an approach to quantifying the usefulness of transfer in reinforcement learning, via regret bounds for a multi-agent setting.

Tasks: reinforcement-learning, Reinforcement Learning (RL), +1

Regret Bounds for Learning State Representations in Reinforcement Learning

no code implementations • NeurIPS 2019 • Ronald Ortner, Matteo Pirotta, Alessandro Lazaric, Ronan Fruit, Odalric-Ambrym Maillard

We consider the problem of online reinforcement learning when several state representations (mapping histories to a discrete state space) are available to the learning agent.

Tasks: reinforcement-learning, Reinforcement Learning (RL)
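
To make the setting concrete, a state representation can be viewed as a function mapping the observation history to a discrete state, and the learner must choose among several such candidates. The following minimal Python sketch (all names hypothetical, not from the paper) illustrates two competing representations of the same history:

```python
from typing import Callable, Sequence

# A state representation maps the observation history to a discrete state.
StateRepresentation = Callable[[Sequence[str]], int]

OBS = ["left", "right"]

# Candidate 1: the state is the last observation (2 states, short memory).
phi1: StateRepresentation = lambda h: OBS.index(h[-1])

# Candidate 2: the state is the last two observations (4 states, longer memory).
phi2: StateRepresentation = lambda h: 2 * OBS.index(h[-2]) + OBS.index(h[-1])

history = ["left", "right", "right"]
print(phi1(history), phi2(history))  # 1 and 3: same history, different state spaces
```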

Autonomous exploration for navigating in non-stationary CMPs

no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.

Tasks: Navigate

Regret Bounds for Reinforcement Learning via Markov Chain Concentration

no code implementations • 6 Aug 2018 • Ronald Ortner

We give a simple optimistic algorithm for which it is easy to derive regret bounds of $\tilde{O}(\sqrt{t_{\rm mix} SAT})$ after $T$ steps in uniformly ergodic Markov decision processes with $S$ states, $A$ actions, and mixing time parameter $t_{\rm mix}$.

Tasks: reinforcement-learning, Reinforcement Learning (RL)
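
As a rough illustration of how the bound above scales, the following Python snippet evaluates $\sqrt{t_{\rm mix} S A T}$ for some made-up parameter values, dropping constants and logarithmic factors; note that the per-step regret vanishes as $T$ grows:

```python
import math

def regret_bound(t_mix: float, S: int, A: int, T: int) -> float:
    """Order of the stated regret bound, ignoring constants and log factors.

    t_mix: mixing-time parameter of the uniformly ergodic MDP
    S, A:  number of states and actions
    T:     number of steps
    """
    return math.sqrt(t_mix * S * A * T)

# Example: the bound grows as sqrt(T), so per-step regret goes to zero.
for T in (10**4, 10**6, 10**8):
    b = regret_bound(t_mix=50, S=20, A=5, T=T)
    print(f"T={T:>9}: regret ~ {b:,.0f}  (per-step ~ {b / T:.4f})")
```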

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time.

Tasks: reinforcement-learning, Reinforcement Learning (RL)
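
The device named in the title is estimating the MDP model from a sliding window of recent observations, so that data from before a change is eventually discarded. Below is a minimal sketch of such an estimator with assumed interfaces; the paper's algorithm additionally couples these estimates with optimistic planning, which is omitted here:

```python
from collections import deque

class SlidingWindowModel:
    """Sliding-window empirical model of an MDP (illustrative sketch only).

    Keeps only the last `window` transitions, so the estimates track a
    changing environment instead of averaging over stale data.
    """
    def __init__(self, n_states: int, n_actions: int, window: int):
        self.nS, self.nA = n_states, n_actions
        self.buffer = deque(maxlen=window)  # (s, a, r, s') tuples

    def update(self, s: int, a: int, r: float, s_next: int) -> None:
        self.buffer.append((s, a, r, s_next))

    def estimate(self, s: int, a: int):
        """Empirical mean reward and transition distribution for (s, a)."""
        hits = [(r, sn) for (s0, a0, r, sn) in self.buffer if (s0, a0) == (s, a)]
        if not hits:
            return 0.0, [1.0 / self.nS] * self.nS  # uninformed fallback
        r_hat = sum(r for r, _ in hits) / len(hits)
        p_hat = [0.0] * self.nS
        for _, sn in hits:
            p_hat[sn] += 1.0 / len(hits)
        return r_hat, p_hat
```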

Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

1 code implementation • ICML 2018 • Ronan Fruit, Matteo Pirotta, Alessandro Lazaric, Ronald Ortner

We introduce SCAL, an algorithm designed to perform efficient exploration-exploitation in any unknown weakly-communicating Markov decision process (MDP) for which an upper bound $c$ on the span of the optimal bias function is known.

Tasks: Efficient Exploration, reinforcement-learning, +1
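
One ingredient that can be illustrated compactly is span truncation during planning: after each Bellman backup, the value vector is clipped so that its span stays below the known bound $c$. The sketch below shows this truncation step on a known MDP; it is a simplified stand-in, not SCAL itself, which applies such planning to an optimistic model of an unknown MDP:

```python
import numpy as np

def span_truncated_vi(P, R, c, n_iter=200):
    """Value iteration with span truncation (illustrative sketch).

    P: (S, A, S) transition probabilities, R: (S, A) rewards, c: span bound.
    After each backup the value vector is clipped to [min(v), min(v) + c],
    keeping its span at most c.
    """
    S = R.shape[0]
    v = np.zeros(S)
    for _ in range(n_iter):
        q = R + (P @ v)                   # (S, A): one-step Bellman backup
        v = q.max(axis=1)
        v = np.minimum(v, v.min() + c)    # enforce span(v) <= c
        v -= v.min()                      # re-center (average-reward setting)
    return v
```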

Selecting Near-Optimal Approximate State Representations in Reinforcement Learning

no code implementations • 12 May 2014 • Ronald Ortner, Odalric-Ambrym Maillard, Daniil Ryabko

We consider a reinforcement learning setting introduced in (Maillard et al., NIPS 2011) where the learner does not have explicit access to the states of the underlying Markov decision process (MDP).

Tasks: reinforcement-learning, Reinforcement Learning (RL)

PAC-Bayesian Analysis of Contextual Bandits

no code implementations • NeurIPS 2011 • Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, François Laviolette

The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$.

Tasks: Multi-Armed Bandits
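
The quantity $I_{\rho_t}(S;A)$ in the bound above is an ordinary mutual information between states and actions under the algorithm's joint distribution. A small, self-contained computation of it (illustrative only):

```python
import numpy as np

def mutual_information(joint: np.ndarray) -> float:
    """I(S; A) in nats for a joint distribution over (state, action).

    joint[s, a] = probability of being in state s and taking action a.
    This is the side-information term appearing in the regret bound above.
    """
    ps = joint.sum(axis=1, keepdims=True)   # marginal over states
    pa = joint.sum(axis=0, keepdims=True)   # marginal over actions
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (ps @ pa)[mask])).sum())

# Uniform states, action fully determined by the state: I(S;A) = log(N) nats.
N = 4
joint = np.eye(N) / N
print(mutual_information(joint), np.log(N))
```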

Near-optimal Regret Bounds for Reinforcement Learning

no code implementations • NeurIPS 2008 • Peter Auer, Thomas Jaksch, Ronald Ortner

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy.

Tasks: reinforcement-learning, Reinforcement Learning (RL)
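
Algorithms in this line follow "optimism in the face of uncertainty": empirical rewards and transition probabilities are widened by confidence radii that shrink as $1/\sqrt{N}$ in the visit count $N$. The sketch below shows Hoeffding-style radii with illustrative constants; the exact constants used in the paper differ:

```python
import math

def confidence_radii(n_visits: int, S: int, A: int, t: int, delta: float = 0.05):
    """Hoeffding-style confidence radii for one state-action pair.

    Returns (reward_radius, transition_radius) after n_visits visits by
    step t.  Only the 1/sqrt(N) scaling matters for the illustration.
    """
    n = max(1, n_visits)
    r_rad = math.sqrt(math.log(2 * S * A * t / delta) / (2 * n))
    p_rad = math.sqrt(2 * S * math.log(2 * A * t / delta) / n)
    return r_rad, p_rad

for n in (1, 10, 100, 1000):
    print(n, confidence_radii(n, S=10, A=4, t=10_000))
```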
