Search Results for author: Peter Auer

Found 9 papers, 0 papers with code

Autonomous exploration for navigating in non-stationary CMPs

no code implementations • 18 Oct 2019 • Pratik Gajane, Ronald Ortner, Peter Auer, Csaba Szepesvari

We consider a setting in which the objective is to learn to navigate in a controlled Markov process (CMP) where transition probabilities may abruptly change.

Variational Regret Bounds for Reinforcement Learning

no code implementations • 14 May 2019 • Pratik Gajane, Ronald Ortner, Peter Auer

This is the first variational regret bound for the general reinforcement learning setting.

General Reinforcement Learning

A Sliding-Window Algorithm for Markov Decision Processes with Arbitrarily Changing Rewards and Transitions

no code implementations • 25 May 2018 • Pratik Gajane, Ronald Ortner, Peter Auer

We consider reinforcement learning in changing Markov Decision Processes where both the state-transition probabilities and the reward functions may vary over time.

Online Learning with Randomized Feedback Graphs for Optimal PUE Attacks in Cognitive Radio Networks

no code implementations • 28 Sep 2017 • Monireh Dabaghchian, Amir Alipour-Fanid, Kai Zeng, Qingsi Wang, Peter Auer

In this paper, for the first time, we study optimal PUE attack strategies by formulating an online learning problem where the attacker needs to dynamically decide the attacking channel in each time slot based on its attacking experience.

An algorithm with nearly optimal pseudo-regret for both stochastic and adversarial bandits

no code implementations • 27 May 2016 • Peter Auer, Chao-Kai Chiang

We present an algorithm that achieves almost optimal pseudo-regret bounds against adversarial and stochastic bandits.

Upper-Confidence-Bound Algorithms for Active Learning in Multi-Armed Bandits

no code implementations • 16 Jul 2015 • Alexandra Carpentier, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, Peter Auer, András Antos

If the variance of the distributions were known, one could design an optimal sampling strategy by collecting a number of independent samples per distribution that is proportional to their variance.

Active Learning, Multi-Armed Bandits
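
The known-variance baseline mentioned in the abstract above (samples per distribution proportional to variance) can be illustrated with a minimal sketch. The function name `proportional_allocation`, the fixed integer budget, and the largest-remainder rounding rule are assumptions made for this example; they are not taken from the paper.

```python
from typing import Sequence


def proportional_allocation(variances: Sequence[float], budget: int) -> list[int]:
    """Split `budget` samples across distributions in proportion to their variances.

    Hypothetical helper for illustration only: it assumes the variances are
    known in advance, which is the idealized case the abstract describes.
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("at least one variance must be positive")

    # Ideal (possibly fractional) number of samples per distribution.
    shares = [budget * v / total for v in variances]

    # Round down first, then hand out the samples lost to rounding,
    # largest fractional remainder first (an assumed tie-breaking rule).
    counts = [int(s) for s in shares]
    leftover = budget - sum(counts)
    order = sorted(
        range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in order[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # A distribution with three times the variance receives three times the samples.
    print(proportional_allocation([1.0, 3.0], 100))
```

In practice the variances are unknown, which is the setting the listed paper addresses; this sketch only covers the idealized allocation.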

PinView: Implicit Feedback in Content-Based Image Retrieval

no code implementations • 2 Oct 2014 • Zakria Hussain, Arto Klami, Jussi Kujala, Alex P. Leung, Kitsuchart Pasupa, Peter Auer, Samuel Kaski, Jorma Laaksonen, John Shawe-Taylor

It then retrieves images with a specialized online learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user.

Content-Based Image Retrieval

PAC-Bayesian Analysis of Contextual Bandits

no code implementations • NeurIPS 2011 • Yevgeny Seldin, Peter Auer, John S. Shawe-Taylor, Ronald Ortner, François Laviolette

The scaling of our regret bound with the number of states (contexts) $N$ goes as $\sqrt{N I_{\rho_t}(S;A)}$, where $I_{\rho_t}(S;A)$ is the mutual information between states and actions (the side information) used by the algorithm at round $t$.

Multi-Armed Bandits

Near-optimal Regret Bounds for Reinforcement Learning

no code implementations • NeurIPS 2008 • Peter Auer, Thomas Jaksch, Ronald Ortner

For undiscounted reinforcement learning in Markov decision processes (MDPs) we consider the total regret of a learning algorithm with respect to an optimal policy.
