Search Results for author: Ian Osband

Found 27 papers, 10 papers with code

Evaluating Probabilistic Inference in Deep Learning: Beyond Marginal Predictions

no code implementations • 20 Jul 2021 • Xiuyuan Lu, Ian Osband, Benjamin Van Roy, Zheng Wen

A fundamental challenge for any intelligent system is prediction: given some inputs $X_1, \ldots, X_\tau$, can you predict outcomes $Y_1, \ldots, Y_\tau$?
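
The paper's point is that evaluating only marginal predictions $P(Y_t)$ cannot separate agents that differ in their joint predictions. A minimal numpy sketch of that gap, using a toy example of my own (a coin whose bias is either 0 or 1, each with probability 1/2):

```python
import numpy as np

# Illustrative sketch (not from the paper): each marginal P(Y_t = 1)
# equals 0.5, matching a fair coin, but the joint over tau flips is
# very different from the product of marginals.
tau = 5
# Observe tau heads in a row.
log_prob_marginal = tau * np.log(0.5)  # product of marginals: 0.5**tau

# Correct joint: with prob 1/2 the coin always lands heads, so the
# all-heads sequence has probability 1/2 regardless of tau.
log_prob_joint = np.log(0.5)

print(log_prob_marginal, log_prob_joint)  # joint log-loss is far lower
```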

Epistemic Neural Networks

1 code implementation • 19 Jul 2021 • Ian Osband, Zheng Wen, Mohammad Asghari, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

All existing approaches to uncertainty modeling can be expressed as ENNs, and any ENN can be identified with a Bayesian neural network.
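
As a rough illustration of the interface (function and weight names here are my own, not the paper's reference code): an ENN's prediction depends on both the input $x$ and a sampled "epistemic index" $z$, and disagreement across draws of $z$ expresses epistemic uncertainty.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal ENN-style forward pass: the index z perturbs the hidden
# features, so different z give different plausible predictions.
W1, W2 = rng.normal(size=(8, 3)), rng.normal(size=(1, 8))
Wz = rng.normal(size=(8, 2))  # couples the epistemic index into the net

def enn_forward(x, z):
    h = np.tanh(W1 @ x + Wz @ z)
    return W2 @ h

x = np.array([0.1, -0.4, 0.7])
preds = [enn_forward(x, rng.normal(size=2)) for _ in range(10)]
print(np.std(preds))  # spread over z = epistemic uncertainty at x
```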

Reinforcement Learning, Bit by Bit

no code implementations • 6 Mar 2021 • Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

Reinforcement learning agents have demonstrated remarkable achievements in simulated environments.

Matrix games with bandit feedback

no code implementations • 9 Jun 2020 • Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff.
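
A tiny simulation may make the feedback model concrete (the payoff matrix and uniform play here are my own example, not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(0)

# Zero-sum matrix game with bandit feedback: each round both players
# pick an action and observe only the opponent's action plus a noisy
# payoff -- never the matrix A itself.
A = np.array([[0.3, -0.5],
              [-0.2, 0.4]])              # unknown to both players

payoff_sums = np.zeros_like(A)           # row player's running estimates
counts = np.zeros_like(A)

for t in range(1000):
    i, j = rng.integers(2), rng.integers(2)   # stand-in for real strategies
    payoff = A[i, j] + rng.normal(scale=0.1)  # bandit feedback
    payoff_sums[i, j] += payoff               # column player sees -payoff
    counts[i, j] += 1

print(payoff_sums / np.maximum(counts, 1))    # empirical estimate of A
```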

Making Sense of Reinforcement Learning and Probabilistic Inference

no code implementations • ICLR 2020 • Brendan O'Donoghue, Ian Osband, Catalin Ionescu

Reinforcement learning (RL) combines a control problem with statistical estimation: The system dynamics are not known to the agent, but can be learned through experience.

Behaviour Suite for Reinforcement Learning

2 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt

bsuite is a collection of carefully designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.
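
Typical usage, per the project README at the time of writing (the exact API may have changed; see github.com/deepmind/bsuite): load one experiment by its id and drive it through the dm_env interface.

```python
import numpy as np
import bsuite

# Load a single bsuite experiment and run a uniformly random agent for
# one episode (dm_env interface: reset() / step() return TimeSteps).
env = bsuite.load_from_id('catch/0')
timestep = env.reset()
while not timestep.last():
    action = np.random.randint(env.action_spec().num_values)
    timestep = env.step(action)
```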

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

1 code implementation • NeurIPS 2018 • Maria Dimakopoulou, Ian Osband, Benjamin Van Roy

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale.
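
One way to read the coordination mechanism (this sketch is my gloss, not the paper's algorithm verbatim): each agent commits to a private random seed, and the seed plus the shared data deterministically pins down that agent's sampled model, so exploration stays diverse across agents yet self-consistent over time.

```python
import numpy as np

# Hypothetical seed-sampling sketch for a scalar mean: the same seed
# with the same shared data always yields the same model sample, and
# different seeds yield diverse samples from the common posterior.
def sampled_mean(seed, observations, prior_mean=0.0, prior_weight=1.0):
    rng = np.random.default_rng(seed)            # same seed -> same draw
    n = len(observations)
    post_mean = (prior_mean * prior_weight + sum(observations)) / (prior_weight + n)
    post_std = 1.0 / np.sqrt(prior_weight + n)
    return post_mean + post_std * rng.normal()   # agent's model sample

shared_data = [0.9, 1.1, 1.0]
print([sampled_mean(seed, shared_data) for seed in range(3)])  # diverse
```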

The Uncertainty Bellman Equation and Exploration

no code implementations • ICML 2018 • Brendan O'Donoghue, Ian Osband, Rémi Munos, Volodymyr Mnih

In this paper we consider a similar *uncertainty* Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.
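
Schematically, with notation simplified from the paper's, the recursion propagates a local uncertainty term through the usual Bellman backup:

$$u_h(s, a) = \nu_h(s, a) + \sum_{s'} P(s' \mid s, a) \sum_{a'} \pi(a' \mid s')\, u_{h+1}(s', a'), \qquad u_{H+1} \equiv 0,$$

where $\nu_h(s, a)$ is the local one-step uncertainty at $(s, a)$; its solution upper-bounds the posterior variance of the Q-values and serves as an exploration signal.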

A Tutorial on Thompson Sampling

3 code implementations • 7 Jul 2017 • Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning • Product Recommendation
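
A minimal Beta-Bernoulli instantiation (my sketch, not the tutorial's reference code) captures the whole algorithm in a few lines: sample a mean for each arm from its posterior, play the argmax, update the played arm.

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta-Bernoulli Thompson sampling on a 3-armed bandit.
true_means = np.array([0.3, 0.5, 0.7])   # unknown to the agent
alpha = np.ones(3)                        # Beta posterior: successes + 1
beta = np.ones(3)                         # Beta posterior: failures + 1

for t in range(1000):
    theta = rng.beta(alpha, beta)         # one posterior sample per arm
    arm = int(np.argmax(theta))           # act greedily w.r.t. the sample
    reward = rng.random() < true_means[arm]
    alpha[arm] += reward
    beta[arm] += 1 - reward

print(alpha / (alpha + beta))             # posterior means concentrate on arm 2
```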

Noisy Networks for Exploration

11 code implementations • ICLR 2018 • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Rémi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Atari Games • Efficient Exploration
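
A sketch of the core building block, a noisy linear layer in the independent-Gaussian variant (the paper also derives a cheaper factorised form; init scales here are illustrative): weights are $\mu + \sigma \odot \varepsilon$, with $\varepsilon$ resampled each forward pass and both $\mu$ and $\sigma$ learned, so the network can anneal its own exploration noise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy linear layer: mu and sigma would be trained by gradient
# descent in the full method; here they are fixed for illustration.
in_dim, out_dim = 4, 2
w_mu = rng.normal(scale=0.1, size=(out_dim, in_dim))
w_sigma = np.full((out_dim, in_dim), 0.1)
b_mu = np.zeros(out_dim)
b_sigma = np.full(out_dim, 0.1)

def noisy_linear(x):
    w_eps = rng.normal(size=w_mu.shape)   # fresh noise on every call
    b_eps = rng.normal(size=b_mu.shape)
    return (w_mu + w_sigma * w_eps) @ x + (b_mu + b_sigma * b_eps)

x = np.array([0.5, -1.0, 0.2, 0.0])
print(noisy_linear(x), noisy_linear(x))   # same input, stochastic output
```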

On Optimistic versus Randomized Exploration in Reinforcement Learning

no code implementations • 13 Jun 2017 • Ian Osband, Benjamin Van Roy

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning.

Deep Q-learning from Demonstrations

3 code implementations • 12 Apr 2017 • Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small amounts of demonstration data to massively accelerate the learning process, and that automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.

Decision Making • Imitation Learning +1
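
The distinctive ingredient is a large-margin loss on demonstration transitions; here is a simplified sketch (the margin value is illustrative, and the full method combines this with TD losses and prioritized replay):

```python
import numpy as np

# Large-margin classification loss on a demonstration state: push the
# demonstrator's action to be highest-valued by at least a margin.
def margin_loss(q_values, expert_action, margin=0.8):
    l = np.full_like(q_values, margin)   # l(a_E, a) = margin for a != a_E
    l[expert_action] = 0.0               # ... and 0 for the expert action
    return np.max(q_values + l) - q_values[expert_action]

q = np.array([1.0, 1.2, 0.7])
print(margin_loss(q, expert_action=0))   # > 0: expert action not yet dominant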

Deep Exploration via Randomized Value Functions

no code implementations • 22 Mar 2017 • Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration

Minimax Regret Bounds for Reinforcement Learning

1 code implementation • ICML 2017 • Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.
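
The algorithmic template is optimistic value iteration with a visit-count bonus (constants omitted and notation mine; the paper's sharper Bernstein-style bonus is what attains the minimax rate):

$$Q_h(s, a) \leftarrow \hat{r}(s, a) + \hat{P}(\cdot \mid s, a)^\top V_{h+1} + b(s, a), \qquad b(s, a) \propto H \sqrt{\frac{\log(SAHT/\delta)}{\max\{1,\, N(s, a)\}}},$$

which yields worst-case regret on the order of $\tilde{O}(\sqrt{HSAT})$ over $T$ elapsed steps with horizon $H$.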

Gaussian-Dirichlet Posterior Dominance in Sequential Learning

no code implementations • 14 Feb 2017 • Ian Osband, Benjamin Van Roy

We consider the problem of sequential learning from categorical observations bounded in [0, 1].

On Lower Bounds for Regret in Reinforcement Learning

no code implementations • 9 Aug 2016 • Ian Osband, Benjamin Van Roy

This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning.

Posterior Sampling for Reinforcement Learning Without Episodes

no code implementations • 9 Aug 2016 • Ian Osband, Benjamin Van Roy

This note reviews analogous results for optimistic algorithms in infinite-horizon problems (Jaksch et al. 2010; Bartlett and Tewari 2009; Abbasi-Yadkori and Szepesvari 2011), with particular attention to dynamic episode growth.

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

no code implementations • ICML 2017 • Ian Osband, Benjamin Van Roy

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2.
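
For reference, the tabular PSRL loop is short enough to sketch in full (my minimal version; the priors and the episode loop around it are illustrative): sample one MDP from the posterior at the start of each episode, solve it, and act greedily until the episode ends.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tabular PSRL sketch: Dirichlet posterior over transitions, Gaussian
# posterior over mean rewards, finite-horizon value iteration to solve
# the sampled MDP.
S, A, H = 3, 2, 5
trans_counts = np.ones((S, A, S))      # Dirichlet(1, ..., 1) prior
rew_sum = np.zeros((S, A))             # summed observed rewards
rew_n = np.ones((S, A))                # pseudo-count for a N(0, 1) prior

def sample_mdp():
    P = np.array([[rng.dirichlet(trans_counts[s, a]) for a in range(A)]
                  for s in range(S)])                      # (S, A, S)
    R = rew_sum / rew_n + rng.normal(size=(S, A)) / np.sqrt(rew_n)
    return P, R

def solve(P, R):
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):       # backward value iteration
        Q = R + P @ V                  # (S, A): reward + expected next value
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

P_hat, R_hat = sample_mdp()
policy = solve(P_hat, R_hat)           # follow for one episode, then
print(policy)                          # update counts and resample
```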

Bootstrapped Thompson Sampling and Deep Exploration

no code implementations • 1 Jul 2015 • Ian Osband, Benjamin Van Roy

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions.
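
The idea in bandit form (my minimal sketch; the note also discusses injecting artificial prior data, omitted here): replace posterior sampling with a bootstrap resample of each arm's history, and act greedily on the resampled means, letting resampling noise play the role of the posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bootstrapped Thompson sampling for a bandit: no explicit posterior
# is maintained; randomness comes from resampling the data itself.
histories = [[1.0, 0.0], [0.0], [1.0]]   # example rewards observed per arm

def bootstrap_mean(history):
    resample = rng.choice(history, size=len(history), replace=True)
    return resample.mean()

arm = int(np.argmax([bootstrap_mean(h) for h in histories]))
print(arm)  # pull this arm, then append its reward to histories[arm]
```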

Near-optimal Reinforcement Learning in Factored MDPs

no code implementations • NeurIPS 2014 • Ian Osband, Benjamin Van Roy

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces.

Generalization and Exploration via Randomized Value Functions

1 code implementation • 4 Feb 2014 • Ian Osband, Benjamin Van Roy, Zheng Wen

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions.

Efficient Exploration
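
A one-step sketch of RLSVI's key move (my simplification; the features, noise scale, and ridge weight are illustrative): run least-squares value iteration as usual, but perturb the regression targets with Gaussian noise so the fitted value function is a random plausible sample rather than a point estimate, and greedy play with respect to it explores deeply.

```python
import numpy as np

rng = np.random.default_rng(0)

def rlsvi_backup(Phi, rewards, Phi_next_best, theta_next,
                 noise_std=0.1, ridge=1.0):
    # Phi: (n, d) features of (s, a); Phi_next_best: (n, d) features of
    # the best next action under theta_next (zeros at terminal steps).
    targets = rewards + Phi_next_best @ theta_next
    targets = targets + noise_std * rng.normal(size=targets.shape)  # key step
    d = Phi.shape[1]
    # Ridge-regularized least squares on the perturbed targets.
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(d), Phi.T @ targets)

# Toy usage with random features, one backward step:
n, d = 20, 4
Phi = rng.normal(size=(n, d))
theta = rlsvi_backup(Phi, rng.normal(size=n), rng.normal(size=(n, d)),
                     np.zeros(d))
print(theta)
```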

(More) Efficient Reinforcement Learning via Posterior Sampling

no code implementations • NeurIPS 2013 • Ian Osband, Daniel Russo, Benjamin Van Roy

This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.

Efficient Exploration
