Search Results for author: Ian Osband

Found 34 papers, 16 papers with code

Approximate Thompson Sampling via Epistemic Neural Networks

1 code implementation • 18 Feb 2023 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Further, we demonstrate that the \textit{epinet} -- a small additive network that estimates uncertainty -- matches the performance of large ensembles at orders of magnitude lower computational cost.

Thompson Sampling

Paper
Code

Fine-Tuning Language Models via Epistemic Neural Networks

1 code implementation • 3 Nov 2022 • Ian Osband, Seyed Mohammad Asghari, Benjamin Van Roy, Nat McAleese, John Aslanides, Geoffrey Irving

Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data.

Active Learning Language Modelling

188

Paper
Code

Robustness of Epinets against Distributional Shifts

no code implementations • 1 Jul 2022 • Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy

However, these improvements are relatively small compared to the outstanding issues in distributionally-robust deep learning.

Paper
Add Code

Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

no code implementations • 8 Jun 2022 • Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

In machine learning, an agent needs to estimate uncertainty to efficiently explore and adapt and to make effective decisions.

Paper
Add Code

Evaluating High-Order Predictive Distributions in Deep Learning

1 code implementation • 28 Feb 2022 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Benjamin Van Roy

Previous work has developed methods for assessing low-order predictive distributions with inputs sampled i. i. d.

regression Vocal Bursts Intensity Prediction

188

Paper
Code

The Neural Testbed: Evaluating Joint Predictions

1 code implementation • 9 Oct 2021 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Botao Hao, Morteza Ibrahimi, Dieterich Lawson, Xiuyuan Lu, Brendan O'Donoghue, Benjamin Van Roy

Predictive distributions quantify uncertainties ignored by point estimates.

188

Paper
Code

Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?

no code implementations • 29 Sep 2021 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Dieterich Lawson, Brendan O'Donoghue, Botao Hao, Benjamin Van Roy

This paper introduces \textit{The Neural Testbed}, which provides tools for the systematic evaluation of agents that generate such predictions.

Uncertainty Quantification

Paper
Add Code

From Predictions to Decisions: The Importance of Joint Predictive Distributions

no code implementations • 20 Jul 2021 • Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy

A fundamental challenge for any intelligent system is prediction: given some inputs, can you predict corresponding outcomes?

Multi-Armed Bandits Thompson Sampling

Paper
Add Code

Epistemic Neural Networks

1 code implementation • NeurIPS 2023 • Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

We introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty.

268

Paper
Code

Reinforcement Learning, Bit by Bit

no code implementations • 6 Mar 2021 • Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

To illustrate concepts, we design simple agents that build on them and present computational results that highlight data efficiency.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Hypermodels for Exploration

no code implementations • ICLR 2020 • Vikranth Dwaracherla, Xiuyuan Lu, Morteza Ibrahimi, Ian Osband, Zheng Wen, Benjamin Van Roy

This generalizes and extends the use of ensembles to approximate Thompson sampling.

Thompson Sampling

Paper
Add Code

Matrix games with bandit feedback

no code implementations • 9 Jun 2020 • Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each others actions and a noisy payoff.

Paper
Add Code

Making Sense of Reinforcement Learning and Probabilistic Inference

no code implementations • ICLR 2020 • Brendan O'Donoghue, Ian Osband, Catalin Ionescu

Reinforcement learning (RL) combines a control problem with statistical estimation: The system dynamics are not known to the agent, but can be learned through experience.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Behaviour Suite for Reinforcement Learning

2 code implementations • ICLR 2020 • Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado van Hasselt

bsuite is a collection of carefully-designed experiments that investigate core capabilities of reinforcement learning (RL) agents with two objectives.

reinforcement-learning Reinforcement Learning (RL)

1,466

Paper
Code

Meta-learning of Sequential Strategies

no code implementations • 8 May 2019 • Pedro A. Ortega, Jane. X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann, Siddhant M. Jayakumar, Tom McGrath, Kevin Miller, Mohammad Azar, Ian Osband, Neil Rabinowitz, András György, Silvia Chiappa, Simon Osindero, Yee Whye Teh, Hado van Hasselt, Nando de Freitas, Matthew Botvinick, Shane Legg

In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class.

Meta-Learning

Paper
Add Code

Randomized Prior Functions for Deep Reinforcement Learning

1 code implementation • NeurIPS 2018 • Ian Osband, John Aslanides, Albin Cassirer

Dealing with uncertainty is essential for efficient reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

1 code implementation • NeurIPS 2018 • Maria Dimakopoulou, Ian Osband, Benjamin Van Roy

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

The Uncertainty Bellman Equation and Exploration

1 code implementation • ICML 2018 • Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

In this paper we consider a similar \textit{uncertainty} Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.

Paper
Code

A Tutorial on Thompson Sampling

2 code implementations • 7 Jul 2017 • Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning Product Recommendation +1

340

Paper
Code

Noisy Networks for Exploration

15 code implementations • ICLR 2018 • Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Ranked #1 on Atari Games on Atari 2600 Surround

Atari Games Efficient Exploration +2

2,505

Paper
Code

On Optimistic versus Randomized Exploration in Reinforcement Learning

no code implementations • 13 Jun 2017 • Ian Osband, Benjamin Van Roy

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning.

Computational Efficiency reinforcement-learning +1

Paper
Add Code

Deep Q-learning from Demonstrations

5 code implementations • 12 Apr 2017 • Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized replay mechanism.

Imitation Learning Q-Learning +1

2,505

Paper
Code

Deep Exploration via Randomized Value Functions

no code implementations • 22 Mar 2017 • Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration reinforcement-learning +1

Paper
Add Code

Minimax Regret Bounds for Reinforcement Learning

1 code implementation • ICML 2017 • Mohammad Gheshlaghi Azar, Ian Osband, Rémi Munos

We consider the problem of provably optimal exploration in reinforcement learning for finite horizon MDPs.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Gaussian-Dirichlet Posterior Dominance in Sequential Learning

no code implementations • 14 Feb 2017 • Ian Osband, Benjamin Van Roy

We consider the problem of sequential learning from categorical observations bounded in [0, 1].

Paper
Add Code

On Lower Bounds for Regret in Reinforcement Learning

no code implementations • 9 Aug 2016 • Ian Osband, Benjamin Van Roy

This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Posterior Sampling for Reinforcement Learning Without Episodes

1 code implementation • 9 Aug 2016 • Ian Osband, Benjamin Van Roy

- Review similar results for optimistic algorithms in infinite horizon problems (Jaksch et al 2010, Bartlett and Tewari 2009, Abbasi-Yadkori and Szepesvari 2011), with particular attention to the dynamic episode growth.

reinforcement-learning Reinforcement Learning (RL)

Paper
Code

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

no code implementations • ICML 2017 • Ian Osband, Benjamin Van Roy

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Deep Exploration via Bootstrapped DQN

6 code implementations • NeurIPS 2016 • Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

Efficient exploration in complex environments remains a major challenge for reinforcement learning.

Ranked #6 on Atari Games on Atari 2600 Breakout

Atari Games Efficient Exploration +2

76,571

Paper
Code

Bootstrapped Thompson Sampling and Deep Exploration

no code implementations • 1 Jul 2015 • Ian Osband, Benjamin Van Roy

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions.

reinforcement-learning Reinforcement Learning (RL) +1

Paper
Add Code

Model-based Reinforcement Learning and the Eluder Dimension

no code implementations • NeurIPS 2014 • Ian Osband, Benjamin Van Roy

We consider the problem of learning to optimize an unknown Markov decision process (MDP).

Model-based Reinforcement Learning reinforcement-learning +1

Paper
Add Code

Near-optimal Reinforcement Learning in Factored MDPs

no code implementations • NeurIPS 2014 • Ian Osband, Benjamin Van Roy

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces.

reinforcement-learning Reinforcement Learning (RL)

Paper
Add Code

Generalization and Exploration via Randomized Value Functions

1 code implementation • 4 Feb 2014 • Ian Osband, Benjamin Van Roy, Zheng Wen

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions.

Efficient Exploration reinforcement-learning +1

Paper
Code

(More) Efficient Reinforcement Learning via Posterior Sampling

no code implementations • NeurIPS 2013 • Ian Osband, Daniel Russo, Benjamin Van Roy

This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.

Efficient Exploration reinforcement-learning +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.