Search Results for author: Ian Osband

Found 34 papers, 16 papers with code

Approximate Thompson Sampling via Epistemic Neural Networks

1 code implementation · 18 Feb 2023 · Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

Further, we demonstrate that the epinet, a small additive network that estimates uncertainty, matches the performance of large ensembles at orders of magnitude lower computational cost.

Thompson Sampling

Fine-Tuning Language Models via Epistemic Neural Networks

1 code implementation · 3 Nov 2022 · Ian Osband, Seyed Mohammad Asghari, Benjamin Van Roy, Nat McAleese, John Aslanides, Geoffrey Irving

Language models often pre-train on large unsupervised text corpora, then fine-tune on additional task-specific data.

Active Learning · Language Modelling

Robustness of Epinets against Distributional Shifts

no code implementations · 1 Jul 2022 · Xiuyuan Lu, Ian Osband, Seyed Mohammad Asghari, Sven Gowal, Vikranth Dwaracherla, Zheng Wen, Benjamin Van Roy

However, these improvements are relatively small compared to the outstanding issues in distributionally-robust deep learning.

Ensembles for Uncertainty Estimation: Benefits of Prior Functions and Bootstrapping

no code implementations · 8 Jun 2022 · Vikranth Dwaracherla, Zheng Wen, Ian Osband, Xiuyuan Lu, Seyed Mohammad Asghari, Benjamin Van Roy

In machine learning, an agent needs to estimate uncertainty in order to explore and adapt efficiently and to make effective decisions.

Evaluating Predictive Distributions: Does Bayesian Deep Learning Work?

no code implementations · 29 Sep 2021 · Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Dieterich Lawson, Brendan O'Donoghue, Botao Hao, Benjamin Van Roy

This paper introduces The Neural Testbed, which provides tools for the systematic evaluation of agents that generate such predictions.

Epistemic Neural Networks

4 code implementations · 19 Jul 2021 · Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy

We introduce the epinet: an architecture that can supplement any conventional neural network, including large pretrained models, and can be trained with modest incremental computation to estimate uncertainty.
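The epinet design lends itself to a compact sketch: a frozen base network produces features and logits, and a small additive network consumes those features together with a random epistemic index z; varying z then yields an ensemble-like spread of predictions from a single base forward pass. Below is an illustrative toy version in numpy (all names, shapes, and the tanh feature map are assumptions for illustration, not the paper's reference implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen base network: features plus class logits.
D_IN, D_FEAT, D_CLASS, D_INDEX = 4, 8, 3, 5
W_base = rng.normal(size=(D_IN, D_FEAT))
W_head = rng.normal(size=(D_FEAT, D_CLASS))

def base_net(x):
    feat = np.tanh(x @ W_base)           # hidden features
    return feat, feat @ W_head           # (features, logits)

# Small additive epinet: maps [features, index z] to a logit correction.
W_epi = rng.normal(size=(D_FEAT + D_INDEX, D_CLASS)) * 0.1

def epinet(feat, z):
    return np.concatenate([feat, z]) @ W_epi

def predict(x, z):
    feat, logits = base_net(x)
    return logits + epinet(feat, z)      # epistemic index z shifts the logits

# Sampling many indices z approximates an ensemble for the cost of one base pass.
x = rng.normal(size=D_IN)
samples = np.stack([predict(x, rng.normal(size=D_INDEX)) for _ in range(100)])
print(samples.mean(axis=0))   # mean prediction
print(samples.std(axis=0))    # epistemic spread across indices
```

The spread across index samples plays the role that disagreement across ensemble members plays in a deep ensemble, but only the small additive network varies.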

Reinforcement Learning, Bit by Bit

no code implementations · 6 Mar 2021 · Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen

To illustrate concepts, we design simple agents that build on them and present computational results that highlight data efficiency.

Reinforcement Learning (RL)

Matrix games with bandit feedback

no code implementations · 9 Jun 2020 · Brendan O'Donoghue, Tor Lattimore, Ian Osband

We study a version of the classical zero-sum matrix game with unknown payoff matrix and bandit feedback, where the players only observe each other's actions and a noisy payoff.

Making Sense of Reinforcement Learning and Probabilistic Inference

no code implementations · ICLR 2020 · Brendan O'Donoghue, Ian Osband, Catalin Ionescu

Reinforcement learning (RL) combines a control problem with statistical estimation: The system dynamics are not known to the agent, but can be learned through experience.

Reinforcement Learning (RL) +1

Scalable Coordinated Exploration in Concurrent Reinforcement Learning

1 code implementation · NeurIPS 2018 · Maria Dimakopoulou, Ian Osband, Benjamin Van Roy

We consider a team of reinforcement learning agents that concurrently operate in a common environment, and we develop an approach to efficient coordinated exploration that is suitable for problems of practical scale.

reinforcement-learning Reinforcement Learning (RL)

The Uncertainty Bellman Equation and Exploration

1 code implementation · ICML 2018 · Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih

In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps.
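The recursion described in this abstract can be illustrated on a toy chain: propagate a per-state local uncertainty signal backwards, Bellman-style, so that uncertainty at one time-step accumulates the discounted uncertainties of subsequent time-steps. A minimal sketch under simplifying assumptions (deterministic right-moving chain, known transitions, made-up local uncertainties; not the paper's exact operator):

```python
import numpy as np

# Toy finite-horizon chain MDP: H steps, S states, deterministic right-moves.
H, S, gamma = 5, 6, 0.9
local_u = np.linspace(1.0, 0.1, S)   # hypothetical per-state local uncertainty

# Uncertainty Bellman recursion (sketch): u_h(s) = local_u(s) + gamma^2 * u_{h+1}(s')
u = np.zeros((H + 1, S))
for h in range(H - 1, -1, -1):
    for s in range(S):
        s_next = min(s + 1, S - 1)           # deterministic successor state
        u[h, s] = local_u[s] + gamma**2 * u[h + 1, s_next]

print(u[0])  # uncertainty at the first step, aggregated over the horizon
```

The backward pass mirrors ordinary value iteration, which is the point: uncertainty can be propagated with the same machinery as value.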

A Tutorial on Thompson Sampling

2 code implementations · 7 Jul 2017 · Daniel Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, Zheng Wen

Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.

Active Learning · Product Recommendation +1
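The exploit/explore balance described in this abstract is easiest to see in the classic Bernoulli bandit: keep a Beta posterior per arm, sample one plausible mean per arm each round, and pull the arm whose sample is largest. A minimal sketch (the arm means and horizon are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
true_means = [0.3, 0.5, 0.7]             # hypothetical Bernoulli arm means
alpha = np.ones(3)                        # Beta posterior: successes + 1
beta = np.ones(3)                         # Beta posterior: failures + 1

pulls = np.zeros(3, dtype=int)
for t in range(2000):
    theta = rng.beta(alpha, beta)         # sample one plausible mean per arm
    a = int(np.argmax(theta))             # act greedily w.r.t. the sample
    r = rng.random() < true_means[a]      # Bernoulli reward
    alpha[a] += r
    beta[a] += 1 - r
    pulls[a] += 1

print(pulls)  # pulls typically concentrate on the best arm (index 2)
```

Early on, wide posteriors make all arms plausible, so the algorithm explores; as evidence accumulates, samples concentrate and play becomes mostly greedy.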

Noisy Networks for Exploration

15 code implementations · ICLR 2018 · Meire Fortunato, Mohammad Gheshlaghi Azar, Bilal Piot, Jacob Menick, Ian Osband, Alex Graves, Vlad Mnih, Remi Munos, Demis Hassabis, Olivier Pietquin, Charles Blundell, Shane Legg

We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration.

Atari Games · Efficient Exploration +2
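The core mechanism, parametric noise on the weights, can be sketched as a noisy linear layer in which each weight is mu + sigma * eps with fresh noise per forward pass. An illustrative numpy version (the paper's layers use factorised Gaussian noise and learn mu and sigma by gradient descent; this sketch only shows the induced stochasticity):

```python
import numpy as np

rng = np.random.default_rng(2)

class NoisyLinear:
    """Linear layer with parametric noise: w = mu + sigma * eps.

    A sketch of the NoisyNet idea; mu and sigma would normally be
    trained by gradient descent alongside the rest of the network.
    """
    def __init__(self, d_in, d_out, sigma0=0.5):
        self.w_mu = rng.normal(scale=1 / np.sqrt(d_in), size=(d_in, d_out))
        self.w_sigma = np.full((d_in, d_out), sigma0 / np.sqrt(d_in))
        self.b_mu = np.zeros(d_out)
        self.b_sigma = np.full(d_out, sigma0 / np.sqrt(d_in))

    def __call__(self, x):
        # Fresh noise on every forward pass induces a stochastic policy.
        w = self.w_mu + self.w_sigma * rng.normal(size=self.w_mu.shape)
        b = self.b_mu + self.b_sigma * rng.normal(size=self.b_mu.shape)
        return x @ w + b

layer = NoisyLinear(4, 2)
x = np.ones(4)
y1, y2 = layer(x), layer(x)
print(y1, y2)  # same input, different outputs: exploration via weight noise
```

Because the noise scale sigma is a parameter, the agent can learn to anneal its own exploration where the data warrants it, rather than following a fixed epsilon schedule.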

On Optimistic versus Randomized Exploration in Reinforcement Learning

no code implementations · 13 Jun 2017 · Ian Osband, Benjamin Van Roy

We discuss the relative merits of optimistic and randomized approaches to exploration in reinforcement learning.

Reinforcement Learning (RL)

Deep Q-learning from Demonstrations

5 code implementations · 12 Apr 2017 · Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present Deep Q-learning from Demonstrations (DQfD), an algorithm that leverages even relatively small amounts of demonstration data to massively accelerate learning, and that automatically assesses the necessary ratio of demonstration data while learning, thanks to a prioritized replay mechanism.

Imitation Learning · Q-Learning +1

Deep Exploration via Randomized Value Functions

no code implementations · 22 Mar 2017 · Ian Osband, Benjamin Van Roy, Daniel Russo, Zheng Wen

We study the use of randomized value functions to guide deep exploration in reinforcement learning.

Efficient Exploration · Reinforcement Learning (RL) +1

Gaussian-Dirichlet Posterior Dominance in Sequential Learning

no code implementations · 14 Feb 2017 · Ian Osband, Benjamin Van Roy

We consider the problem of sequential learning from categorical observations bounded in [0, 1].

Posterior Sampling for Reinforcement Learning Without Episodes

1 code implementation · 9 Aug 2016 · Ian Osband, Benjamin Van Roy

We review similar results for optimistic algorithms in infinite-horizon problems (Jaksch et al. 2010, Bartlett and Tewari 2009, Abbasi-Yadkori and Szepesvari 2011), with particular attention to dynamic episode growth.

Reinforcement Learning (RL)

On Lower Bounds for Regret in Reinforcement Learning

no code implementations · 9 Aug 2016 · Ian Osband, Benjamin Van Roy

This is a brief technical note to clarify the state of lower bounds on regret for reinforcement learning.

Reinforcement Learning (RL)

Why is Posterior Sampling Better than Optimism for Reinforcement Learning?

no code implementations · ICML 2017 · Ian Osband, Benjamin Van Roy

Computational results demonstrate that posterior sampling for reinforcement learning (PSRL) dramatically outperforms algorithms driven by optimism, such as UCRL2.

Reinforcement Learning (RL)
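PSRL itself is short to state: maintain a posterior over the MDP, sample one MDP at the start of each episode, solve it, and act greedily on the sample. A toy tabular sketch with a Dirichlet posterior over transitions and known rewards (the sizes and reward function are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
S, A, H = 3, 2, 10
# Hypothetical true MDP: random transitions, reward for the state reached.
P_true = rng.dirichlet(np.ones(S), size=(S, A))
R = np.linspace(0.0, 1.0, S)

dirichlet = np.ones((S, A, S))           # posterior counts over transitions

def solve(P):
    """Finite-horizon value iteration on a sampled MDP; returns greedy policy."""
    V = np.zeros(S)
    pi = np.zeros((H, S), dtype=int)
    for h in range(H - 1, -1, -1):
        Q = np.einsum('sat,t->sa', P, R + V)
        pi[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

for episode in range(50):
    P_sample = np.array([[rng.dirichlet(dirichlet[s, a]) for a in range(A)]
                         for s in range(S)])   # one posterior sample per episode
    pi = solve(P_sample)                        # solve the sampled MDP
    s = 0
    for h in range(H):                          # act greedily, update counts
        a = pi[h, s]
        s_next = rng.choice(S, p=P_true[s, a])
        dirichlet[s, a, s_next] += 1
        s = s_next

print(dirichlet.sum() - S * A * S)  # 500 observed transitions folded into the posterior
```

Exploration comes entirely from the posterior sampling: episodes committed to an optimistic sample gather exactly the data needed to correct it.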

Bootstrapped Thompson Sampling and Deep Exploration

no code implementations · 1 Jul 2015 · Ian Osband, Benjamin Van Roy

This technical note presents a new approach to carrying out the kind of exploration achieved by Thompson sampling, but without explicitly maintaining or sampling from posterior distributions.

Reinforcement Learning (RL) +1

Near-optimal Reinforcement Learning in Factored MDPs

no code implementations · NeurIPS 2014 · Ian Osband, Benjamin Van Roy

Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces.

Reinforcement Learning (RL)

Generalization and Exploration via Randomized Value Functions

1 code implementation · 4 Feb 2014 · Ian Osband, Benjamin Van Roy, Zheng Wen

We propose randomized least-squares value iteration (RLSVI), a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions.

Efficient Exploration · Reinforcement Learning (RL) +1
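The randomization at the heart of RLSVI can be sketched in one step: instead of using the least-squares point estimate of the value weights, draw the weights from the Gaussian posterior of a Bayesian linear regression, so greedy behaviour varies across samples. An illustrative numpy sketch of that sampling step (a plain regression problem stands in for one stage of value iteration; this is not the full algorithm):

```python
import numpy as np

rng = np.random.default_rng(4)

def rlsvi_sample(Phi, y, sigma2=1.0, lam=1.0):
    """Draw value weights from the Bayesian linear-regression posterior
    N(mean, cov) implied by features Phi and regression targets y."""
    d = Phi.shape[1]
    precision = Phi.T @ Phi / sigma2 + lam * np.eye(d)
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / sigma2
    return rng.multivariate_normal(mean, cov)

# Hypothetical regression problem standing in for one value-iteration stage.
Phi = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = Phi @ w_true + 0.1 * rng.normal(size=100)

samples = np.stack([rlsvi_sample(Phi, y) for _ in range(200)])
print(samples.mean(axis=0))  # centred near the least-squares estimate
print(samples.std(axis=0))   # residual spread is what drives exploration
```

With little data the posterior is wide and sampled value functions disagree, producing diverse greedy policies; as data accumulates the spread collapses toward the point estimate.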

(More) Efficient Reinforcement Learning via Posterior Sampling

no code implementations · NeurIPS 2013 · Ian Osband, Daniel Russo, Benjamin Van Roy

This bound is one of the first for an algorithm not based on optimism, and close to the state of the art for any reinforcement learning algorithm.

Efficient Exploration · Reinforcement Learning (RL) +1
