Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

21 Apr 2022 Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, SiQi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks.

Continuous Control

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

29 Sep 2021 Abbas Abdolmaleki, Sandy Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva Tirumala, Arunkumar Byravan, Konstantinos Bousmalis, András György, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.

Offline RL

Critic Regularized Regression

NeurIPS 2020 Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

Offline RL

Acme: A Research Framework for Distributed Reinforcement Learning

1 Jun 2020 Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas

Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.

DQN Replay Dataset

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

ICLR 2020 Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.

Which Learning Algorithms Can Generalize Identity-Based Rules to Novel Inputs?

12 May 2016 Paul Tupper, Bobak Shahriari

We propose a novel framework for the analysis of learning algorithms that allows us to say when such algorithms can and cannot generalize certain patterns from training data to test data.

Unbounded Bayesian Optimization via Regularization

14 Aug 2015 Bobak Shahriari, Alexandre Bouchard-Côté, Nando de Freitas

Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning.

Heteroscedastic Treed Bayesian Optimisation

27 Oct 2014 John-Alexander M. Assael, Ziyu Wang, Bobak Shahriari, Nando de Freitas

At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions.

Bayesian Optimisation

An Entropy Search Portfolio for Bayesian Optimization

18 Jun 2014 Bobak Shahriari, Ziyu Wang, Matthew W. Hoffman, Alexandre Bouchard-Côté, Nando de Freitas

However, the performance of a Bayesian optimization method very much depends on its exploration strategy, i.e. the choice of acquisition function, and it is not clear a priori which choice will result in superior performance.

Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

27 Mar 2013 Matthew W. Hoffman, Bobak Shahriari, Nando de Freitas

This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature.

