Search Results for author: Bobak Shahriari

Found 11 papers, 3 papers with code

Revisiting Gaussian mixture critics in off-policy reinforcement learning: a sample-based approach

no code implementations21 Apr 2022 Bobak Shahriari, Abbas Abdolmaleki, Arunkumar Byravan, Abe Friesen, SiQi Liu, Jost Tobias Springenberg, Nicolas Heess, Matt Hoffman, Martin Riedmiller

Actor-critic algorithms that make use of distributional policy evaluation have frequently been shown to outperform their non-distributional counterparts on many challenging control tasks.

Continuous Control, DeepMind
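The "sample-based approach" in the title refers to evaluating a critic that outputs a Gaussian mixture over returns by drawing samples from it, rather than relying on closed-form distributional losses. A minimal sketch of that sampling step (the mixture parameters below are invented for illustration, not taken from the paper):

```python
import numpy as np

def sample_mixture(weights, means, stds, n_samples, rng):
    """Draw return samples from a Gaussian mixture critic head.

    The mixture (weights, means, stds) is the critic's predicted return
    distribution for one state-action pair; Monte Carlo samples from it
    can stand in for closed-form expectations in the critic loss.
    """
    comps = rng.choice(len(weights), size=n_samples, p=weights)
    return rng.normal(means[comps], stds[comps])

rng = np.random.default_rng(0)

# Hypothetical critic output for a single (s, a).
weights = np.array([0.3, 0.7])
means = np.array([1.0, 5.0])
stds = np.array([0.5, 0.5])

samples = sample_mixture(weights, means, stds, 10_000, rng)
q_estimate = samples.mean()  # sample-based estimate of Q(s, a)
```

The scalar Q-value used by the actor update is then just the empirical mean of these samples (here ≈ 0.3·1.0 + 0.7·5.0 = 3.8).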

On Multi-objective Policy Optimization as a Tool for Reinforcement Learning: Case Studies in Offline RL and Finetuning

no code implementations29 Sep 2021 Abbas Abdolmaleki, Sandy Huang, Giulia Vezzani, Bobak Shahriari, Jost Tobias Springenberg, Shruti Mishra, Dhruva Tirumala, Arunkumar Byravan, Konstantinos Bousmalis, András György, Csaba Szepesvari, Raia Hadsell, Nicolas Heess, Martin Riedmiller

Many advances that have improved the robustness and efficiency of deep reinforcement learning (RL) algorithms can, in one way or another, be understood as introducing additional objectives or constraints in the policy optimization step.

Offline RL

Critic Regularized Regression

4 code implementations NeurIPS 2020 Ziyu Wang, Alexander Novikov, Konrad Zolna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess, Nando de Freitas

Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction.

Offline RL, Reinforcement Learning
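The core idea of Critic Regularized Regression is to fit the policy by weighted behaviour cloning, where the critic's advantage estimate decides how much each dataset action is imitated. A sketch of the weighting step under that reading (the function names and the clipping constant are illustrative, not the paper's exact implementation):

```python
import numpy as np

def crr_weights(q_sa, v_s, mode="binary", beta=1.0):
    """Per-transition weights for a CRR-style regression loss.

    Advantage A = Q(s, a) - V(s). 'binary' keeps only actions the critic
    prefers (indicator A > 0); 'exp' weights them softly via exp(A / beta),
    clipped for stability.
    """
    adv = q_sa - v_s
    if mode == "binary":
        return (adv > 0).astype(float)
    return np.minimum(np.exp(adv / beta), 20.0)

# Three dataset transitions with critic estimates (made-up numbers).
q = np.array([1.0, 3.0, 2.0])
v = np.array([2.0, 2.0, 2.0])
w = crr_weights(q, v)  # only the second action has positive advantage
```

The behaviour-cloning loss is then multiplied elementwise by `w`, so actions the critic scores below the state value contribute nothing (binary mode) or very little (exp mode) to the policy update.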

Acme: A Research Framework for Distributed Reinforcement Learning

3 code implementations1 Jun 2020 Matt Hoffman, Bobak Shahriari, John Aslanides, Gabriel Barth-Maron, Feryal Behbahani, Tamara Norman, Abbas Abdolmaleki, Albin Cassirer, Fan Yang, Kate Baumli, Sarah Henderson, Alex Novikov, Sergio Gómez Colmenarejo, Serkan Cabi, Caglar Gulcehre, Tom Le Paine, Andrew Cowie, Ziyu Wang, Bilal Piot, Nando de Freitas

Ultimately, we show that the design decisions behind Acme lead to agents that can be scaled both up and down and that, for the most part, greater levels of parallelization result in agents with equivalent performance, just faster.

DQN Replay Dataset, Reinforcement Learning

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

1 code implementation ICLR 2020 Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams, Gabriel Barth-Maron, Ziyu Wang, Nando de Freitas, Worlds Team

This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions.
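A key mechanism in R2D3 is mixing demonstration and agent replay when assembling training batches, governed by a demo-ratio hyperparameter. A simplified sketch of that sampling scheme (buffer contents and function names here are placeholders, not the paper's code):

```python
import random

def sample_batch(demo_buffer, agent_buffer, batch_size, demo_ratio, rng):
    """Assemble a training batch R2D3-style: each slot is drawn from the
    demonstration replay with probability demo_ratio, and from the
    agent's own experience replay otherwise."""
    batch = []
    for _ in range(batch_size):
        source = demo_buffer if rng.random() < demo_ratio else agent_buffer
        batch.append(rng.choice(source))
    return batch

rng = random.Random(0)
demos = ["demo_0", "demo_1"]
agent = ["agent_0", "agent_1"]
batch = sample_batch(demos, agent, batch_size=8, demo_ratio=0.25, rng=rng)
```

Even a small demo ratio keeps demonstration transitions flowing into every update, which is what lets the agent bootstrap exploration in environments where random behaviour almost never reaches a reward.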

Which Learning Algorithms Can Generalize Identity-Based Rules to Novel Inputs?

no code implementations12 May 2016 Paul Tupper, Bobak Shahriari

We propose a novel framework for the analysis of learning algorithms that allows us to say when such algorithms can and cannot generalize certain patterns from training data to test data.

Unbounded Bayesian Optimization via Regularization

no code implementations14 Aug 2015 Bobak Shahriari, Alexandre Bouchard-Côté, Nando de Freitas

Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning.
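For context, the basic Bayesian optimization loop fits a Gaussian process surrogate to the observations so far and picks the next query by maximizing an acquisition function. Below is a generic numpy-only sketch of that loop with a UCB acquisition on a 1-D grid; it illustrates the standard setup the paper builds on, not its regularization scheme, and all kernel and acquisition parameters are arbitrary choices:

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel between 1-D point sets."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=1e-5):
    """GP posterior mean and std on x_grid given noisy observations."""
    K = rbf(x_obs, x_obs) + noise * np.eye(len(x_obs))
    Ks = rbf(x_grid, x_obs)
    alpha = np.linalg.solve(K, y_obs)
    mu = Ks @ alpha
    cov = rbf(x_grid, x_grid) - Ks @ np.linalg.solve(K, Ks.T)
    return mu, np.sqrt(np.clip(np.diag(cov), 0.0, None))

def bayes_opt(f, n_iters=10, kappa=2.0):
    """Maximize f on [0, 1] via GP surrogate + UCB acquisition."""
    x_grid = np.linspace(0.0, 1.0, 201)
    x_obs = np.array([0.0, 1.0])      # two initial evaluations
    y_obs = f(x_obs)
    for _ in range(n_iters):
        mu, sd = gp_posterior(x_obs, y_obs, x_grid)
        x_next = x_grid[np.argmax(mu + kappa * sd)]  # UCB rule
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next))
    return x_obs[np.argmax(y_obs)]

f = lambda x: -(x - 0.7) ** 2          # toy objective, maximum at 0.7
best = bayes_opt(f)
```

The loop trades off exploitation (high posterior mean) against exploration (high posterior uncertainty), which is exactly the design axis the acquisition-function papers below are concerned with.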

Heteroscedastic Treed Bayesian Optimisation

no code implementations27 Oct 2014 John-Alexander M. Assael, Ziyu Wang, Bobak Shahriari, Nando de Freitas

At the core of this approach is a Gaussian process prior that captures our belief about the distribution over functions.

Bayesian Optimisation, BIG-bench Machine Learning

An Entropy Search Portfolio for Bayesian Optimization

no code implementations18 Jun 2014 Bobak Shahriari, Ziyu Wang, Matthew W. Hoffman, Alexandre Bouchard-Côté, Nando de Freitas

However, the performance of a Bayesian optimization method depends heavily on its exploration strategy, i.e., the choice of acquisition function, and it is not clear a priori which choice will result in superior performance.

Exploiting correlation and budget constraints in Bayesian multi-armed bandit optimization

no code implementations27 Mar 2013 Matthew W. Hoffman, Bobak Shahriari, Nando de Freitas

This problem is also known as fixed-budget best arm identification in the multi-armed bandit literature.
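A standard baseline for fixed-budget best-arm identification is the Successive Rejects algorithm (Audibert & Bubeck, 2010): split the budget into phases and drop the empirically worst arm at the end of each one. A runnable sketch of that baseline, not of this paper's Bayesian method, with simulated Gaussian arms whose true means are made up here:

```python
import numpy as np

def successive_rejects(means, budget, rng):
    """Fixed-budget best-arm identification via Successive Rejects.

    'means' are the hidden true means of Gaussian arms (unit variance);
    the algorithm only sees sampled rewards. Returns the index of the
    arm it believes is best after spending the budget.
    """
    K = len(means)
    log_bar = 0.5 + sum(1.0 / i for i in range(2, K + 1))
    active = list(range(K))
    counts = np.zeros(K)
    sums = np.zeros(K)
    n_prev = 0
    for k in range(1, K):
        # Pull each surviving arm up to n_k times in total.
        n_k = int(np.ceil((budget - K) / (log_bar * (K + 1 - k))))
        for arm in active:
            for _ in range(n_k - n_prev):
                sums[arm] += rng.normal(means[arm], 1.0)
                counts[arm] += 1
        n_prev = n_k
        emp = {a: sums[a] / counts[a] for a in active}
        active.remove(min(emp, key=emp.get))  # reject the worst arm
    return active[0]

rng = np.random.default_rng(0)
best = successive_rejects(np.array([0.0, 0.1, 1.0, 0.2]), budget=2000, rng=rng)
```

The paper's contribution is to exploit correlation between arms (via a Bayesian model) rather than treating them independently as this baseline does, which matters most when the budget is small relative to the number of arms.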
