Search Results for author: Arthur Guez

Found 25 papers, 12 papers with code

Acceleration in Policy Optimization

no code implementations · 18 Jun 2023 · Veronica Chelu, Tom Zahavy, Arthur Guez, Doina Precup, Sebastian Flennerhag

We work towards a unifying paradigm for accelerating policy optimization methods in reinforcement learning (RL) by integrating foresight in the policy improvement step via optimistic and adaptive updates.

Meta-Learning · Policy Gradient Methods +1
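
For a rough sense of what an optimistic, foresight-based improvement step can look like, here is a minimal lookahead (extragradient-style) policy-gradient update in Python. `grad_fn` is a placeholder for any policy-gradient estimator; this is a textbook form of optimism shown for illustration only, not the specific accelerated scheme studied in the paper.

```python
def optimistic_update(theta, grad_fn, step_size=0.01):
    """One generic optimistic (lookahead) gradient-ascent step on policy parameters.

    A provisional step is taken first, and the actual update uses the gradient
    evaluated at the extrapolated parameters -- a simple way of injecting
    foresight into the policy improvement step.  `grad_fn` is a placeholder
    for any policy-gradient estimator; this is not the paper's algorithm.
    """
    lookahead = theta + step_size * grad_fn(theta)   # provisional improvement
    return theta + step_size * grad_fn(lookahead)    # commit using the foresighted gradient
```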

Large-Scale Retrieval for Reinforcement Learning

no code implementations · 10 Jun 2022 · Peter C. Humphreys, Arthur Guez, Olivier Tieleman, Laurent Sifre, Théophane Weber, Timothy Lillicrap

Effective decision making involves flexibly relating past experiences and relevant contextual information to a novel situation.

Decision Making · Offline RL +3

COptiDICE: Offline Constrained Reinforcement Learning via Stationary Distribution Correction Estimation

1 code implementation · ICLR 2022 · Jongmin Lee, Cosmin Paduraru, Daniel J. Mankowitz, Nicolas Heess, Doina Precup, Kee-Eung Kim, Arthur Guez

We consider the offline constrained reinforcement learning (RL) problem, in which the agent aims to compute a policy that maximizes expected return while satisfying given cost constraints, learning only from a pre-collected dataset.

Offline RL · Off-policy evaluation +1
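
The problem being solved here can be written as a constrained program: maximize expected return subject to expected costs staying below given limits, using only the fixed dataset. The sketch below shows the generic Lagrangian relaxation of that objective in Python; the return and cost quantities stand for off-policy estimates from the dataset, and this illustrates the problem setup only, not the paper's stationary-distribution-correction (DICE) estimator.

```python
def lagrangian_objective(policy_return, policy_cost, cost_limit, lam):
    """Lagrangian relaxation of: maximize E[return] s.t. E[cost] <= cost_limit.

    policy_return and policy_cost stand for off-policy estimates computed from
    the fixed dataset; lam >= 0 is the Lagrange multiplier.  Illustrative only;
    the paper estimates these quantities via stationary distribution correction.
    """
    return policy_return - lam * (policy_cost - cost_limit)


def update_multiplier(lam, policy_cost, cost_limit, step_size=0.01):
    """Dual ascent: increase the multiplier while the cost constraint is violated."""
    return max(0.0, lam + step_size * (policy_cost - cost_limit))
```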

Beyond Tabula-Rasa: a Modular Reinforcement Learning Approach for Physically Embedded 3D Sokoban

no code implementations · 3 Oct 2020 · Peter Karkus, Mehdi Mirza, Arthur Guez, Andrew Jaegle, Timothy Lillicrap, Lars Buesing, Nicolas Heess, Theophane Weber

We explore whether integrated tasks like Mujoban can be solved by composing RL modules together in a sense-plan-act hierarchy, where modules have well-defined roles, as in classic robot architectures.

reinforcement-learning · Reinforcement Learning (RL)

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

18 code implementations · 19 Nov 2019 · Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver

When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.

Atari Games · Atari Games 100k +3
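
To give a flavour of planning with a learned model, here is a much-simplified Python sketch: a representation function maps the observation to a latent state, a dynamics function rolls that latent state forward under candidate actions, and a prediction function supplies a bootstrap value. All three callables are placeholders for learned networks, and the exhaustive search over short action sequences is only a stand-in for the Monte-Carlo tree search actually used by MuZero.

```python
import itertools

def plan_with_learned_model(observation, represent, dynamics, predict,
                            actions, depth=3, gamma=0.997):
    """Plan entirely inside a learned model (simplified illustration).

    represent(obs) -> latent; dynamics(latent, action) -> (latent, reward);
    predict(latent) -> (policy_logits, value).  All are placeholders for
    learned networks; the exhaustive shallow search below stands in for the
    tree search used in the paper.
    """
    root = represent(observation)
    best_action, best_return = None, float("-inf")
    for sequence in itertools.product(actions, repeat=depth):
        latent, total, discount = root, 0.0, 1.0
        for action in sequence:                  # unroll the learned dynamics
            latent, reward = dynamics(latent, action)
            total += discount * reward
            discount *= gamma
        _, value = predict(latent)               # bootstrap with the learned value
        total += discount * value
        if total > best_return:
            best_action, best_return = sequence[0], total
    return best_action
```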

Augmenting learning using symmetry in a biologically-inspired domain

no code implementations · 1 Oct 2019 · Shruti Mishra, Abbas Abdolmaleki, Arthur Guez, Piotr Trochim, Doina Precup

Invariances to translation, rotation and other spatial transformations are a hallmark of the laws of motion, and have widespread use in the natural sciences to reduce the dimensionality of systems of equations.

Data Augmentation · Image Classification +1
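
As a concrete example of turning a known symmetry into data augmentation, the sketch below duplicates each labelled image under shape-preserving reflections and a 180-degree rotation. This is a generic illustration of the idea (and assumes those symmetries are label-preserving in the domain); it is not the augmentation scheme used in the paper.

```python
import numpy as np

def augment_with_symmetries(images, labels):
    """Duplicate each example under symmetries assumed to preserve its label.

    Only shape-preserving transformations are used (horizontal flip, vertical
    flip, 180-degree rotation) so images of any rectangular size can be stacked.
    Whether a given symmetry really is label-preserving depends on the domain.
    """
    aug_images, aug_labels = [], []
    for img, lab in zip(images, labels):
        for variant in (img, np.fliplr(img), np.flipud(img), np.rot90(img, 2)):
            aug_images.append(variant)
            aug_labels.append(lab)
    return np.stack(aug_images), np.array(aug_labels)
```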

Woulda, Coulda, Shoulda: Counterfactually-Guided Policy Search

no code implementations · ICLR 2019 · Lars Buesing, Theophane Weber, Yori Zwols, Sebastien Racaniere, Arthur Guez, Jean-Baptiste Lespiau, Nicolas Heess

In contrast to off-policy algorithms based on Importance Sampling which re-weight data, CF-GPS leverages a model to explicitly consider alternative outcomes, allowing the algorithm to make better use of experience data.
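
The contrast with importance sampling can be made concrete with a small structural-model sketch: infer the exogenous noise that explains each observed transition, then replay the same noise under a different policy to see what would have happened. Every function below (`infer_noise`, `transition`, `reward`, `new_policy`) is a placeholder, and this shows only the general counterfactual-rollout idea, not the CF-GPS algorithm itself.

```python
def counterfactual_return(observed, infer_noise, transition, reward, new_policy, init_state):
    """Counterfactual rollout with a structural model (illustrative sketch).

    observed: list of (state, action, next_state) from the behaviour policy.
    infer_noise(s, a, s_next) recovers the exogenous noise that explains the
    observed step under the model transition(s, a, noise).  Replaying that same
    noise under a different policy asks "what would have happened had we acted
    differently", instead of re-weighting the observed data.
    """
    noises = [infer_noise(s, a, s_next) for (s, a, s_next) in observed]
    state, total = init_state, 0.0
    for noise in noises:
        action = new_policy(state)
        next_state = transition(state, action, noise)
        total += reward(state, action, next_state)
        state = next_state
    return total
```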

Learning to Search with MCTSnets

2 code implementations · ICML 2018 · Arthur Guez, Théophane Weber, Ioannis Antonoglou, Karen Simonyan, Oriol Vinyals, Daan Wierstra, Rémi Munos, David Silver

Planning problems are most typically solved by tree search algorithms that simulate ahead into the future, evaluate future states, and back up those evaluations to the root of a search tree.
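
For reference, the simulate / evaluate / back-up loop described above looks roughly like the following hand-crafted UCT-style search in Python. `step` and `evaluate` are placeholder simulator and evaluation functions; this is the classical baseline the sentence refers to, not the learned MCTSnet itself.

```python
import math

class Node:
    def __init__(self):
        self.children = {}      # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def uct_search(root_state, step, evaluate, actions, num_simulations=100,
               gamma=0.99, c=1.4):
    """Generic Monte-Carlo tree search: select, expand, evaluate, back up.

    step(state, action) -> (next_state, reward); evaluate(state) -> value.
    Both are placeholders for a simulator and a state evaluator.
    """
    root = Node()
    for _ in range(num_simulations):
        node, state, path = root, root_state, [root]
        ret, discount = 0.0, 1.0
        # Selection: descend the tree with a UCB rule until reaching a leaf.
        while node.children:
            action = max(node.children, key=lambda a: (
                node.children[a].value_sum / (node.children[a].visits + 1e-8)
                + c * math.sqrt(math.log(node.visits + 1) / (node.children[a].visits + 1e-8))))
            state, reward = step(state, action)
            ret += discount * reward
            discount *= gamma
            node = node.children[action]
            path.append(node)
        # Expansion and evaluation of the reached leaf state.
        for a in actions:
            node.children[a] = Node()
        leaf_value = ret + discount * evaluate(state)
        # Back-up: propagate the evaluation along the path to the root.
        for visited in path:
            visited.visits += 1
            visited.value_sum += leaf_value
    return max(root.children, key=lambda a: root.children[a].visits)
```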

Increasing the Action Gap: New Operators for Reinforcement Learning

2 code implementations · 15 Dec 2015 · Marc G. Bellemare, Georg Ostrovski, Arthur Guez, Philip S. Thomas, Rémi Munos

Extending the idea of a locally consistent operator, we then derive sufficient conditions for an operator to preserve optimality, leading to a family of operators which includes our consistent Bellman operator.

Atari Games · Q-Learning +2
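
To make the idea concrete, here is a tabular sketch of an advantage-learning style update from this family: the standard Q-learning target is shrunk by a fraction of the current action gap at the visited state, pushing suboptimal action values further below the optimal one. This is a sketch of the general mechanism, not a transcription of the paper's operator definitions.

```python
import numpy as np

def advantage_learning_update(Q, s, a, r, s_next, alpha=0.9, lr=0.1, gamma=0.99):
    """One tabular update that widens the action gap at state s.

    Q is a (num_states, num_actions) array.  The usual Q-learning target is
    reduced by alpha * (max_b Q[s, b] - Q[s, a]); with alpha = 0 this is plain
    Q-learning.  Illustrative only -- not the paper's exact operators.
    """
    standard_target = r + gamma * np.max(Q[s_next])
    gap_penalty = alpha * (np.max(Q[s]) - Q[s, a])
    Q[s, a] += lr * (standard_target - gap_penalty - Q[s, a])
    return Q
```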

Deep Reinforcement Learning with Double Q-learning

97 code implementations · 22 Sep 2015 · Hado van Hasselt, Arthur Guez, David Silver

The popular Q-learning algorithm is known to overestimate action values under certain conditions.

Atari Games · Q-Learning +1
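
Double DQN addresses this overestimation by decoupling action selection from action evaluation: the online network picks the greedy next action, while a target network evaluates it. A minimal sketch of that target computation (with `online_q` and `target_q` as placeholder callables, not the paper's code):

```python
import numpy as np

def double_dqn_targets(online_q, target_q, rewards, next_states, dones, gamma=0.99):
    """Double DQN regression targets for a batch of transitions.

    online_q / target_q: callables mapping a batch of states to Q-values of
    shape (batch, num_actions); placeholders for the two networks.
    """
    # The online network selects the greedy next action ...
    greedy_actions = np.argmax(online_q(next_states), axis=1)
    # ... but the target network evaluates that action, curbing overestimation.
    next_values = target_q(next_states)[np.arange(len(rewards)), greedy_actions]
    return rewards + gamma * (1.0 - dones) * next_values
```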

Bayes-Adaptive Simulation-based Search with Value Function Approximation

no code implementations · NeurIPS 2014 · Arthur Guez, Nicolas Heess, David Silver, Peter Dayan

Bayes-adaptive planning offers a principled solution to the exploration-exploitation trade-off under model uncertainty.

Better Optimism By Bayes: Adaptive Planning with Rich Models

no code implementations · 9 Feb 2014 · Arthur Guez, David Silver, Peter Dayan

The computational costs of inference and planning have confined Bayesian model-based reinforcement learning to one of two dismal fates: powerful Bayes-adaptive planning but only for simplistic models, or powerful, Bayesian non-parametric models but using simple, myopic planning strategies such as Thompson sampling.

Model-based Reinforcement Learning · Reinforcement Learning (RL) +1
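
For reference, the "simple, myopic" strategy mentioned above, Thompson sampling, looks like the following in the simplest Bernoulli-bandit setting: sample a success probability for each arm from its Beta posterior and act greedily with respect to the samples. This is a standard textbook routine, included only to illustrate the contrast drawn in the abstract, not code from the paper.

```python
import numpy as np

def thompson_sampling(pull, num_arms, num_steps, rng=None):
    """Thompson sampling for a Bernoulli bandit with Beta(1, 1) priors.

    pull(arm) -> 0 or 1 is a placeholder for the environment.  Each step
    samples a plausible success rate per arm from its posterior and plays
    the arm whose sample is highest -- a myopic, one-step-greedy strategy.
    """
    rng = rng or np.random.default_rng()
    successes = np.ones(num_arms)   # Beta alpha parameters
    failures = np.ones(num_arms)    # Beta beta parameters
    for _ in range(num_steps):
        samples = rng.beta(successes, failures)
        arm = int(np.argmax(samples))
        reward = pull(arm)
        successes[arm] += reward
        failures[arm] += 1 - reward
    return successes / (successes + failures)   # posterior mean per arm
```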

Efficient Bayes-Adaptive Reinforcement Learning using Sample-Based Search

no code implementations · NeurIPS 2012 · Arthur Guez, David Silver, Peter Dayan

Bayesian model-based reinforcement learning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way.

Model-based Reinforcement Learning · reinforcement-learning +1
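
A core trick in sample-based Bayes-adaptive search is root sampling: draw one plausible MDP from the posterior at the start of each simulation and use it for that whole simulation, so that averaging over many simulations approximates Bayes-adaptive values. The heavily simplified sketch below shows only that averaging idea; `sample_mdp_from_posterior` and `simulate` are placeholders, and the paper's actual algorithm combines root sampling with a full Monte-Carlo tree search.

```python
import random

def bayes_adaptive_action(history, sample_mdp_from_posterior, simulate,
                          actions, num_simulations=1000):
    """Estimate Bayes-adaptive action values by averaging over posterior samples.

    sample_mdp_from_posterior(history) draws one MDP consistent with the data
    seen so far; simulate(mdp, action) returns an estimated return for taking
    `action` and then following some rollout policy in that sampled MDP.
    Both are placeholders; this omits the tree search used in the paper.
    """
    totals = {a: 0.0 for a in actions}
    counts = {a: 0 for a in actions}
    for _ in range(num_simulations):
        mdp = sample_mdp_from_posterior(history)   # one posterior sample per simulation
        action = random.choice(actions)
        totals[action] += simulate(mdp, action)
        counts[action] += 1
    return max(actions, key=lambda a: totals[a] / max(counts[a], 1))
```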
