1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar
Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.
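As a minimal illustration of the meta-gradient idea (a sketch, not the paper's algorithm; all function names are placeholders), an outer objective can be differentiated through a differentiable inner update, here with the inner step size as the meta-parameter:

```python
# Minimal meta-gradient sketch in JAX (illustrative, not the paper's setup):
# differentiate an outer objective through one inner-loop update to obtain
# the gradient with respect to a meta-parameter (here, the inner step size).
import jax
import jax.numpy as jnp

def inner_loss(theta, batch):
    # Placeholder inner RL objective, e.g. a policy-gradient surrogate.
    return jnp.sum((theta - batch) ** 2)

def outer_loss(theta, batch):
    # Placeholder meta objective evaluated after the inner update.
    return jnp.sum(theta ** 2) + jnp.sum(theta * batch)

def meta_objective(log_lr, theta, train_batch, val_batch):
    lr = jnp.exp(log_lr)                      # meta-parameter: inner step size
    grads = jax.grad(inner_loss)(theta, train_batch)
    theta_updated = theta - lr * grads        # one differentiable inner step
    return outer_loss(theta_updated, val_batch)

theta = jnp.ones(3)
train_batch = jnp.array([0.5, -1.0, 2.0])
val_batch = jnp.array([1.0, 0.0, -0.5])
meta_grad = jax.grad(meta_objective)(0.0, theta, train_batch, val_batch)
print(meta_grad)  # d(outer loss)/d(log learning rate)
```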
no code implementations • 8 Dec 2021 • Angelos Filos, Eszter Vértes, Zita Marinho, Gregory Farquhar, Diana Borsa, Abram Friesen, Feryal Behbahani, Tom Schaul, André Barreto, Simon Osindero
Unlike prior work which estimates uncertainty by training an ensemble of many models and/or value functions, this approach requires only the single model and value function which are already being learned in most model-based reinforcement learning algorithms.
Model-based Reinforcement Learning
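A hedged sketch of how a single model and value function can yield an uncertainty signal: rollouts of different depths through the learned model, each bootstrapped with the learned value function, form an implicit ensemble of value estimates whose disagreement can be measured. The helper names below are illustrative, not the paper's API:

```python
# Sketch: disagreement among 0-step, 1-step, ..., k-step model-based value
# estimates as an epistemic-uncertainty signal. Names are illustrative.
import jax.numpy as jnp

def k_step_value(s, model, reward_fn, value_fn, k, gamma=0.99):
    """Value estimate from a k-step rollout of the learned model."""
    total, discount = 0.0, 1.0
    for _ in range(k):
        r, s = reward_fn(s), model(s)      # predicted reward and next state
        total += discount * r
        discount *= gamma
    return total + discount * value_fn(s)  # bootstrap with the value function

def value_disagreement(s, model, reward_fn, value_fn, max_k=5):
    estimates = jnp.array(
        [k_step_value(s, model, reward_fn, value_fn, k) for k in range(max_k + 1)]
    )
    return jnp.std(estimates)  # high disagreement ~ high epistemic uncertainty

# Toy usage with linear placeholders:
model = lambda s: 0.9 * s
reward_fn = lambda s: jnp.sum(s)
value_fn = lambda s: 2.0 * jnp.sum(s)
print(value_disagreement(jnp.ones(3), model, reward_fn, value_fn))
```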
no code implementations • NeurIPS 2021 • Gregory Farquhar, Kate Baumli, Zita Marinho, Angelos Filos, Matteo Hessel, Hado van Hasselt, David Silver
Learned models of the environment provide reinforcement learning (RL) agents with flexible ways of making predictions about the environment.
1 code implementation • NeurIPS 2021 • Christopher Grimm, André Barreto, Gregory Farquhar, David Silver, Satinder Singh
The value-equivalence (VE) principle proposes a simple answer to this question: a model should capture the aspects of the environment that are relevant for value-based planning.
Model-based Reinforcement Learning
Reinforcement Learning (RL)
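As a rough tabular illustration of the VE principle (setup and names assumed, not from the paper's code), a model can be tested for value equivalence by comparing Bellman backups through the model against backups through the environment, over a chosen set of policies and value functions:

```python
# Tabular value-equivalence check: two models are value-equivalent w.r.t.
# sets of policies and value functions if they induce the same Bellman backups.
import jax.numpy as jnp

def bellman_backup(P, r, v, pi, gamma=0.99):
    """One-step backup (T_pi v) for transitions P[a, s, s'] and rewards r[a, s]."""
    q = r + gamma * jnp.einsum("ast,t->as", P, v)   # Q[a, s]
    return jnp.einsum("sa,as->s", pi, q)            # average over pi(a|s)

def value_equivalent(P_env, P_model, r, policies, value_fns, tol=1e-5):
    return all(
        jnp.allclose(bellman_backup(P_env, r, v, pi),
                     bellman_backup(P_model, r, v, pi), atol=tol)
        for pi in policies for v in value_fns
    )

# Toy 2-state, 2-action example:
P_env = jnp.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.5, 0.5], [0.5, 0.5]]])       # P[a, s, s']
P_model = jnp.array(P_env)                           # trivially equivalent
r = jnp.ones((2, 2))
pi_uniform = jnp.full((2, 2), 0.5)
print(value_equivalent(P_env, P_model, r, [pi_uniform], [jnp.zeros(2)]))
```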
1 code implementation • 24 Feb 2021 • Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
This allows us to disentangle shared features and dynamics of the environment from agent-specific rewards and policies.
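One way to picture this disentanglement, sketched here under the assumption of a successor-feature-style decomposition (all names illustrative): shared features and dynamics live in the successor features, while each agent contributes only a reward-weight vector.

```python
# Hedged sketch: shared successor features psi combined with agent-specific
# reward weights w_i recover per-agent values as Q_i = psi . w_i.
import jax.numpy as jnp

def q_value(psi, w_agent):
    # psi: successor features of (s, a), shape [n_actions, d] (shared)
    # w_agent: agent-specific reward weights, shape [d]
    return psi @ w_agent  # per-action Q-values for this agent

psi = jnp.array([[1.0, 0.2], [0.3, 0.9]])            # shared across agents
w_demonstrator, w_ego = jnp.array([1.0, 0.0]), jnp.array([0.0, 1.0])
print(q_value(psi, w_demonstrator), q_value(psi, w_ego))
```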
4 code implementations • NeurIPS 2020 • Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson
We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, a failure that stems primarily from the equal weighting placed on each joint action.
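A minimal sketch of the weighting idea, with an assumed weighting scheme rather than the paper's exact operator: replacing the uniform weighting in the projection loss with a non-uniform one changes which joint actions the factored value function must fit well.

```python
# Weighted projection loss over joint actions (weighting scheme is assumed).
import jax.numpy as jnp

def weighted_projection_loss(q_mix, q_star, weights):
    # q_mix: factored/monotonic joint Q-values per joint action
    # q_star: target optimal joint Q-values
    # weights: per-joint-action weights (uniform = the naive projection)
    return jnp.sum(weights * (q_mix - q_star) ** 2)

q_star = jnp.array([1.0, 0.0, 0.0, 3.0])    # 2 agents x 2 actions, flattened
q_mix = jnp.array([0.8, 0.1, 0.2, 1.5])
uniform = jnp.full(4, 0.25)
optimistic = jnp.where(q_mix < q_star, 1.0, 0.1)  # up-weight underestimates
print(weighted_projection_loss(q_mix, q_star, uniform),
      weighted_projection_loss(q_mix, q_star, optimistic))
```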
no code implementations • ICLR 2021 • Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
1 code implementation • 19 Mar 2020 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.
Ranked #6 on SMAC (6h_vs_8z)
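A hedged sketch of the monotonic mixing idea used in this line of work: per-agent utilities are combined by a mixer with non-negative weights, so the joint argmax decomposes into per-agent argmaxes and execution can be decentralised. The hypernetwork conditioning on the global state is omitted; shapes and names are illustrative.

```python
# Monotonic mixer sketch: abs() keeps mixing weights non-negative, so
# d q_tot / d q_i >= 0 and greedy joint action selection decentralises.
import jax.numpy as jnp

def monotonic_mix(agent_qs, w1, b1, w2, b2):
    hidden = jnp.maximum(jnp.abs(w1) @ agent_qs + b1, 0.0)  # ReLU here for brevity
    return jnp.dot(jnp.abs(w2), hidden) + b2                 # scalar q_tot

agent_qs = jnp.array([0.5, 1.2, -0.3])           # utilities from 3 agents
w1, b1 = 0.1 * jnp.ones((4, 3)), jnp.zeros(4)    # toy mixer parameters
w2, b2 = 0.1 * jnp.ones(4), 0.0
print(monotonic_mix(agent_qs, w1, b1, w2, b2))   # scalar joint value q_tot
```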
1 code implementation • NeurIPS 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster
Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.
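As background, a minimal score-function (likelihood-ratio) estimator, which is the kind of derivative estimator at issue: the gradient of $E[f(x)]$ for $x \sim p_\theta$ can be estimated as $E[f(x)\,\nabla_\theta \log p_\theta(x)]$. The Gaussian setup below is purely illustrative:

```python
# Score-function estimator sketch: gradient of an expectation under unknown
# or non-differentiable dynamics, via the log-likelihood trick.
import jax
import jax.numpy as jnp

def log_prob(theta, x):
    return -0.5 * (x - theta) ** 2  # unnormalised Gaussian log-density

def surrogate(theta, xs):
    # Surrogate whose gradient matches the gradient of E[f(x)];
    # stop_gradient marks the cost as a black box with no direct theta path.
    f = xs ** 2
    return jnp.mean(jax.lax.stop_gradient(f) * log_prob(theta, xs))

key = jax.random.PRNGKey(0)
theta = 1.0
xs = theta + jax.random.normal(key, (100_000,))  # samples from N(theta, 1)
print(jax.grad(surrogate)(theta, xs))            # ~ d/dtheta E[x^2] = 2*theta = 2
```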
1 code implementation • ICML 2020 • Gregory Farquhar, Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, Gabriel Synnaeve
In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress.
no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel
To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.
20 code implementations • 11 Feb 2019 • Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson
In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap.
Ranked #6 on SMAC (6h_vs_8z)
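A minimal random-agent interaction loop with the SMAC environment, adapted from the style of the benchmark's README (the exact API may vary across versions):

```python
# Random decentralised agents on a SMAC map; requires the smac package and
# a StarCraft II installation.
import numpy as np
from smac.env import StarCraft2Env

env = StarCraft2Env(map_name="8m")
n_agents = env.get_env_info()["n_agents"]

env.reset()
terminated, episode_return = False, 0.0
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        avail = np.nonzero(env.get_avail_agent_actions(agent_id))[0]
        actions.append(np.random.choice(avail))   # random available action
    reward, terminated, info = env.step(actions)
    episode_return += reward
env.close()
print("episode return:", episode_return)
```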
1 code implementation • NeurIPS 2019 • Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson
In this paper, we show that common knowledge between agents allows for complex decentralised coordination.
Multi-agent Reinforcement Learning
Reinforcement Learning (RL)
no code implementations • 27 Sep 2018 • Jingkai Mao, Jakob Foerster, Tim Rocktäschel, Gregory Farquhar, Maruan Al-Shedivat, Shimon Whiteson
To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation.
1 code implementation • ICML 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson
To match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms in estimators of higher-order derivatives.
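The central construct in DiCE is the MagicBox operator, $\boxdot(x) = \exp(x - \perp(x))$, where $\perp$ stops gradients: it evaluates to 1 in the forward pass but differentiates to $\boxdot(x)\,dx$, so repeated differentiation of the surrogate yields the correct higher-order score-function terms. A single-sample sketch (distribution and numbers illustrative):

```python
# MagicBox operator and a one-sample DiCE-style surrogate.
import jax
import jax.numpy as jnp

def magicbox(x):
    # Forward value 1; gradient magicbox(x) * dx.
    return jnp.exp(x - jax.lax.stop_gradient(x))

def dice_surrogate(theta, x, cost):
    logp = -0.5 * (x - theta) ** 2             # log-prob of the sampled action
    return magicbox(logp) * cost               # cost weighted by MagicBox

theta, x, cost = 0.5, 1.3, 2.0
g1 = jax.grad(dice_surrogate)(theta, x, cost)             # first-order term
g2 = jax.grad(jax.grad(dice_surrogate))(theta, x, cost)   # second-order term
print(g1, g2)
```

Because the forward value of the surrogate equals the sampled cost, the same expression serves as an unbiased objective and as a source of any-order gradient estimates.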
14 code implementations • ICML 2018 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.
Ranked #1 on SMAC+ (Off_Near_parallel)
Multi-agent Reinforcement Learning
Reinforcement Learning (RL)
1 code implementation • ICLR 2018 • Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson
To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions.
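A hedged sketch of the recursive tree computation (a plain max backup is shown here; the paper refines the backup and learns all modules end-to-end; names are illustrative):

```python
# TreeQN-style depth-limited lookahead over a learned abstract model:
# expand one step per action, back values up the tree, all differentiable.
import jax.numpy as jnp

def tree_value(z, transition, reward, value, depth, n_actions, gamma=0.99):
    """Backed-up value of abstract state z from a depth-limited tree."""
    if depth == 0:
        return value(z)
    q = jnp.array([
        reward(z, a) + gamma * tree_value(transition(z, a), transition,
                                          reward, value, depth - 1,
                                          n_actions, gamma)
        for a in range(n_actions)
    ])
    return jnp.max(q)  # greedy backup over actions at each internal node

# Toy usage with linear placeholders:
transition = lambda z, a: 0.5 * z + a
reward = lambda z, a: 0.1 * jnp.sum(z)
value = lambda z: jnp.sum(z)
print(tree_value(jnp.ones(2), transition, reward, value, 2, 2))
```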
6 code implementations • 24 May 2017 • Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
Ranked #1 on SMAC+ (Off_Superhard_parallel)
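A sketch of the counterfactual baseline behind this scheme (shapes and names assumed): each agent's advantage compares the centralised critic's value for its taken action against an expectation over its own actions under its policy, holding the other agents' actions fixed.

```python
# COMA-style counterfactual advantage for a single agent.
import jax.numpy as jnp

def counterfactual_advantage(q_joint, pi_agent, taken_action):
    # q_joint: critic values for each of this agent's actions, with the
    #          other agents' actions held fixed, shape [n_actions]
    # pi_agent: this agent's policy over its actions, shape [n_actions]
    baseline = jnp.dot(pi_agent, q_joint)   # marginalise this agent's action
    return q_joint[taken_action] - baseline

q_joint = jnp.array([1.0, 2.0, 0.5])
pi_agent = jnp.array([0.2, 0.5, 0.3])
print(counterfactual_advantage(q_joint, pi_agent, 1))  # 2.0 - 1.35 = 0.65
```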
5 code implementations • ICML 2017 • Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems.