no code implementations • 26 Nov 2024 • Aman Sinha, Payam Nikdel, Supratik Paul, Shimon Whiteson
Ensuring the safety of autonomous vehicles (AVs) requires both accurate estimation of their performance and efficient discovery of potential failure cases.
1 code implementation • 7 Nov 2024 • Clémence Grislain, Risto Vuorio, Cong Lu, Shimon Whiteson
In this work, we introduce \textbf{IGDrivSim}, a benchmark built on top of the Waymax simulator, designed to investigate the effects of the imitation gap in learning autonomous driving policy from human expert demonstrations.
1 code implementation • 9 Jul 2024 • Alexander David Goldie, Chris Lu, Matthew Thomas Jackson, Shimon Whiteson, Jakob Nicolaus Foerster
While reinforcement learning (RL) holds great potential for decision making in the real world, it suffers from a number of unique difficulties which often need specific consideration.
no code implementations • 6 May 2024 • Reza Mahjourian, Rongbing Mu, Valerii Likhosherstov, Paul Mougin, Xiukun Huang, Joao Messias, Shimon Whiteson
This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation.
1 code implementation • 9 Apr 2024 • Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Foerster
Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.
1 code implementation • 5 Mar 2024 • Jacob Beck, Matthew Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson
However, it remains unclear whether task inference sequence models are beneficial even when task inference objectives are not.
1 code implementation • 9 Feb 2024 • Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies.
1 code implementation • 8 Feb 2024 • Matthew Thomas Jackson, Chris Lu, Louis Kirsch, Robert Tjarko Lange, Shimon Whiteson, Jakob Nicolaus Foerster
We propose a simple augmentation to two existing objective discovery approaches that allows the discovered algorithm to dynamically update its objective function throughout the agent's training procedure, resulting in expressive schedules and increased generalization across different training horizons.
2 code implementations • 16 Nov 2023 • Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Ravi Hammond, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster
Benchmarks are crucial in the development of machine learning algorithms, with available environments significantly influencing reinforcement learning (RL) research.
1 code implementation • NeurIPS 2023 • Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster
Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.
Deep Reinforcement Learning • General Reinforcement Learning +2
1 code implementation • NeurIPS 2023 • Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson
While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline.
no code implementations • 25 Sep 2023 • Maximilian Igl, Punit Shah, Paul Mougin, Sirish Srinivasan, Tarun Gupta, Brandyn White, Kyriacos Shiarlis, Shimon Whiteson
However, such methods are often inappropriate for stochastic environments where the agent must also react to external factors: because agent types are inferred from the observed future trajectory during training, these environments require that the contributions of internal and external factors to the agent's behaviour be disentangled, so that only internal factors, i.e., those under the agent's control, are encoded in the type.
no code implementations • 24 Aug 2023 • Mattie Fellows, Brandon Kaplowitz, Christian Schroeder de Witt, Shimon Whiteson
In this paper, we introduce a novel Bayesian model-free formulation and the first analysis showing that model-free approaches can yield Bayes-optimal policies.
1 code implementation • NeurIPS 2023 • Nico Montali, John Lambert, Paul Mougin, Alex Kuefler, Nick Rhinehart, Michelle Li, Cole Gulino, Tristan Emrich, Zoey Yang, Shimon Whiteson, Brandyn White, Dragomir Anguelov
Simulation with realistic, interactive agents represents a key task for autonomous vehicle software development.
no code implementations • 19 Mar 2023 • Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson
By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior.
Multi-agent Reinforcement Learning • reinforcement-learning +2
no code implementations • 24 Feb 2023 • Mattie Fellows, Matthew J. A. Smith, Shimon Whiteson
Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process.
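The stabilising effect of infrequently updated targets is easy to see in miniature. Below is a hedged sketch (our toy construction, not the paper's analysis): tabular TD(0) policy evaluation on a deterministic 5-state cycle, bootstrapping from a target value table that is only synced from the online table every 50 steps.

```python
import numpy as np

# Toy illustration of target-network-style TD learning (all constants ours).
n_states, gamma, alpha, sync_every = 5, 0.9, 0.1, 50
v_online = np.zeros(n_states)
v_target = v_online.copy()

s = 0
for t in range(5000):
    s_next = (s + 1) % n_states        # deterministic cycle for illustration
    r = 1.0 if s == n_states - 1 else 0.0  # reward on completing the cycle
    # TD update bootstraps from the *frozen* target values
    td_error = r + gamma * v_target[s_next] - v_online[s]
    v_online[s] += alpha * td_error
    if (t + 1) % sync_every == 0:
        v_target = v_online.copy()     # infrequent target sync
    s = s_next

print(np.round(v_online, 3))
```

With the targets frozen between syncs, each inter-sync phase is a contraction toward a fixed bootstrap backup, which is the intuition behind the stability the paper studies formally.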
1 code implementation • 22 Feb 2023 • Zheng Xiong, Jacob Beck, Shimon Whiteson
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and generalization in continuous control.
no code implementations • 15 Feb 2023 • Mingfei Sun, Benjamin Ellis, Anuj Mahajan, Sam Devlin, Katja Hofmann, Shimon Whiteson
In this paper, we show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee.
no code implementations • 19 Jan 2023 • Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson
Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible.
no code implementations • 21 Dec 2022 • Yiren Lu, Justin Fu, George Tucker, Xinlei Pan, Eli Bronstein, Rebecca Roelofs, Benjamin Sapp, Brandyn White, Aleksandra Faust, Shimon Whiteson, Dragomir Anguelov, Sergey Levine
To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.
1 code implementation • NeurIPS 2023 • Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, Shimon Whiteson
In this work, we conduct new analysis demonstrating that SMAC lacks the stochasticity and partial observability to require complex *closed-loop* policies.
no code implementations • 14 Dec 2022 • Angad Singh, Omar Makhlouf, Maximilian Igl, Joao Messias, Arnaud Doucet, Shimon Whiteson
Recent methods addressing this problem typically differentiate through time in a particle filter, which requires workarounds for the non-differentiable resampling step that yield biased or high-variance gradient estimates.
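The resampling step in question, in its standard (non-differentiable) systematic form, is a few lines of numpy: ancestor indices are produced by discrete selection, so no gradient flows through the particle weights. A minimal sketch (our illustration, not the paper's method):

```python
import numpy as np

def systematic_resample(weights, rng):
    # One stratified sweep: n evenly spaced positions with a shared random offset.
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                 # guard against floating-point round-off
    # Discrete index selection: the non-differentiable step.
    return np.searchsorted(cumulative, positions)

rng = np.random.default_rng(0)
w = np.array([0.1, 0.2, 0.3, 0.4])
idx = systematic_resample(w, rng)
print(idx)  # ancestor indices; heavier particles are duplicated more often
```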
no code implementations • 2 Dec 2022 • Eli Bronstein, Sirish Srinivasan, Supratik Paul, Aman Sinha, Matthew O'Kelly, Payam Nikdel, Shimon Whiteson
However, this approach produces agents that do not perform robustly in safety-critical settings, an issue that cannot be addressed by simply adding more data to the training set: we show that an agent trained using only a 10% subset of the data performs just as well as an agent trained on the entire dataset.
1 code implementation • 21 Oct 2022 • Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster
Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner.
1 code implementation • 20 Oct 2022 • Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Shimon Whiteson
In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, as well as being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.
no code implementations • 18 Oct 2022 • Eli Bronstein, Mark Palatucci, Dominik Notz, Brandyn White, Alex Kuefler, Yiren Lu, Supratik Paul, Payam Nikdel, Paul Mougin, Hongge Chen, Justin Fu, Austin Abrams, Punit Shah, Evan Racah, Benjamin Frenkel, Shimon Whiteson, Dragomir Anguelov
We demonstrate the first large-scale application of model-based generative adversarial imitation learning (MGAIL) to the task of dense urban self-driving.
1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar
Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.
no code implementations • 26 Jun 2022 • Darius Muglich, Luisa Zintgraf, Christian Schroeder de Witt, Shimon Whiteson, Jakob Foerster
Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings.
no code implementations • 6 May 2022 • Maximilian Igl, Daewoo Kim, Alex Kuefler, Paul Mougin, Punit Shah, Kyriacos Shiarlis, Dragomir Anguelov, Mark Palatucci, Brandyn White, Shimon Whiteson
The beam search refines these policies on the fly by pruning branches that are unfavourably evaluated by a discriminator.
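The pruning step described above can be sketched schematically: expand candidate rollouts one action at a time and keep only the branches best scored by an evaluator (a stand-in for the learned discriminator; everything here is a toy of our own construction).

```python
def beam_search(expand, score, beam_width, depth):
    beam = [()]                            # start with the empty rollout
    for _ in range(depth):
        # Expand every branch by every available action.
        candidates = [b + (a,) for b in beam for a in expand(b)]
        # Prune branches that the evaluator scores unfavourably.
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
    return beam

# Toy setup: actions are digits, the "discriminator" favours large sums.
expand = lambda rollout: (0, 1, 2)
score = lambda rollout: sum(rollout)
print(beam_search(expand, score, beam_width=2, depth=3))
```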
no code implementations • 31 Jan 2022 • Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson
Furthermore, we show that ESPO can be easily scaled up to distributed training with many workers, delivering strong performance as well.
no code implementations • 31 Jan 2022 • Anuj Mahajan, Mikayel Samvelyan, Tarun Gupta, Benjamin Ellis, Mingfei Sun, Tim Rocktäschel, Shimon Whiteson
Specifically, we study generalization bounds under a linear dependence of the underlying dynamics on the agent capabilities, which can be seen as a generalization of Successor Features to MAS.
no code implementations • 31 Jan 2022 • Mingfei Sun, Sam Devlin, Jacob Beck, Katja Hofmann, Shimon Whiteson
We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which hold even when the transition dynamics are non-stationary.
1 code implementation • 11 Jan 2022 • Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. Pawan Kumar
We show that unitary scalarization, coupled with standard regularization and stabilization techniques from single-task learning, matches or improves upon the performance of complex multi-task optimizers in popular supervised and reinforcement learning settings.
1 code implementation • 11 Dec 2021 • Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson
Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications.
no code implementations • 1 Dec 2021 • Zheng Xiong, Luisa Zintgraf, Jacob Beck, Risto Vuorio, Shimon Whiteson
We further find that theoretically inconsistent algorithms can be made consistent by continuing to update all agent components on the OOD tasks, and that they adapt as well as or better than originally consistent ones.
1 code implementation • NeurIPS 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson
Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.
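The overestimation at issue comes from taking a max over noisy value estimates. A hedged numpy illustration (not the paper's algorithm; the temperature and noise model are ours) comparing the max backup with a softmax-weighted backup:

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, n_trials, tau = 10, 10000, 1.0

def softmax_value(q, tau):
    # Softmax-weighted backup: a smooth surrogate for max over Q-values.
    w = np.exp((q - q.max()) / tau)
    w /= w.sum()
    return float(w @ q)

max_est, soft_est = [], []
for _ in range(n_trials):
    q_noisy = rng.normal(0.0, 1.0, n_actions)  # true Q(s, a) = 0 for all a
    max_est.append(q_noisy.max())              # max backup: positively biased
    soft_est.append(softmax_value(q_noisy, tau))

print(np.mean(max_est), np.mean(soft_est))  # max is biased high; softmax less so
```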
no code implementations • 27 Oct 2021 • Pascal Van Der Vaart, Anuj Mahajan, Shimon Whiteson
A challenge in multi-agent reinforcement learning is to be able to generalize over intractable state-action spaces.
Model-based Reinforcement Learning • Multi-agent Reinforcement Learning +4
no code implementations • 27 Oct 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
We present an extended abstract for the previously published work TESSERACT [Mahajan et al., 2021], which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.
Multi-agent Reinforcement Learning • reinforcement-learning +2
no code implementations • 29 Sep 2021 • Matthew J. A. Smith, Shimon Whiteson
Overfitting has been recently acknowledged as a key limiting factor in the capabilities of reinforcement learning algorithms, despite little theoretical characterisation.
1 code implementation • 11 Aug 2021 • Shangtong Zhang, Shimon Whiteson
Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems.
1 code implementation • 17 Jul 2021 • Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Martin Strohmeier, J. Zico Kolter, Shimon Whiteson, Jakob Foerster
We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME.
no code implementations • NeurIPS 2021 • Matthew Fellows, Kristian Hartikainen, Shimon Whiteson
We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator.
no code implementations • 6 Jun 2021 • Mingfei Sun, Anuj Mahajan, Katja Hofmann, Shimon Whiteson
We present SoftDICE, which achieves state-of-the-art performance for imitation learning.
no code implementations • 31 May 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar
Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task.
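For two agents the idea above is easy to picture: the joint Q "tensor" over (a1, a2) is a matrix, and a rank-k approximation models agent interactions with far fewer parameters than the full table. A hedged sketch using SVD (our construction; Tesseract itself handles more agents and general tensor decompositions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 6
# Build a (nearly) rank-one joint Q over two agents' action spaces.
u, v = rng.normal(size=n_actions), rng.normal(size=n_actions)
q_joint = np.outer(u, v) + 0.01 * rng.normal(size=(n_actions, n_actions))

def low_rank_q(q, k):
    # Truncated SVD: best rank-k approximation of the joint Q matrix.
    U, s, Vt = np.linalg.svd(q, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

q_hat = low_rank_q(q_joint, k=1)
err = np.abs(q_joint - q_hat).max()
print(err)  # small: the joint Q was (nearly) rank one by construction
```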
no code implementations • 27 Apr 2021 • Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson
Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios.
no code implementations • 22 Mar 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson
Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.
1 code implementation • NeurIPS 2021 • Charlie Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson
Recent research has shown that graph neural networks (GNNs) can learn policies for locomotion control that are as effective as a typical multi-layer perceptron (MLP), with superior transfer and multi-task performance (Wang et al., 2018; Huang et al., 2020).
1 code implementation • 21 Jan 2021 • Shangtong Zhang, Hengshuai Yao, Shimon Whiteson
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.
no code implementations • 11 Jan 2021 • Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann
The optimal adaptive behaviour under uncertainty over the other agents' strategies w.r.t.
1 code implementation • 8 Jan 2021 • Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson
We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function.
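The two quantities named above can be illustrated in the simplest possible setting: tabular, on-policy differential TD on a 2-state cycle, jointly estimating the reward rate rho and the differential value function v (a hedged toy of ours; the paper's off-policy function-approximation setting is much more involved).

```python
import numpy as np

rewards = {0: 1.0, 1: 0.0}      # deterministic cycle 0 -> 1 -> 0 -> ...
v = np.zeros(2)
rho, alpha, beta = 0.0, 0.05, 0.01

s = 0
for _ in range(20000):
    s_next = 1 - s
    r = rewards[s]
    delta = r - rho + v[s_next] - v[s]   # differential TD error
    v[s] += alpha * delta
    rho += beta * delta                  # track the average reward
    s = s_next

print(round(rho, 2))   # reward rate of the cycle: 0.5
```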
1 code implementation • NeurIPS 2020 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro
While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.
7 code implementations • 18 Nov 2020 • Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson
Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function.
no code implementations • 6 Oct 2020 • Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson
VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities.
Multi-agent Reinforcement Learning • reinforcement-learning +4
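The two factorizations contrasted above can be sketched in a few lines for two agents (a toy numpy illustration of ours, not the papers' code; weights and activations are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, hidden = 2, 4
utils = np.array([1.5, -0.5])              # per-agent utilities Q_i(s, a_i)

def vdn_mix(utils):
    # VDN: the joint value is the plain sum of per-agent utilities.
    return utils.sum()

# QMIX: a mixing network whose weights are made non-negative (here via abs),
# which guarantees dQ_tot/dQ_i >= 0, i.e. monotonic mixing.
w1 = np.abs(rng.normal(size=(n_agents, hidden)))
b1 = rng.normal(size=hidden)
w2 = np.abs(rng.normal(size=hidden))

def qmix_mix(utils):
    h = np.maximum(utils @ w1 + b1, 0.0)   # ReLU here; the paper uses ELU
    return float(h @ w2)

# Monotonicity check: raising any single agent's utility never lowers Q_tot.
base = qmix_mix(utils)
for i in range(n_agents):
    bumped = utils.copy(); bumped[i] += 1.0
    assert qmix_mix(bumped) >= base
print(vdn_mix(utils), qmix_mix(utils))
```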
1 code implementation • ICLR 2021 • Vitaly Kurin, Maximilian Igl, Tim Rocktäschel, Wendelin Boehmer, Shimon Whiteson
They also allow practitioners to inject biases encoded in the structure of the input graph.
2 code implementations • ICLR 2021 • Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang
Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces.
1 code implementation • 2 Oct 2020 • Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes
In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.
1 code implementation • 2 Oct 2020 • Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson
To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep.
no code implementations • 21 Sep 2020 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Henri Bouma
Automated tracking is key to many computer vision applications.
no code implementations • 21 Sep 2020 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Matthijs T. J. Spaan
Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function.
1 code implementation • ICML Workshop LaReL 2020 • Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel
This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment.
1 code implementation • NeurIPS 2020 • Shangtong Zhang, Vivek Veeriah, Shimon Whiteson
We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge.
4 code implementations • NeurIPS 2020 • Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson
We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, which primarily stems from the equal weighting placed on each joint action.
no code implementations • ICLR 2021 • Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson
Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.
2 code implementations • 7 Jun 2020 • Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities.
no code implementations • 19 May 2020 • Pierre-Alexandre Kamienny, Kai Arulkumaran, Feryal Behbahani, Wendelin Boehmer, Shimon Whiteson
Using privileged information during training can improve the sample efficiency and performance of machine learning systems.
no code implementations • 11 May 2020 • Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White
Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty.
1 code implementation • 22 Apr 2020 • Shangtong Zhang, Bo Liu, Shimon Whiteson
We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable.
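The objective MVPI targets trades off the mean of the per-step reward against its variance. A hedged sketch of just that objective (lambda_ is a risk coefficient; the policy-iteration machinery itself is omitted):

```python
import numpy as np

def mean_variance_objective(rewards, lambda_):
    # Mean of per-step reward, penalized by its variance.
    r = np.asarray(rewards, dtype=float)
    return r.mean() - lambda_ * r.var()

safe_policy  = [1.0, 1.0, 1.0, 1.0]    # lower mean, zero variance
risky_policy = [5.0, -2.0, 5.0, -2.0]  # higher mean, high variance

for lam in (0.0, 0.5):
    print(lam, mean_variance_objective(safe_policy, lam),
               mean_variance_objective(risky_policy, lam))
```

At lambda_ = 0 the risky reward stream scores higher; at lambda_ = 0.5 the variance penalty flips the preference to the safe one.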
1 code implementation • 19 Mar 2020 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.
Ranked #6 on SMAC (6h_vs_8z)
3 code implementations • NeurIPS 2021 • Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
1 code implementation • ICLR 2020 • Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson
We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting.
1 code implementation • 11 Feb 2020 • Dmitrii Beloborodov, A. E. Ulanov, Jakob N. Foerster, Shimon Whiteson, A. I. Lvovsky
Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization.
1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Shimon Whiteson
Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced to ensure positivity, so no primal-dual algorithm is guaranteed to converge or find the desired solution.
no code implementations • 23 Jan 2020 • Guangliang Li, Hamdi Dibeklioğlu, Shimon Whiteson, Hayley Hung
Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user.
1 code implementation • NeurIPS 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson
The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.
1 code implementation • NeurIPS 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster
Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.
no code implementations • 29 Nov 2019 • Leo Feng, Luisa Zintgraf, Bei Peng, Shimon Whiteson
In few-shot learning, typically, the loss function which is applied at test time is the one we are ultimately interested in minimising, such as the mean-squared-error loss for a regression problem.
1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson
With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.
3 code implementations • ICLR 2020 • Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning.
4 code implementations • NeurIPS 2019 • Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, Shimon Whiteson
We specifically focus on QMIX [40], the current state-of-the-art in this domain.
2 code implementations • ICML 2020 • Wendelin Böhmer, Vitaly Kurin, Shimon Whiteson
This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning.
2 code implementations • 26 Sep 2019 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro
While more work is needed to apply Graph-$Q$-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.
no code implementations • 25 Sep 2019 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro
We present GQSAT, a branching heuristic in a Boolean SAT solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation.
1 code implementation • 23 Sep 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster
Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.
1 code implementation • ICML 2020 • Gregory Farquhar, Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, Gabriel Synnaeve
In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress.
no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel
To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.
no code implementations • 5 Jun 2019 • Wendelin Böhmer, Tabish Rashid, Shimon Whiteson
This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning.
1 code implementation • 3 May 2019 • Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson
We revisit residual algorithms in both model-free and model-based reinforcement learning settings.
Model-based Reinforcement Learning • reinforcement-learning +2
2 code implementations • NeurIPS 2019 • Shangtong Zhang, Shimon Whiteson
We reformulate the option framework as two parallel augmented MDPs.
1 code implementation • 1 Apr 2019 • Maximilian Igl, Andrew Gambardella, Jinke He, Nantas Nardelli, N. Siddharth, Wendelin Böhmer, Shimon Whiteson
We present Multitask Soft Option Learning(MSOL), a hierarchical multitask framework based on Planning as Inference.
1 code implementation • NeurIPS 2019 • Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson
We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting.
1 code implementation • 18 Feb 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson
The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.
21 code implementations • 11 Feb 2019 • Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson
In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap.
Ranked #6 on SMAC (6h_vs_8z)
no code implementations • ICLR 2019 • Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson
A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL.
no code implementations • 8 Nov 2018 • Feryal Behbahani, Kyriacos Shiarlis, Xi Chen, Vitaly Kurin, Sudhanshu Kasewa, Ciprian Stirbu, João Gomes, Supratik Paul, Frans A. Oliehoek, João Messias, Shimon Whiteson
Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical.
1 code implementation • 4 Nov 2018 • Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling
We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.
1 code implementation • NeurIPS 2019 • Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson
This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps.
1 code implementation • NeurIPS 2019 • Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson
In this paper, we show that common knowledge between agents allows for complex decentralised coordination.
Multi-agent Reinforcement Learning • reinforcement-learning +4
1 code implementation • 8 Oct 2018 • Luisa M. Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson
We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable.
no code implementations • 27 Sep 2018 • Luisa M Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson
We propose CAML, a meta-learning method for fast adaptation that partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks.
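The parameter split described above can be sketched on a scalar model (a hedged toy of ours: shared parameters w stay fixed while only the context parameter phi is adapted per task by gradient descent on the task's loss):

```python
import numpy as np

def predict(w, phi, x):
    # Model: y_hat = w_x * x + w_phi * phi (context enters as an extra input).
    w_x, w_phi = w
    return w_x * x + w_phi * phi

def adapt_context(w, x, y, steps=50, lr=0.1):
    phi = 0.0                              # context reset for each new task
    for _ in range(steps):
        grad = 0.0
        for xi, yi in zip(x, y):
            err = predict(w, phi, xi) - yi
            grad += 2 * err * w[1] / len(x)  # d(mse)/d(phi); w stays fixed
        phi -= lr * grad
    return phi

w = (2.0, 1.0)                              # shared, meta-trained parameters
x = np.array([0.0, 1.0, 2.0])
y = 2.0 * x + 3.0                           # a task with offset 3
phi = adapt_context(w, x, y)
print(phi)  # the context absorbs the task-specific offset (≈ 3)
```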
no code implementations • 27 Sep 2018 • Jingkai Mao, Jakob Foerster, Tim Rocktäschel, Gregory Farquhar, Maruan Al-Shedivat, Shimon Whiteson
To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation.
1 code implementation • ICML 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson
Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.
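The fix DiCE proposes is the MagicBox operator, MagicBox(x) = exp(x - stop_gradient(x)): it evaluates to 1 in the forward pass while carrying the gradient of x, so multiplying a cost by it injects score-function gradients without changing the forward value. A self-contained illustration with a tiny dual-number autodiff of our own (real implementations use a framework's stop-gradient):

```python
import math

class Dual:
    # Forward-mode dual number: val + grad * epsilon.
    def __init__(self, val, grad=0.0):
        self.val, self.grad = val, grad
    def __sub__(self, other):
        return Dual(self.val - other.val, self.grad - other.grad)

def stop_gradient(x):
    # Keep the value, drop the derivative.
    return Dual(x.val, 0.0)

def exp(x):
    return Dual(math.exp(x.val), math.exp(x.val) * x.grad)

def magic_box(x):
    return exp(x - stop_gradient(x))

logp = Dual(-1.2, grad=1.0)   # d(logp)/d(theta) seeded to 1
box = magic_box(logp)
print(box.val, box.grad)      # forward value 1.0, gradient 1.0
```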
1 code implementation • ICML 2018 • Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson
Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown.
no code implementations • 27 May 2018 • Supratik Paul, Michael A. Osborne, Shimon Whiteson
Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator.
16 code implementations • ICML 2018 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.
Ranked #1 on SMAC+ (Off_Near_parallel)
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning (+5 more)
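Centralised training with decentralised execution, as in QMIX, can be sketched with toy numbers (the values and weights below are hypothetical): per-agent utilities are combined by a mixing function with non-negative weights, so each agent's greedy action also maximises the joint value.

```python
# Sketch of QMIX-style monotonic value factorisation: non-negative mixing
# weights guarantee dQ_tot/dQ_a >= 0 for every agent, so decentralised
# greedy action selection is consistent with the centralised joint argmax.

def mix(agent_qs, weights, bias):
    # Weights are clamped non-negative (QMIX uses abs on hypernetwork output).
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias

# Two agents, two actions each: per-agent utilities Q_a(action).
q_agent = [
    [1.0, 3.0],   # agent 0 prefers action 1
    [2.0, 0.5],   # agent 1 prefers action 0
]
weights, bias = [0.7, -0.4], 0.1  # the -0.4 becomes +0.4 under abs()

# Greedy decentralised actions...
greedy = [max(range(2), key=lambda a: q[a]) for q in q_agent]
# ...maximise the mixed joint value over all joint actions.
joint_values = {
    (a0, a1): mix([q_agent[0][a0], q_agent[1][a1]], weights, bias)
    for a0 in range(2) for a1 in range(2)
}
best_joint = max(joint_values, key=joint_values.get)
print(greedy, best_joint)  # both pick (1, 0)
```

In the full method the mixing weights are produced by hypernetworks conditioned on the global state available during centralised training; the toy version above fixes them to constants.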
1 code implementation • ICML 2018 • Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner
Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks.
no code implementations • ICML 2018 • Matthew Fellows, Kamil Ciosek, Shimon Whiteson
We propose a new way of deriving policy gradient updates for reinforcement learning.
5 code implementations • 14 Feb 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson
Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.
no code implementations • 10 Jan 2018 • Kamil Ciosek, Shimon Whiteson
For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions.
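The covariance construction above can be sketched in pure Python for a 2-D action space (a sketch with hypothetical numbers, using a Taylor-series matrix exponential rather than a production routine): the exploration covariance is set to expm(c * H), where H is the critic's Hessian with respect to the actions and c a scaling constant.

```python
# Hedged sketch of Hessian-shaped exploration for a 2-D Gaussian policy.

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_expm(M, terms=30):
    # Taylor series exp(M) = sum_k M^k / k! -- fine for small, well-scaled M.
    result = [[1.0, 0.0], [0.0, 1.0]]   # identity
    power = [[1.0, 0.0], [0.0, 1.0]]
    fact = 1.0
    for k in range(1, terms):
        power = mat_mul(power, M)
        fact *= k
        for i in range(2):
            for j in range(2):
                result[i][j] += power[i][j] / fact
    return result

# A critic Hessian with one sharply concave and one convex action direction.
hessian = [[-2.0, 0.0], [0.0, 0.5]]
c = 1.0  # hypothetical scaling constant
cov = mat_expm([[c * hessian[i][j] for j in range(2)] for i in range(2)])
print(cov)  # ~[[e^-2, 0], [0, e^0.5]]
```

The matrix exponential maps any symmetric Hessian to a positive-definite covariance, shrinking exploration along directions of strong negative curvature (where the critic is confident) and widening it elsewhere.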
no code implementations • NeurIPS 2017 • Joao V. Messias, Shimon Whiteson
Reinforcement learning (RL) in partially observable settings is challenging because the agent’s observations are not Markov.
1 code implementation • ICLR 2018 • Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson
To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions.
6 code implementations • 13 Sep 2017 • Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch
We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL.
no code implementations • 15 Jun 2017 • Kamil Ciosek, Shimon Whiteson
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning.
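The unification can be sketched as follows (a reading under assumed notation, not the paper's exact statement: $\rho$ the discounted state distribution, $\pi_\theta$ a stochastic policy, $\mu_\theta$ its deterministic limit, $Q$ the critic):

$$\nabla_\theta J(\theta) = \int_S \rho(s) \int_A \nabla_\theta \pi_\theta(a \mid s)\, Q(s, a)\, \mathrm{d}a\, \mathrm{d}s.$$

EPG evaluates the inner integral over actions analytically or by quadrature rather than with a single sampled action, matching SPG in expectation while reducing variance; as the policy's variance shrinks to zero, $\pi_\theta(\cdot \mid s) \to \delta_{\mu_\theta(s)}$ and the inner integral collapses to the DPG form $\nabla_\theta \mu_\theta(s)\, \nabla_a Q(s, a)\big|_{a = \mu_\theta(s)}$.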
6 code implementations • 24 May 2017 • Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
Ranked #1 on SMAC+ (Off_Superhard_parallel)
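COMA's centralised critic enables a counterfactual advantage for each agent, which can be sketched with toy critic values (the numbers below are hypothetical): the critic's Q for the taken joint action, minus a baseline that marginalises out that one agent's action while holding the other agents' actions fixed.

```python
# Sketch of COMA's counterfactual advantage for a single agent i.

def counterfactual_advantage(q_joint, policy, taken):
    # q_joint[a] = Q(s, (u^-i, a)): critic values as agent i's action varies,
    # with the other agents' actions u^-i held fixed.
    baseline = sum(policy[a] * q_joint[a] for a in range(len(policy)))
    return q_joint[taken] - baseline

q_joint = [1.0, 4.0, 2.0]   # critic Q for each of agent i's 3 actions
policy = [0.2, 0.5, 0.3]    # agent i's current action distribution
adv = counterfactual_advantage(q_joint, policy, taken=1)
print(adv)  # 4.0 - (0.2*1 + 0.5*4 + 0.3*2) ~= 1.2
```

This baseline keeps the gradient unbiased while crediting each agent only for the effect of its own action, which is what makes the decentralised actors trainable from one centralised critic.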
5 code implementations • ICML 2017 • Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems.
12 code implementations • 5 Nov 2016 • Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas
Lipreading is the task of decoding text from the movement of a speaker's mouth.
Ranked #5 on Lipreading on the GRID corpus (mixed-speech)
2 code implementations • 9 Oct 2016 • Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson
We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori.
Tasks: Deep Reinforcement Learning, Multi-Objective Reinforcement Learning (+1 more)
no code implementations • 24 May 2016 • Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson
ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.
3 code implementations • NeurIPS 2016 • Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson
We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility.
no code implementations • 25 Feb 2016 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek
Submodular function maximization finds application in a variety of real-world decision-making problems.
no code implementations • 8 Feb 2016 • Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson
We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks.
no code implementations • NeurIPS 2015 • Masrour Zoghi, Zohar Karnin, Shimon Whiteson, Maarten de Rijke
A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist.
no code implementations • 4 Feb 2014 • Diederik Marijn Roijers, Peter Vamplew, Shimon Whiteson, Richard Dazeley
Using this taxonomy, we survey the literature on multi-objective methods for planning and learning.
no code implementations • 4 Feb 2014 • Frans Adriaan Oliehoek, Matthijs T. J. Spaan, Christopher Amato, Shimon Whiteson
We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search-equivalent.
no code implementations • 12 Dec 2013 • Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten de Rijke
This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms.
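The relative-feedback setting can be illustrated with a naive explore-then-commit loop (NOT the paper's algorithm, which is RUCB; the preference matrix and duel counts below are hypothetical): the learner only ever observes which of two arms won a duel, and commits to the empirical Condorcet winner.

```python
import random

# Naive dueling-bandit illustration: duel every pair a fixed number of
# times, then return the arm that wins a majority of each pairwise matchup.

random.seed(0)
P = [  # P[i][j] = probability arm i beats arm j (toy preference matrix)
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
]
K, duels_per_pair = 3, 500
wins = [[0] * K for _ in range(K)]
for i in range(K):
    for j in range(i + 1, K):
        for _ in range(duels_per_pair):
            if random.random() < P[i][j]:
                wins[i][j] += 1
            else:
                wins[j][i] += 1

# Arm that beats every other arm in a majority of duels, if one exists.
condorcet = [i for i in range(K)
             if all(wins[i][j] > wins[j][i] for j in range(K) if j != i)]
print(condorcet)  # arm 0 dominates both rivals here
```

RUCB improves on such uniform exploration by maintaining optimistic confidence bounds on the pairwise win probabilities, concentrating duels on plausible Condorcet winners instead of sampling every pair equally.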
no code implementations • 1 Aug 2011 • Frans A. Oliehoek, Shimon Whiteson, Matthijs T. J. Spaan
Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type.