You need to log in to edit.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

You can create a new account if you don't have one.

Or, discuss a change on Slack.

no code implementations • 11 Aug 2021 • Shangtong Zhang, Shimon Whiteson

Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad (Sutton and Barto, 2018) of off-policy RL, there are still three open problems.

no code implementations • 17 Jul 2021 • Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Shimon Whiteson, Jakob Foerster

In many common-payoff games, achieving good performance requires players to develop protocols for communicating their private information implicitly -- i. e., using actions that have non-communicative effects on the environment.

no code implementations • 9 Jun 2021 • Matthew Fellows, Kristian Hartikainen, Shimon Whiteson

We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator.

no code implementations • 6 Jun 2021 • Mingfei Sun, Anuj Mahajan, Katja Hofmann, Shimon Whiteson

We present SoftDICE, which achieves state-of-the-art performance for imitation learning.

no code implementations • 31 May 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task.

no code implementations • 27 Apr 2021 • Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson

Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios.

no code implementations • 22 Mar 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

no code implementations • 1 Mar 2021 • Charlie Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson

Recent research has shown that Graph Neural Networks (GNNs) can learn policies for locomotion control that are as effective as a typical multi-layer perceptron (MLP), with superior transfer and multi-task performance (Wang et al., 2018; Huang et al., 2020).

1 code implementation • 21 Jan 2021 • Shangtong Zhang, Hengshuai Yao, Shimon Whiteson

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.

no code implementations • 11 Jan 2021 • Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann

The optimal adaptive behaviour under uncertainty over the other agents' strategies w. r. t.

1 code implementation • 8 Jan 2021 • Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function.

1 code implementation • NeurIPS 2020 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

2 code implementations • 18 Nov 2020 • Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson

Most recently developed approaches to cooperative multi-agent reinforcement learning in the \emph{centralized training with decentralized execution} setting involve estimating a centralized, joint value function.

no code implementations • 6 Oct 2020 • Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson

VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities.

1 code implementation • ICLR 2021 • Vitaly Kurin, Maximilian Igl, Tim Rocktäschel, Wendelin Boehmer, Shimon Whiteson

They also allow practitioners to inject biases encoded in the structure of the input graph.

2 code implementations • ICLR 2021 • Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang

Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces.

no code implementations • 2 Oct 2020 • Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

no code implementations • 2 Oct 2020 • Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson

To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep.

no code implementations • 21 Sep 2020 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Matthijs T. J. Spaan

Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function.

no code implementations • 21 Sep 2020 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Henri Bouma

Automated tracking is key to many computer vision applications.

1 code implementation • 17 Jul 2020 • Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel

This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment.

1 code implementation • NeurIPS 2020 • Shangtong Zhang, Vivek Veeriah, Shimon Whiteson

We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge.

3 code implementations • NeurIPS 2020 • Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson

We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, which primarily stems from the equal weighting placed on each joint action.

no code implementations • ICLR 2021 • Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson

Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.

2 code implementations • 7 Jun 2020 • Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities.

no code implementations • 19 May 2020 • Pierre-Alexandre Kamienny, Kai Arulkumaran, Feryal Behbahani, Wendelin Boehmer, Shimon Whiteson

Using privileged information during training can improve the sample efficiency and performance of machine learning systems.

no code implementations • 11 May 2020 • Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL), problem where the reward depends on the agent's uncertainty.

no code implementations • 22 Apr 2020 • Shangtong Zhang, Bo Liu, Shimon Whiteson

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable.

1 code implementation • 19 Mar 2020 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.

Ranked #4 on SMAC on SMAC 6h_vs_8z

2 code implementations • 14 Mar 2020 • Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.

1 code implementation • ICLR 2020 • Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson

We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting.

1 code implementation • 11 Feb 2020 • Dmitrii Beloborodov, A. E. Ulanov, Jakob N. Foerster, Shimon Whiteson, A. I. Lvovsky

Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization.

1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Shimon Whiteson

Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced to ensure positivity, so any primal-dual algorithm is not guaranteed to converge or find the desired solution.

no code implementations • 23 Jan 2020 • Guangliang Li, Hamdi Dibeklioğlu, Shimon Whiteson, Hayley Hung

Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user.

1 code implementation • NeurIPS 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

1 code implementation • NeurIPS 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.

no code implementations • 29 Nov 2019 • Leo Feng, Luisa Zintgraf, Bei Peng, Shimon Whiteson

In few-shot learning, typically, the loss function which is applied at test time is the one we are ultimately interested in minimising, such as the mean-squared-error loss for a regression problem.

1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.

1 code implementation • ICLR 2020 • Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson

Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning.

2 code implementations • NeurIPS 2019 • Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, Shimon Whiteson

We specifically focus on QMIX [40], the current state-of-the-art in this domain.

2 code implementations • ICML 2020 • Wendelin Böhmer, Vitaly Kurin, Shimon Whiteson

This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning.

1 code implementation • 26 Sep 2019 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-$Q$-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

1 code implementation • 23 Sep 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.

1 code implementation • ICML 2020 • Gregory Farquhar, Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, Gabriel Synnaeve

In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress.

no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.

no code implementations • 5 Jun 2019 • Wendelin Böhmer, Tabish Rashid, Shimon Whiteson

This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning.

1 code implementation • 3 May 2019 • Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We revisit residual algorithms in both model-free and model-based reinforcement learning settings.

1 code implementation • NeurIPS 2019 • Shangtong Zhang, Shimon Whiteson

We reformulate the option framework as two parallel augmented MDPs.

1 code implementation • 1 Apr 2019 • Maximilian Igl, Andrew Gambardella, Jinke He, Nantas Nardelli, N. Siddharth, Wendelin Böhmer, Shimon Whiteson

We present Multitask Soft Option Learning(MSOL), a hierarchical multitask framework based on Planning as Inference.

1 code implementation • NeurIPS 2019 • Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting.

1 code implementation • 18 Feb 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

11 code implementations • 11 Feb 2019 • Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson

In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap.

Ranked #4 on SMAC on SMAC 6h_vs_8z

no code implementations • ICLR 2019 • Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson

A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL.

no code implementations • 8 Nov 2018 • Feryal Behbahani, Kyriacos Shiarlis, Xi Chen, Vitaly Kurin, Sudhanshu Kasewa, Ciprian Stirbu, João Gomes, Supratik Paul, Frans A. Oliehoek, João Messias, Shimon Whiteson

Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical.

1 code implementation • 4 Nov 2018 • Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling

We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.

1 code implementation • NeurIPS 2019 • Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson

This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal polices naturally from inference and the ability to optimise value functions and policies in separate, iterative steps.

1 code implementation • NeurIPS 2019 • Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

In this paper, we show that common knowledge between agents allows for complex decentralised coordination.

1 code implementation • 8 Oct 2018 • Luisa M. Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson

We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable.

1 code implementation • ICML 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

1 code implementation • ICML 2018 • Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson

Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown.

no code implementations • 27 May 2018 • Supratik Paul, Michael A. Osborne, Shimon Whiteson

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator.

9 code implementations • ICML 2018 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.

1 code implementation • ICML 2018 • Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks.

no code implementations • ICML 2018 • Matthew Fellows, Kamil Ciosek, Shimon Whiteson

We propose a new way of deriving policy gradient updates for reinforcement learning.

5 code implementations • 14 Feb 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

no code implementations • 10 Jan 2018 • Kamil Ciosek, Shimon Whiteson

For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions.

no code implementations • NeurIPS 2017 • Joao V. Messias, Shimon Whiteson

Reinforcement learning (RL) in partially observable settings is challenging because the agent’s observations are not Markov.

1 code implementation • ICLR 2018 • Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson

To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions.

6 code implementations • 13 Sep 2017 • Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch

We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL.

no code implementations • 15 Jun 2017 • Kamil Ciosek, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning.

5 code implementations • 24 May 2017 • Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.

4 code implementations • ICML 2017 • Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems.

11 code implementations • 5 Nov 2016 • Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas

Lipreading is the task of decoding text from the movement of a speaker's mouth.

Ranked #4 on Lipreading on GRID corpus (mixed-speech)

1 code implementation • 9 Oct 2016 • Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson

We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori.

no code implementations • 24 May 2016 • Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.

3 code implementations • NeurIPS 2016 • Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility.

no code implementations • 25 Feb 2016 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek

Submodular function maximization finds application in a variety of real-world decision-making problems.

no code implementations • 8 Feb 2016 • Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks.

no code implementations • NeurIPS 2015 • Masrour Zoghi, Zohar Karnin, Shimon Whiteson, Maarten de Rijke

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist.

no code implementations • 4 Feb 2014 • Diederik Marijn Roijers, Peter Vamplew, Shimon Whiteson, Richard Dazeley

Using this taxonomy, we survey the literature on multi-objective methods for planning and learning.

no code implementations • 4 Feb 2014 • Frans Adriaan Oliehoek, Matthijs T. J. Spaan, Christopher Amato, Shimon Whiteson

We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent.

no code implementations • 12 Dec 2013 • Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten de Rijke

This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms.

no code implementations • 1 Aug 2011 • Frans A. Oliehoek, Shimon Whiteson, Matthijs T. J. Spaan

Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type.

Cannot find the paper you are looking for? You can
Submit a new open access paper.

Contact us on:
hello@paperswithcode.com
.
Papers With Code is a free resource with all data licensed under CC-BY-SA.