Search Results for author: Shimon Whiteson

Found 83 papers, 46 papers with code

Truncated Emphatic Temporal Difference Methods for Prediction and Control

no code implementations 11 Aug 2021 Shangtong Zhang, Shimon Whiteson

Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad (Sutton and Barto, 2018) of off-policy RL, there are still three open problems.

Implicit Communication as Minimum Entropy Coupling

no code implementations 17 Jul 2021 Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Shimon Whiteson, Jakob Foerster

In many common-payoff games, achieving good performance requires players to develop protocols for communicating their private information implicitly -- i.e., using actions that have non-communicative effects on the environment.

Multi-agent Reinforcement Learning

Bayesian Bellman Operators

no code implementations 9 Jun 2021 Matthew Fellows, Kristian Hartikainen, Shimon Whiteson

We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator.

Continuous Control

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

no code implementations 31 May 2021 Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task.

Learning Theory Multi-agent Reinforcement Learning +1

Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

no code implementations 27 Apr 2021 Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson

Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios.

Policy Gradient Methods SMAC +1

Regularized Softmax Deep Multi-Agent $Q$-Learning

no code implementations 22 Mar 2021 Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

Multi-agent Reinforcement Learning Q-Learning +2

Snowflake: Scaling GNNs to High-Dimensional Continuous Control via Parameter Freezing

no code implementations 1 Mar 2021 Charlie Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson

Recent research has shown that Graph Neural Networks (GNNs) can learn policies for locomotion control that are as effective as a typical multi-layer perceptron (MLP), with superior transfer and multi-task performance (Wang et al., 2018; Huang et al., 2020).

Continuous Control

Breaking the Deadly Triad with a Target Network

1 code implementation 21 Jan 2021 Shangtong Zhang, Hengshuai Yao, Shimon Whiteson

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.

Q-Learning

Average-Reward Off-Policy Policy Evaluation with Function Approximation

1 code implementation 8 Jan 2021 Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function.

Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

1 code implementation NeurIPS 2020 Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

Feature Engineering Q-Learning

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?

2 code implementations 18 Nov 2020 Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson

Most recently developed approaches to cooperative multi-agent reinforcement learning in the centralized training with decentralized execution setting involve estimating a centralized, joint value function.

SMAC Starcraft

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

no code implementations 6 Oct 2020 Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson

VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities.

Multi-agent Reinforcement Learning Starcraft +1

RODE: Learning Roles to Decompose Multi-Agent Tasks

2 code implementations ICLR 2021 Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang

Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces.

Starcraft Starcraft II

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

no code implementations 2 Oct 2020 Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Representation Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

no code implementations 2 Oct 2020 Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson

To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep.

Meta-Learning Meta Reinforcement Learning

Exploiting Submodular Value Functions For Scaling Up Active Perception

no code implementations 21 Sep 2020 Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Matthijs T. J. Spaan

Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function.

WordCraft: An Environment for Benchmarking Commonsense Agents

1 code implementation 17 Jul 2020 Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel

This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment.

Knowledge Graphs Representation Learning

Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

3 code implementations NeurIPS 2020 Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson

We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, which primarily stems from the equal weighting placed on each joint action.

Multi-agent Reinforcement Learning Q-Learning +1

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

2 code implementations 7 Jun 2020 Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities.

Multi-agent Reinforcement Learning Starcraft

Privileged Information Dropout in Reinforcement Learning

no code implementations 19 May 2020 Pierre-Alexandre Kamienny, Kai Arulkumaran, Feryal Behbahani, Wendelin Boehmer, Shimon Whiteson

Using privileged information during training can improve the sample efficiency and performance of machine learning systems.

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

no code implementations 11 May 2020 Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty.

Question Answering

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

no code implementations 22 Apr 2020 Shangtong Zhang, Bo Liu, Shimon Whiteson

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable.

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

1 code implementation 19 Mar 2020 Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.

SMAC Starcraft

FACMAC: Factored Multi-Agent Centralised Policy Gradients

2 code implementations 14 Mar 2020 Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.

Q-Learning SMAC +2

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

1 code implementation ICML 2020 Shangtong Zhang, Bo Liu, Shimon Whiteson

Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced to ensure positivity, so any primal-dual algorithm is not guaranteed to converge or find the desired solution.

Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework

no code implementations 23 Jan 2020 Guangliang Li, Hamdi Dibeklioğlu, Shimon Whiteson, Hayley Hung

Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user.

Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

1 code implementation NeurIPS 2019 Supratik Paul, Vitaly Kurin, Shimon Whiteson

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

Policy Gradient Methods

VIABLE: Fast Adaptation via Backpropagating Learned Loss

no code implementations 29 Nov 2019 Leo Feng, Luisa Zintgraf, Bei Peng, Shimon Whiteson

In few-shot learning, typically, the loss function which is applied at test time is the one we are ultimately interested in minimising, such as the mean-squared-error loss for a regression problem.

Few-Shot Learning

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

1 code implementation ICML 2020 Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.

Deep Coordination Graphs

2 code implementations ICML 2020 Wendelin Böhmer, Vitaly Kurin, Shimon Whiteson

This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning.

Multi-agent Reinforcement Learning Q-Learning +2

Can $Q$-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

1 code implementation 26 Sep 2019 Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-$Q$-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

Feature Engineering Q-Learning

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

1 code implementation 23 Sep 2019 Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.

Continuous Control Meta Reinforcement Learning

Growing Action Spaces

1 code implementation ICML 2020 Gregory Farquhar, Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, Gabriel Synnaeve

In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress.

Starcraft

A Survey of Reinforcement Learning Informed by Natural Language

no code implementations 10 Jun 2019 Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.

Decision Making Hierarchical structure +2

Deep Residual Reinforcement Learning

1 code implementation 3 May 2019 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We revisit residual algorithms in both model-free and model-based reinforcement learning settings.

Model-based Reinforcement Learning

Multitask Soft Option Learning

1 code implementation 1 Apr 2019 Maximilian Igl, Andrew Gambardella, Jinke He, Nantas Nardelli, N. Siddharth, Wendelin Böhmer, Shimon Whiteson

We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference.

Transfer Learning

Generalized Off-Policy Actor-Critic

1 code implementation NeurIPS 2019 Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting.

Fast Efficient Hyperparameter Tuning for Policy Gradients

1 code implementation 18 Feb 2019 Supratik Paul, Vitaly Kurin, Shimon Whiteson

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

Meta-Learning Policy Gradient Methods

Stable Opponent Shaping in Differentiable Games

no code implementations ICLR 2019 Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson

A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL.

Learning from Demonstration in the Wild

no code implementations 8 Nov 2018 Feryal Behbahani, Kyriacos Shiarlis, Xi Chen, Vitaly Kurin, Sudhanshu Kasewa, Ciprian Stirbu, João Gomes, Supratik Paul, Frans A. Oliehoek, João Messias, Shimon Whiteson

Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical.

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

1 code implementation 4 Nov 2018 Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling

We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.

Multi-agent Reinforcement Learning Policy Gradient Methods

VIREL: A Variational Inference Framework for Reinforcement Learning

1 code implementation NeurIPS 2019 Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson

This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference, and the ability to optimise value functions and policies in separate, iterative steps.

Variational Inference

Fast Context Adaptation via Meta-Learning

1 code implementation 8 Oct 2018 Luisa M. Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson

We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable.

General Classification Meta-Learning

DiCE: The Infinitely Differentiable Monte Carlo Estimator

1 code implementation ICML 2018 Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

Meta-Learning

Deep Variational Reinforcement Learning for POMDPs

1 code implementation ICML 2018 Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson

Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown.

Decision Making

Fingerprint Policy Optimisation for Robust Reinforcement Learning

no code implementations 27 May 2018 Supratik Paul, Michael A. Osborne, Shimon Whiteson

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator.

Bayesian Optimisation Continuous Control +1

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

9 code implementations ICML 2018 Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.

Multi-agent Reinforcement Learning Starcraft +1

TACO: Learning Task Decomposition via Temporal Alignment for Control

1 code implementation ICML 2018 Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks.

Fourier Policy Gradients

no code implementations ICML 2018 Matthew Fellows, Kamil Ciosek, Shimon Whiteson

We propose a new way of deriving policy gradient updates for reinforcement learning.

DiCE: The Infinitely Differentiable Monte-Carlo Estimator

5 code implementations 14 Feb 2018 Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

Meta-Learning

Expected Policy Gradients for Reinforcement Learning

no code implementations 10 Jan 2018 Kamil Ciosek, Shimon Whiteson

For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions.

Policy Gradient Methods

Dynamic-Depth Context Tree Weighting

no code implementations NeurIPS 2017 Joao V. Messias, Shimon Whiteson

Reinforcement learning (RL) in partially observable settings is challenging because the agent’s observations are not Markov.

Time Series Time Series Prediction

TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

1 code implementation ICLR 2018 Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson

To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions.

Atari Games Value prediction

Learning with Opponent-Learning Awareness

6 code implementations 13 Sep 2017 Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch

We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL.

Multi-agent Reinforcement Learning

Expected Policy Gradients

no code implementations 15 Jun 2017 Kamil Ciosek, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning.

Counterfactual Multi-Agent Policy Gradients

5 code implementations 24 May 2017 Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.

Autonomous Vehicles Starcraft

Multi-Objective Deep Reinforcement Learning

1 code implementation 9 Oct 2016 Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson

We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori.

Alternating Optimisation and Quadrature for Robust Control

no code implementations 24 May 2016 Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.

Bayesian Optimisation

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

no code implementations 8 Feb 2016 Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks.

Copeland Dueling Bandits

no code implementations NeurIPS 2015 Masrour Zoghi, Zohar Karnin, Shimon Whiteson, Maarten de Rijke

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist.

A Survey of Multi-Objective Sequential Decision-Making

no code implementations 4 Feb 2014 Diederik Marijn Roijers, Peter Vamplew, Shimon Whiteson, Richard Dazeley

Using this taxonomy, we survey the literature on multi-objective methods for planning and learning.

Decision Making

Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs

no code implementations 4 Feb 2014 Frans Adriaan Oliehoek, Matthijs T. J. Spaan, Christopher Amato, Shimon Whiteson

We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent.

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem

no code implementations 12 Dec 2013 Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten de Rijke

This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms.

Information Retrieval

Exploiting Agent and Type Independence in Collaborative Graphical Bayesian Games

no code implementations 1 Aug 2011 Frans A. Oliehoek, Shimon Whiteson, Matthijs T. J. Spaan

Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type.

Decision Making
