Search Results for author: Shimon Whiteson

Found 121 papers, 66 papers with code

Policy-Guided Diffusion

1 code implementation • 9 Apr 2024 • Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Foerster

Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.

SplAgger: Split Aggregation for Meta-Reinforcement Learning

no code implementations • 5 Mar 2024 • Jacob Beck, Matthew Jackson, Risto Vuorio, Zheng Xiong, Shimon Whiteson

However, it remains unclear whether task inference sequence models are beneficial even when task inference objectives are not.

Continuous Control Meta Reinforcement Learning

Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control

no code implementations • 9 Feb 2024 • Zheng Xiong, Risto Vuorio, Jacob Beck, Matthieu Zimmer, Kun Shao, Shimon Whiteson

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies.

Zero-shot Generalization

Discovering Temporally-Aware Reinforcement Learning Algorithms

1 code implementation • 8 Feb 2024 • Matthew Thomas Jackson, Chris Lu, Louis Kirsch, Robert Tjarko Lange, Shimon Whiteson, Jakob Nicolaus Foerster

We propose a simple augmentation to two existing objective discovery approaches that allows the discovered algorithm to dynamically update its objective function throughout the agent's training procedure, resulting in expressive schedules and increased generalization across different training horizons.

Meta-Learning reinforcement-learning

JaxMARL: Multi-Agent RL Environments in JAX

2 code implementations • 16 Nov 2023 • Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster

This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL.

Meta-Learning Multi-agent Reinforcement Learning

Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design

1 code implementation • NeurIPS 2023 • Matthew Thomas Jackson, Minqi Jiang, Jack Parker-Holder, Risto Vuorio, Chris Lu, Gregory Farquhar, Shimon Whiteson, Jakob Nicolaus Foerster

Recently, it has been shown that it is possible to meta-learn update rules, with the hope of discovering algorithms that can perform well on a wide range of RL tasks.

General Reinforcement Learning reinforcement-learning

Recurrent Hypernetworks are Surprisingly Strong in Meta-RL

1 code implementation • NeurIPS 2023 • Jacob Beck, Risto Vuorio, Zheng Xiong, Shimon Whiteson

While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline.

Few-Shot Learning Reinforcement Learning (RL)

Hierarchical Imitation Learning for Stochastic Environments

no code implementations • 25 Sep 2023 • Maximilian Igl, Punit Shah, Paul Mougin, Sirish Srinivasan, Tarun Gupta, Brandyn White, Kyriacos Shiarlis, Shimon Whiteson

However, such methods are often inappropriate for stochastic environments where the agent must also react to external factors: because agent types are inferred from the observed future trajectory during training, these environments require that the contributions of internal and external factors to the agent behaviour are disentangled and only internal factors, i.e., those under the agent's control, are encoded in the type.

Autonomous Vehicles Imitation Learning

Bayesian Exploration Networks

no code implementations • 24 Aug 2023 • Mattie Fellows, Brandon Kaplowitz, Christian Schroeder de Witt, Shimon Whiteson

Empirical results demonstrate that BEN can learn true Bayes-optimal policies in tasks where existing model-free approaches fail.

Decision Making Decision Making Under Uncertainty

The Waymo Open Sim Agents Challenge

1 code implementation • NeurIPS 2023 • Nico Montali, John Lambert, Paul Mougin, Alex Kuefler, Nick Rhinehart, Michelle Li, Cole Gulino, Tristan Emrich, Zoey Yang, Shimon Whiteson, Brandyn White, Dragomir Anguelov

Simulation with realistic, interactive agents represents a key task for autonomous vehicle software development.

Autonomous Driving

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning

no code implementations • 19 Mar 2023 • Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson

By enabling agents to communicate, recent cooperative multi-agent reinforcement learning (MARL) methods have demonstrated better task performance and more coordinated behavior.

Multi-agent Reinforcement Learning reinforcement-learning

Why Target Networks Stabilise Temporal Difference Methods

no code implementations • 24 Feb 2023 • Mattie Fellows, Matthew J. A. Smith, Shimon Whiteson

Integral to recent successes in deep reinforcement learning has been a class of temporal difference methods that use infrequently updated target values for policy evaluation in a Markov Decision Process.
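
The snippet above describes policy evaluation whose bootstrapped targets come from infrequently updated values. A minimal tabular sketch, assuming a known two-state transition matrix `P` and reward vector `R` (both hypothetical) and expected rather than sampled updates, keeps a frozen copy `v_target` that is refreshed every `sync_every` steps. It illustrates the mechanism only and is not the analysis from the paper.

```python
def td_with_target_network(P, R, gamma, alpha=0.5, sync_every=10, steps=2000):
    """Expected tabular TD(0) evaluation with a periodically synced target.

    P[s][s2] is the (assumed known) transition probability under the policy
    being evaluated and R[s] its expected reward; both are hypothetical inputs.
    """
    n = len(R)
    v = [0.0] * n           # online estimates, updated every step
    v_target = list(v)      # frozen copy used to build bootstrapped targets
    for step in range(1, steps + 1):
        for s in range(n):
            # The bootstrap term reads the frozen target values, not v.
            target = R[s] + gamma * sum(P[s][s2] * v_target[s2] for s2 in range(n))
            v[s] += alpha * (target - v[s])
        if step % sync_every == 0:
            v_target = list(v)  # infrequent target update
    return v


P = [[0.9, 0.1], [0.2, 0.8]]
R = [1.0, 0.0]
print(td_with_target_network(P, R, gamma=0.9))
```

For this chain the exact values solve V = R + γPV, giving V = (280/37, 180/37) ≈ (7.568, 4.865), which the sketch approaches because each sync window performs an approximate Bellman backup on the frozen values.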

Universal Morphology Control via Contextual Modulation

1 code implementation • 22 Feb 2023 • Zheng Xiong, Jacob Beck, Shimon Whiteson

Learning a universal policy across different robot morphologies can significantly improve learning efficiency and generalization in continuous control.

Continuous Control

Trust-Region-Free Policy Optimization for Stochastic Policies

no code implementations • 15 Feb 2023 • Mingfei Sun, Benjamin Ellis, Anuj Mahajan, Sam Devlin, Katja Hofmann, Shimon Whiteson

In this paper, we show that the trust region constraint over policies can be safely substituted by a trust-region-free constraint without compromising the underlying monotonic improvement guarantee.

A Survey of Meta-Reinforcement Learning

no code implementations • 19 Jan 2023 • Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson

Meta-RL is most commonly studied in a problem setting where, given a distribution of tasks, the goal is to learn a policy that is capable of adapting to any new task from the task distribution with as little data as possible.

Meta Reinforcement Learning reinforcement-learning

Imitation Is Not Enough: Robustifying Imitation with Reinforcement Learning for Challenging Driving Scenarios

no code implementations • 21 Dec 2022 • Yiren Lu, Justin Fu, George Tucker, Xinlei Pan, Eli Bronstein, Rebecca Roelofs, Benjamin Sapp, Brandyn White, Aleksandra Faust, Shimon Whiteson, Dragomir Anguelov, Sergey Levine

To our knowledge, this is the first application of a combined imitation and reinforcement learning approach in autonomous driving that utilizes large amounts of real-world human driving data.

Autonomous Driving Imitation Learning

SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning

1 code implementation • NeurIPS 2023 • Benjamin Ellis, Jonathan Cook, Skander Moalla, Mikayel Samvelyan, Mingfei Sun, Anuj Mahajan, Jakob N. Foerster, Shimon Whiteson

In this work, we conduct new analysis demonstrating that SMAC lacks the stochasticity and partial observability to require complex *closed-loop* policies.

reinforcement-learning SMAC+

Particle-Based Score Estimation for State Space Model Learning in Autonomous Driving

no code implementations • 14 Dec 2022 • Angad Singh, Omar Makhlouf, Maximilian Igl, Joao Messias, Arnaud Doucet, Shimon Whiteson

Recent methods addressing this problem typically differentiate through time in a particle filter, which requires workarounds for the non-differentiable resampling step that yield biased or high-variance gradient estimates.

Autonomous Driving

Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula

no code implementations • 2 Dec 2022 • Eli Bronstein, Sirish Srinivasan, Supratik Paul, Aman Sinha, Matthew O'Kelly, Payam Nikdel, Shimon Whiteson

However, this approach produces agents that do not perform robustly in safety-critical settings, an issue that cannot be addressed by simply adding more data to the training set: we show that an agent trained using only a 10% subset of the data performs just as well as an agent trained on the entire dataset.

Autonomous Driving Imitation Learning

Equivariant Networks for Zero-Shot Coordination

1 code implementation • 21 Oct 2022 • Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner.

Hypernetworks in Meta-Reinforcement Learning

1 code implementation • 20 Oct 2022 • Jacob Beck, Matthew Thomas Jackson, Risto Vuorio, Shimon Whiteson

In this paper, we 1) show that hypernetwork initialization is also a critical factor in meta-RL, and that naive initializations yield poor performance; 2) propose a novel hypernetwork initialization scheme that matches or exceeds the performance of a state-of-the-art approach proposed for supervised settings, as well as being simpler and more general; and 3) use this method to show that hypernetworks can improve performance in meta-RL by evaluating on multiple simulated robotics benchmarks.

Meta Reinforcement Learning reinforcement-learning

Hierarchical Model-Based Imitation Learning for Planning in Autonomous Driving

no code implementations • 18 Oct 2022 • Eli Bronstein, Mark Palatucci, Dominik Notz, Brandyn White, Alex Kuefler, Yiren Lu, Supratik Paul, Payam Nikdel, Paul Mougin, Hongge Chen, Justin Fu, Austin Abrams, Punit Shah, Evan Racah, Benjamin Frenkel, Shimon Whiteson, Dragomir Anguelov

We demonstrate the first large-scale application of model-based generative adversarial imitation learning (MGAIL) to the task of dense urban self-driving.

Autonomous Driving Imitation Learning

An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar

Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.

Meta-Learning Reinforcement Learning (RL)

Generalized Beliefs for Cooperative AI

no code implementations • 26 Jun 2022 • Darius Muglich, Luisa Zintgraf, Christian Schroeder de Witt, Shimon Whiteson, Jakob Foerster

Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings.

Symphony: Learning Realistic and Diverse Agents for Autonomous Driving Simulation

no code implementations • 6 May 2022 • Maximilian Igl, Daewoo Kim, Alex Kuefler, Paul Mougin, Punit Shah, Kyriacos Shiarlis, Dragomir Anguelov, Mark Palatucci, Brandyn White, Shimon Whiteson

The beam search refines these policies on the fly by pruning branches that are unfavourably evaluated by a discriminator.

Autonomous Driving

Trust Region Bounds for Decentralized PPO Under Non-stationarity

no code implementations • 31 Jan 2022 • Mingfei Sun, Sam Devlin, Jacob Beck, Katja Hofmann, Shimon Whiteson

We present trust region bounds for optimizing decentralized policies in cooperative Multi-Agent Reinforcement Learning (MARL), which hold even when the transition dynamics are non-stationary.

Multi-agent Reinforcement Learning

You May Not Need Ratio Clipping in PPO

no code implementations • 31 Jan 2022 • Mingfei Sun, Vitaly Kurin, Guoqing Liu, Sam Devlin, Tao Qin, Katja Hofmann, Shimon Whiteson

Furthermore, we show that ESPO can be easily scaled up to distributed training with many workers, delivering strong performance as well.

Continuous Control

Generalization in Cooperative Multi-Agent Systems

no code implementations • 31 Jan 2022 • Anuj Mahajan, Mikayel Samvelyan, Tarun Gupta, Benjamin Ellis, Mingfei Sun, Tim Rocktäschel, Shimon Whiteson

Specifically, we study generalization bounds under a linear dependence of the underlying dynamics on the agent capabilities, which can be seen as a generalization of Successor Features to MAS.

Generalization Bounds Multi-agent Reinforcement Learning

In Defense of the Unitary Scalarization for Deep Multi-Task Learning

1 code implementation • 11 Jan 2022 • Vitaly Kurin, Alessandro De Palma, Ilya Kostrikov, Shimon Whiteson, M. Pawan Kumar

We show that unitary scalarization, coupled with standard regularization and stabilization techniques from single-task learning, matches or improves upon the performance of complex multi-task optimizers in popular supervised and reinforcement learning settings.
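
Unitary scalarization means optimizing the plain, unweighted sum of the per-task losses. A toy sketch, assuming hypothetical one-parameter quadratic task losses L_i(w) = (w - c_i)^2 and optional L2 weight decay standing in for "standard regularization", shows that each update is simply the sum of the task gradients. It is an illustration, not the paper's experimental setup.

```python
def train_unitary(targets, lr=0.1, weight_decay=0.0, steps=500):
    """Gradient descent on the unweighted sum of toy per-task losses.

    Task i has the hypothetical loss L_i(w) = (w - c_i)**2 on a shared scalar
    parameter w; weight_decay adds an L2 penalty weight_decay * w**2.
    """
    w = 0.0
    for _ in range(steps):
        # Unitary scalarization: every task gradient enters with weight 1.
        grad = sum(2.0 * (w - c) for c in targets)
        grad += 2.0 * weight_decay * w  # gradient of the L2 penalty
        w -= lr * grad
    return w


print(train_unitary([1.0, 2.0, 6.0]))                    # converges to the mean of the targets
print(train_unitary([1.0, 2.0, 6.0], weight_decay=0.5))  # shrunk towards 0
```

With targets [1, 2, 6] the summed loss is minimized at their mean, 3; adding weight decay λ moves the solution to Σc_i / (n + λ). Because the summed gradient grows with the number of tasks n, the learning rate has to stay small relative to n for the iteration to remain stable.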

Multi-Task Learning Reinforcement Learning (RL)

Deterministic and Discriminative Imitation (D2-Imitation): Revisiting Adversarial Imitation for Sample Efficiency

1 code implementation • 11 Dec 2021 • Mingfei Sun, Sam Devlin, Katja Hofmann, Shimon Whiteson

Sample efficiency is crucial for imitation learning methods to be applicable in real-world applications.

Imitation Learning

Regularized Softmax Deep Multi-Agent Q-Learning

1 code implementation • NeurIPS 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Tackling overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting.

Multi-agent Reinforcement Learning Q-Learning

On the Practical Consistency of Meta-Reinforcement Learning Algorithms

no code implementations • 1 Dec 2021 • Zheng Xiong, Luisa Zintgraf, Jacob Beck, Risto Vuorio, Shimon Whiteson

We further find that theoretically inconsistent algorithms can be made consistent by continuing to update all agent components on the OOD tasks, and adapt as well as or better than originally consistent ones.

Meta-Learning Meta Reinforcement Learning

Reinforcement Learning in Factored Action Spaces using Tensor Decompositions

no code implementations • 27 Oct 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

We present an extended abstract for the previously published work TESSERACT [Mahajan et al., 2021], which proposes a novel solution for Reinforcement Learning (RL) in large, factored action spaces using tensor decompositions.

Multi-agent Reinforcement Learning reinforcement-learning

Model based Multi-agent Reinforcement Learning with Tensor Decompositions

no code implementations • 27 Oct 2021 • Pascal Van Der Vaart, Anuj Mahajan, Shimon Whiteson

A challenge in multi-agent reinforcement learning is to be able to generalize over intractable state-action spaces.

Model-based Reinforcement Learning Multi-agent Reinforcement Learning

Stability and Generalisation in Batch Reinforcement Learning

no code implementations • 29 Sep 2021 • Matthew J. A. Smith, Shimon Whiteson

Overfitting has been recently acknowledged as a key limiting factor in the capabilities of reinforcement learning algorithms, despite little theoretical characterisation.

reinforcement-learning Reinforcement Learning (RL)

Truncated Emphatic Temporal Difference Methods for Prediction and Control

1 code implementation • 11 Aug 2021 • Shangtong Zhang, Shimon Whiteson

Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems.

Reinforcement Learning (RL)

Communicating via Markov Decision Processes

1 code implementation • 17 Jul 2021 • Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Martin Strohmeier, J. Zico Kolter, Shimon Whiteson, Jakob Foerster

We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME.

Multi-agent Reinforcement Learning

Bayesian Bellman Operators

no code implementations • NeurIPS 2021 • Matthew Fellows, Kristian Hartikainen, Shimon Whiteson

We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator.

Continuous Control Reinforcement Learning (RL)

SoftDICE for Imitation Learning: Rethinking Off-policy Distribution Matching

no code implementations • 6 Jun 2021 • Mingfei Sun, Anuj Mahajan, Katja Hofmann, Shimon Whiteson

We present SoftDICE, which achieves state-of-the-art performance for imitation learning.

Imitation Learning

Tesseract: Tensorised Actors for Multi-Agent Reinforcement Learning

no code implementations • 31 May 2021 • Anuj Mahajan, Mikayel Samvelyan, Lei Mao, Viktor Makoviychuk, Animesh Garg, Jean Kossaifi, Shimon Whiteson, Yuke Zhu, Animashree Anandkumar

Algorithms derived from Tesseract decompose the Q-tensor across agents and utilise low-rank tensor approximations to model agent interactions relevant to the task.

Learning Theory Multi-agent Reinforcement Learning

Semi-On-Policy Training for Sample Efficient Multi-Agent Policy Gradients

no code implementations • 27 Apr 2021 • Bozhidar Vasilev, Tarun Gupta, Bei Peng, Shimon Whiteson

Policy gradient methods are an attractive approach to multi-agent reinforcement learning problems due to their convergence properties and robustness in partially observable scenarios.

Policy Gradient Methods Reinforcement Learning (RL)

Regularized Softmax Deep Multi-Agent $Q$-Learning

no code implementations • 22 Mar 2021 • Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

Multi-agent Reinforcement Learning Q-Learning

Snowflake: Scaling GNNs to High-Dimensional Continuous Control via Parameter Freezing

1 code implementation • NeurIPS 2021 • Charlie Blake, Vitaly Kurin, Maximilian Igl, Shimon Whiteson

Recent research has shown that graph neural networks (GNNs) can learn policies for locomotion control that are as effective as a typical multi-layer perceptron (MLP), with superior transfer and multi-task performance (Wang et al., 2018; Huang et al., 2020).

Continuous Control Vocal Bursts Intensity Prediction

Breaking the Deadly Triad with a Target Network

1 code implementation • 21 Jan 2021 • Shangtong Zhang, Hengshuai Yao, Shimon Whiteson

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously.

Q-Learning

Deep Interactive Bayesian Reinforcement Learning via Meta-Learning

no code implementations • 11 Jan 2021 • Luisa Zintgraf, Sam Devlin, Kamil Ciosek, Shimon Whiteson, Katja Hofmann

The optimal adaptive behaviour under uncertainty over the other agents' strategies w.r.t.

Meta-Learning reinforcement-learning

Average-Reward Off-Policy Policy Evaluation with Function Approximation

1 code implementation • 8 Jan 2021 • Shangtong Zhang, Yi Wan, Richard S. Sutton, Shimon Whiteson

We consider off-policy policy evaluation with function approximation (FA) in average-reward MDPs, where the goal is to estimate both the reward rate and the differential value function.

Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

1 code implementation • NeurIPS 2020 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-Q-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

Feature Engineering Q-Learning

Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?

6 code implementations • 18 Nov 2020 • Christian Schroeder de Witt, Tarun Gupta, Denys Makoviichuk, Viktor Makoviychuk, Philip H. S. Torr, Mingfei Sun, Shimon Whiteson

Most recently developed approaches to cooperative multi-agent reinforcement learning in the *centralized training with decentralized execution* setting involve estimating a centralized, joint value function.

reinforcement-learning Reinforcement Learning (RL)

UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

no code implementations • 6 Oct 2020 • Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson

VDN and QMIX are two popular value-based algorithms for cooperative MARL that learn a centralized action value function as a monotonic mixing of per-agent utilities.
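
The difference between the two factorizations can be sketched in a few lines. VDN adds per-agent utilities; QMIX passes them through a mixing network whose weights are constrained to be non-negative, so the joint value is monotonic in each utility and each agent's greedy choice agrees with the centralized argmax. In QMIX those weights are generated from the global state by hypernetworks; this sketch assumes fixed, hypothetical weights (`w1`, `b1`, `w2`, `b2`) and hypothetical utilities instead.

```python
import itertools
import math


def vdn_mix(agent_qs):
    """VDN: the joint value is the plain sum of per-agent utilities."""
    return sum(agent_qs)


def elu(x):
    """ELU activation, strictly increasing in x."""
    return x if x > 0 else math.exp(x) - 1.0


def monotonic_mix(agent_qs, w1, b1, w2, b2):
    """QMIX-style monotonic mixer with one hidden layer (sketch).

    abs() keeps every mixing weight non-negative, so Q_tot never decreases
    when any single agent's utility increases. In QMIX these weights are
    produced from the global state by hypernetworks; here they are fixed,
    hypothetical numbers.
    """
    hidden = [
        elu(sum(abs(w1[j][i]) * q for i, q in enumerate(agent_qs)) + b1[j])
        for j in range(len(w1))
    ]
    return sum(abs(w2[j]) * h for j, h in enumerate(hidden)) + b2


def greedy_joint_action(per_agent_q, mixer):
    """Compare decentralized greedy choices with the centralized argmax."""
    # Each agent independently picks the action with its highest utility.
    decentralized = tuple(max(range(len(q)), key=q.__getitem__) for q in per_agent_q)
    # Brute-force argmax of the mixed joint value over all joint actions.
    centralized = max(
        itertools.product(*(range(len(q)) for q in per_agent_q)),
        key=lambda a: mixer([q[ai] for q, ai in zip(per_agent_q, a)]),
    )
    return decentralized, centralized


per_agent_q = [[1.0, 3.0, 2.0], [0.5, -1.0, 4.0]]  # hypothetical utilities
w1, b1 = [[0.5, -1.2], [2.0, 0.3]], [0.1, -0.2]
w2, b2 = [1.0, -0.7], 0.0
print(greedy_joint_action(per_agent_q, vdn_mix))
print(greedy_joint_action(per_agent_q, lambda qs: monotonic_mix(qs, w1, b1, w2, b2)))
```

Both mixers return matching pairs, even though the second uses negative raw weights, because `abs()` removes the sign before mixing. The brute-force argmax grows exponentially with the number of agents, which is exactly the cost that monotonic factorization lets decentralized execution avoid.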

Multi-agent Reinforcement Learning reinforcement-learning

My Body is a Cage: the Role of Morphology in Graph-Based Incompatible Control

1 code implementation • ICLR 2021 • Vitaly Kurin, Maximilian Igl, Tim Rocktäschel, Wendelin Boehmer, Shimon Whiteson

They also allow practitioners to inject biases encoded in the structure of the input graph.

Continuous Control

RODE: Learning Roles to Decompose Multi-Agent Tasks

2 code implementations • ICLR 2021 • Tonghan Wang, Tarun Gupta, Anuj Mahajan, Bei Peng, Shimon Whiteson, Chongjie Zhang

Learning a role selector based on action effects makes role discovery much easier because it forms a bi-level learning hierarchy -- the role selector searches in a smaller role space and at a lower temporal resolution, while role policies learn in significantly reduced primitive action-observation spaces.

Clustering Starcraft

A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms

1 code implementation • 2 Oct 2020 • Shangtong Zhang, Romain Laroche, Harm van Seijen, Shimon Whiteson, Remi Tachet des Combes

In the second scenario, we consider optimizing a discounted objective ($\gamma < 1$) and propose to interpret the omission of the discounting in the actor update from an auxiliary task perspective and provide supporting empirical results.

Representation Learning

Exploration in Approximate Hyper-State Space for Meta Reinforcement Learning

1 code implementation • 2 Oct 2020 • Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian Hartikainen, Katja Hofmann, Shimon Whiteson

To rapidly learn a new task, it is often essential for agents to explore efficiently -- especially when performance matters from the first timestep.

Meta-Learning Meta Reinforcement Learning

Exploiting Submodular Value Functions For Scaling Up Active Perception

no code implementations • 21 Sep 2020 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Matthijs T. J. Spaan

Furthermore, we show that, under certain conditions, including submodularity, the value function computed using greedy PBVI is guaranteed to have bounded error with respect to the optimal value function.

Real-Time Resource Allocation for Tracking Systems

no code implementations • 21 Sep 2020 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek, Henri Bouma

Automated tracking is key to many computer vision applications.

WordCraft: An Environment for Benchmarking Commonsense Agents

1 code implementation • ICML Workshop LaReL 2020 • Minqi Jiang, Jelena Luketina, Nantas Nardelli, Pasquale Minervini, Philip H. S. Torr, Shimon Whiteson, Tim Rocktäschel

This is partly due to the lack of lightweight simulation environments that sufficiently reflect the semantics of the real world and provide knowledge sources grounded with respect to observations in an RL environment.

Benchmarking Knowledge Graphs

Learning Retrospective Knowledge with Reverse Reinforcement Learning

1 code implementation • NeurIPS 2020 • Shangtong Zhang, Vivek Veeriah, Shimon Whiteson

We present a Reverse Reinforcement Learning (Reverse RL) approach for representing retrospective knowledge.

Anomaly Detection reinforcement-learning

Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

4 code implementations • NeurIPS 2020 • Tabish Rashid, Gregory Farquhar, Bei Peng, Shimon Whiteson

We show in particular that this projection can fail to recover the optimal policy even with access to $Q^*$, which primarily stems from the equal weighting placed on each joint action.

Multi-agent Reinforcement Learning Q-Learning

Transient Non-Stationarity and Generalisation in Deep Reinforcement Learning

no code implementations • ICLR 2021 • Maximilian Igl, Gregory Farquhar, Jelena Luketina, Wendelin Boehmer, Shimon Whiteson

Non-stationarity can arise in Reinforcement Learning (RL) even in stationary environments.

reinforcement-learning Reinforcement Learning (RL)

Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

2 code implementations • 7 Jun 2020 • Shariq Iqbal, Christian A. Schroeder de Witt, Bei Peng, Wendelin Böhmer, Shimon Whiteson, Fei Sha

Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities; however, common patterns of behavior often emerge among these agents/entities.

counterfactual Multi-agent Reinforcement Learning

Privileged Information Dropout in Reinforcement Learning

no code implementations • 19 May 2020 • Pierre-Alexandre Kamienny, Kai Arulkumaran, Feryal Behbahani, Wendelin Boehmer, Shimon Whiteson

Using privileged information during training can improve the sample efficiency and performance of machine learning systems.

reinforcement-learning Reinforcement Learning (RL)

Maximizing Information Gain in Partially Observable Environments via Prediction Reward

no code implementations • 11 May 2020 • Yash Satsangi, Sungsu Lim, Shimon Whiteson, Frans Oliehoek, Martha White

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem, where the reward depends on the agent's uncertainty.

Question Answering Reinforcement Learning (RL)

Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning

1 code implementation • 22 Apr 2020 • Shangtong Zhang, Bo Liu, Shimon Whiteson

We present a mean-variance policy iteration (MVPI) framework for risk-averse control in a discounted infinite horizon MDP optimizing the variance of a per-step reward random variable.
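
The snippet describes a mean-variance objective over a per-step reward random variable. A minimal sketch of why such an objective can be handled with ordinary reward maximization, assuming an empirical sample of per-step rewards and using the standard identity (E[R])^2 = max_y (2y·E[R] - y^2): once the auxiliary variable y is fixed, the variance penalty becomes a modified per-step reward. The function names and the reward sample are hypothetical; this illustrates the identity rather than reproducing MVPI's implementation.

```python
def mean_variance_objective(rewards, lam):
    """Empirical E[R] - lam * Var(R) for a sample of per-step rewards R."""
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    return mean - lam * var


def augmented_objective(rewards, lam, y):
    """Empirical E[R - lam*R**2 + 2*lam*y*R] - lam*y**2 for a dual variable y.

    Because (E[R])**2 = max_y (2*y*E[R] - y**2), with the max at y = E[R],
    this equals the mean-variance objective minus lam*(y - E[R])**2. For a
    fixed y it is an ordinary expected reward under the modified per-step
    reward R - lam*R**2 + 2*lam*y*R, up to the constant -lam*y**2.
    """
    n = len(rewards)
    return sum(r - lam * r * r + 2 * lam * y * r for r in rewards) / n - lam * y * y


rewards = [1.0, 0.0, 2.0, -1.0]   # hypothetical per-step rewards
lam = 0.5
mean = sum(rewards) / len(rewards)
print(mean_variance_objective(rewards, lam))
print(augmented_objective(rewards, lam, y=mean))
```

Both prints agree, and for any other y the augmented objective is lower by exactly `lam * (y - mean) ** 2`. Alternating between setting y to the current mean per-step reward and improving the policy on the modified reward is therefore a natural block-coordinate scheme for this objective.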

reinforcement-learning Reinforcement Learning (RL)

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

1 code implementation • 19 Mar 2020 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.

Ranked #6 on SMAC on SMAC 6h_vs_8z

reinforcement-learning Reinforcement Learning (RL)

FACMAC: Factored Multi-Agent Centralised Policy Gradients

3 code implementations • NeurIPS 2021 • Bei Peng, Tabish Rashid, Christian A. Schroeder de Witt, Pierre-Alexandre Kamienny, Philip H. S. Torr, Wendelin Böhmer, Shimon Whiteson

We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.

Q-Learning SMAC

Optimistic Exploration even with a Pessimistic Initialisation

1 code implementation • ICLR 2020 • Tabish Rashid, Bei Peng, Wendelin Böhmer, Shimon Whiteson

We show that this scheme is provably efficient in the tabular setting and extend it to the deep RL setting.

Efficient Exploration Q-Learning

Reinforcement Learning Enhanced Quantum-inspired Algorithm for Combinatorial Optimization

1 code implementation • 11 Feb 2020 • Dmitrii Beloborodov, A. E. Ulanov, Jakob N. Foerster, Shimon Whiteson, A. I. Lvovsky

Quantum hardware and quantum-inspired algorithms are becoming increasingly popular for combinatorial optimization.

Combinatorial Optimization Hyperparameter Optimization

GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values

1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Shimon Whiteson

Namely, the optimization problem in GenDICE is not a convex-concave saddle-point problem once nonlinearity in optimization variable parameterization is introduced to ensure positivity, so any primal-dual algorithm is not guaranteed to converge or find the desired solution.

Facial Feedback for Reinforcement Learning: A Case Study and Offline Analysis Using the TAMER Framework

no code implementations • 23 Jan 2020 • Guangliang Li, Hamdi Dibeklioğlu, Shimon Whiteson, Hayley Hung

Interactive reinforcement learning provides a way for agents to learn to solve tasks from evaluative feedback provided by a human user.

reinforcement-learning Reinforcement Learning (RL)

Fast Efficient Hyperparameter Tuning for Policy Gradient Methods

1 code implementation • NeurIPS 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson

The main idea is to use existing trajectories sampled by the policy gradient method to optimise a one-step improvement objective, yielding a sample and computationally efficient algorithm that is easy to implement.

Policy Gradient Methods

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning

1 code implementation • NeurIPS 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.

Continuous Control Meta Reinforcement Learning

VIABLE: Fast Adaptation via Backpropagating Learned Loss

no code implementations • 29 Nov 2019 • Leo Feng, Luisa Zintgraf, Bei Peng, Shimon Whiteson

In few-shot learning, typically, the loss function which is applied at test time is the one we are ultimately interested in minimising, such as the mean-squared-error loss for a regression problem.

Few-Shot Learning regression

Provably Convergent Two-Timescale Off-Policy Actor-Critic with Function Approximation

1 code implementation • ICML 2020 • Shangtong Zhang, Bo Liu, Hengshuai Yao, Shimon Whiteson

With the help of the emphasis critic and the canonical value function critic, we show convergence for COF-PAC, where the critics are linear and the actor can be nonlinear.

Vocal Bursts Valence Prediction

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning

3 code implementations • ICLR 2020 • Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson

Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning.

Meta-Learning

MAVEN: Multi-Agent Variational Exploration

4 code implementations • NeurIPS 2019 • Anuj Mahajan, Tabish Rashid, Mikayel Samvelyan, Shimon Whiteson

We specifically focus on QMIX [40], the current state-of-the-art in this domain.

SMAC+

Deep Coordination Graphs

2 code implementations • ICML 2020 • Wendelin Böhmer, Vitaly Kurin, Shimon Whiteson

This paper introduces the deep coordination graph (DCG) for collaborative multi-agent reinforcement learning.

Multi-agent Reinforcement Learning Q-Learning +4

Can $Q$-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

2 code implementations • 26 Sep 2019 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

While more work is needed to apply Graph-$Q$-SAT to reduce wall clock time in modern SAT solving settings, it is a compelling proof-of-concept showing that RL equipped with Graph Neural Networks can learn a generalizable branching heuristic for SAT search.

Feature Engineering Q-Learning +1

Improving SAT Solver Heuristics with Graph Networks and Reinforcement Learning

no code implementations • 25 Sep 2019 • Vitaly Kurin, Saad Godil, Shimon Whiteson, Bryan Catanzaro

We present GQSAT, a branching heuristic in a Boolean SAT solver trained with value-based reinforcement learning (RL) using Graph Neural Networks for function approximation.

Feature Engineering reinforcement-learning +1

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

1 code implementation • 23 Sep 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.

Continuous Control Meta Reinforcement Learning +2

Growing Action Spaces

1 code implementation • ICML 2020 • Gregory Farquhar, Laura Gustafson, Zeming Lin, Shimon Whiteson, Nicolas Usunier, Gabriel Synnaeve

In complex tasks, such as those with large combinatorial action spaces, random exploration may be too inefficient to achieve meaningful learning progress.

reinforcement-learning Reinforcement Learning (RL) +1

A Survey of Reinforcement Learning Informed by Natural Language

no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.

Decision Making Instruction Following +5

Exploration with Unreliable Intrinsic Reward in Multi-Agent Reinforcement Learning

no code implementations • 5 Jun 2019 • Wendelin Böhmer, Tabish Rashid, Shimon Whiteson

This paper investigates the use of intrinsic reward to guide exploration in multi-agent reinforcement learning.

Multi-agent Reinforcement Learning Q-Learning +2

Deep Residual Reinforcement Learning

1 code implementation • 3 May 2019 • Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We revisit residual algorithms in both model-free and model-based reinforcement learning settings.

Model-based Reinforcement Learning reinforcement-learning +1

DAC: The Double Actor-Critic Architecture for Learning Options

1 code implementation • NeurIPS 2019 • Shangtong Zhang, Shimon Whiteson

We reformulate the option framework as two parallel augmented MDPs.

Transfer Learning

Multitask Soft Option Learning

1 code implementation • 1 Apr 2019 • Maximilian Igl, Andrew Gambardella, Jinke He, Nantas Nardelli, N. Siddharth, Wendelin Böhmer, Shimon Whiteson

We present Multitask Soft Option Learning (MSOL), a hierarchical multitask framework based on Planning as Inference.

Transfer Learning

Generalized Off-Policy Actor-Critic

1 code implementation • NeurIPS 2019 • Shangtong Zhang, Wendelin Boehmer, Shimon Whiteson

We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting.

counterfactual reinforcement-learning +1

Fast Efficient Hyperparameter Tuning for Policy Gradients

1 code implementation • 18 Feb 2019 • Supratik Paul, Vitaly Kurin, Shimon Whiteson

Meta-Learning Policy Gradient Methods

The StarCraft Multi-Agent Challenge

20 code implementations • 11 Feb 2019 • Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson

In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap.

Ranked #6 on SMAC on SMAC 6h_vs_8z

Benchmarking Reinforcement Learning (RL) +3

Stable Opponent Shaping in Differentiable Games

no code implementations • ICLR 2019 • Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson

A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel, from GANs and intrinsic curiosity to multi-agent RL.

Learning from Demonstration in the Wild

no code implementations • 8 Nov 2018 • Feryal Behbahani, Kyriacos Shiarlis, Xi Chen, Vitaly Kurin, Sudhanshu Kasewa, Ciprian Stirbu, João Gomes, Supratik Paul, Frans A. Oliehoek, João Messias, Shimon Whiteson

Learning from demonstration (LfD) is useful in settings where hand-coding behaviour or a reward function is impractical.

Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning

1 code implementation • 4 Nov 2018 • Jakob N. Foerster, Francis Song, Edward Hughes, Neil Burch, Iain Dunning, Shimon Whiteson, Matthew Botvinick, Michael Bowling

We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.

Multi-agent Reinforcement Learning Policy Gradient Methods +2

VIREL: A Variational Inference Framework for Reinforcement Learning

1 code implementation • NeurIPS 2019 • Matthew Fellows, Anuj Mahajan, Tim G. J. Rudner, Shimon Whiteson

This gives VIREL a mode-seeking form of KL divergence, the ability to learn deterministic optimal policies naturally from inference and the ability to optimise value functions and policies in separate, iterative steps.

reinforcement-learning Reinforcement Learning (RL) +1

Multi-Agent Common Knowledge Reinforcement Learning

1 code implementation • NeurIPS 2019 • Christian A. Schroeder de Witt, Jakob N. Foerster, Gregory Farquhar, Philip H. S. Torr, Wendelin Boehmer, Shimon Whiteson

In this paper, we show that common knowledge between agents allows for complex decentralised coordination.

Multi-agent Reinforcement Learning reinforcement-learning +3

Fast Context Adaptation via Meta-Learning

1 code implementation • 8 Oct 2018 • Luisa M. Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson

We propose CAVIA for meta-learning, a simple extension to MAML that is less prone to meta-overfitting, easier to parallelise, and more interpretable.

General Classification Meta-Learning +3

CAML: Fast Context Adaptation via Meta-Learning

no code implementations • 27 Sep 2018 • Luisa M Zintgraf, Kyriacos Shiarlis, Vitaly Kurin, Katja Hofmann, Shimon Whiteson

We propose CAML, a meta-learning method for fast adaptation that partitions the model parameters into two parts: context parameters that serve as additional input to the model and are adapted on individual tasks, and shared parameters that are meta-trained and shared across tasks.

Meta-Learning

A Better Baseline for Second Order Gradient Estimation in Stochastic Computation Graphs

no code implementations • 27 Sep 2018 • Jingkai Mao, Jakob Foerster, Tim Rocktäschel, Gregory Farquhar, Maruan Al-Shedivat, Shimon Whiteson

To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation.

Meta-Learning Multi-agent Reinforcement Learning +2

DiCE: The Infinitely Differentiable Monte Carlo Estimator

1 code implementation • ICML 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, the surrogate loss (SL) approach treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

Meta-Learning

Deep Variational Reinforcement Learning for POMDPs

1 code implementation • ICML 2018 • Maximilian Igl, Luisa Zintgraf, Tuan Anh Le, Frank Wood, Shimon Whiteson

Many real-world sequential decision making problems are partially observable by nature, and the environment model is typically unknown.

Decision Making Inductive Bias +2

Fingerprint Policy Optimisation for Robust Reinforcement Learning

no code implementations • 27 May 2018 • Supratik Paul, Michael A. Osborne, Shimon Whiteson

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator.

Bayesian Optimisation Continuous Control +3

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

16 code implementations • ICML 2018 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.

Ranked #1 on SMAC+ on Off_Near_parallel

Multi-agent Reinforcement Learning reinforcement-learning +4

TACO: Learning Task Decomposition via Temporal Alignment for Control

1 code implementation • ICML 2018 • Kyriacos Shiarlis, Markus Wulfmeier, Sasha Salter, Shimon Whiteson, Ingmar Posner

Many advanced Learning from Demonstration (LfD) methods consider the decomposition of complex, real-world tasks into simpler sub-tasks.

Fourier Policy Gradients

no code implementations • ICML 2018 • Matthew Fellows, Kamil Ciosek, Shimon Whiteson

We propose a new way of deriving policy gradient updates for reinforcement learning.

Reinforcement Learning (RL)

DiCE: The Infinitely Differentiable Monte-Carlo Estimator

5 code implementations • 14 Feb 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

Meta-Learning

Expected Policy Gradients for Reinforcement Learning

no code implementations • 10 Jan 2018 • Kamil Ciosek, Shimon Whiteson

For Gaussian policies, we introduce an exploration method that uses covariance proportional to the matrix exponential of the scaled Hessian of the critic with respect to the actions.

Policy Gradient Methods reinforcement-learning +1

Dynamic-Depth Context Tree Weighting

no code implementations • NeurIPS 2017 • Joao V. Messias, Shimon Whiteson

Reinforcement learning (RL) in partially observable settings is challenging because the agent’s observations are not Markov.

Reinforcement Learning (RL) Time Series +1

TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning

1 code implementation • ICLR 2018 • Gregory Farquhar, Tim Rocktäschel, Maximilian Igl, Shimon Whiteson

To address these challenges, we propose TreeQN, a differentiable, recursive, tree-structured model that serves as a drop-in replacement for any value function network in deep RL with discrete actions.

Atari Games reinforcement-learning +2

Learning with Opponent-Learning Awareness

6 code implementations • 13 Sep 2017 • Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch

We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL.

Multi-agent Reinforcement Learning

Expected Policy Gradients

no code implementations • 15 Jun 2017 • Kamil Ciosek, Shimon Whiteson

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning.

Counterfactual Multi-Agent Policy Gradients

6 code implementations • 24 May 2017 • Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.

Ranked #1 on SMAC+ on Off_Superhard_parallel

Autonomous Vehicles counterfactual +2

Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning

5 code implementations • ICML 2017 • Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson

Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems.

Multi-agent Reinforcement Learning Q-Learning +3

LipNet: End-to-End Sentence-level Lipreading

12 code implementations • 5 Nov 2016 • Yannis M. Assael, Brendan Shillingford, Shimon Whiteson, Nando de Freitas

Lipreading is the task of decoding text from the movement of a speaker's mouth.

Ranked #5 on Lipreading on GRID corpus (mixed-speech)

General Classification Lipreading +1

Multi-Objective Deep Reinforcement Learning

2 code implementations • 9 Oct 2016 • Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, Shimon Whiteson

We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori.

Multi-Objective Reinforcement Learning reinforcement-learning

Alternating Optimisation and Quadrature for Robust Control

no code implementations • 24 May 2016 • Supratik Paul, Konstantinos Chatzilygeroudis, Kamil Ciosek, Jean-Baptiste Mouret, Michael A. Osborne, Shimon Whiteson

ALOQ is robust to the presence of significant rare events, which may not be observable under random sampling, but play a substantial role in determining the optimal policy.

Bayesian Optimisation

Learning to Communicate with Deep Multi-Agent Reinforcement Learning

3 code implementations • NeurIPS 2016 • Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility.

Multi-agent Reinforcement Learning Q-Learning +2

Probably Approximately Correct Greedy Maximization with Efficient Bounds on Information Gain for Sensor Selection

no code implementations • 25 Feb 2016 • Yash Satsangi, Shimon Whiteson, Frans A. Oliehoek

Submodular function maximization finds application in a variety of real-world decision-making problems.

Decision Making

Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks

no code implementations • 8 Feb 2016 • Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson

We propose deep distributed recurrent Q-networks (DDRQN), which enable teams of agents to learn to solve communication-based coordination tasks.

reinforcement-learning Reinforcement Learning (RL)

Copeland Dueling Bandits

no code implementations • NeurIPS 2015 • Masrour Zoghi, Zohar Karnin, Shimon Whiteson, Maarten de Rijke

A version of the dueling bandit problem is addressed in which a Condorcet winner may not exist.

Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs

no code implementations • 4 Feb 2014 • Frans Adriaan Oliehoek, Matthijs T. J. Spaan, Christopher Amato, Shimon Whiteson

We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent.

Clustering

A Survey of Multi-Objective Sequential Decision-Making

no code implementations • 4 Feb 2014 • Diederik Marijn Roijers, Peter Vamplew, Shimon Whiteson, Richard Dazeley

Using this taxonomy, we survey the literature on multi-objective methods for planning and learning.

Decision Making

Relative Upper Confidence Bound for the K-Armed Dueling Bandit Problem

no code implementations • 12 Dec 2013 • Masrour Zoghi, Shimon Whiteson, Remi Munos, Maarten de Rijke

This paper proposes a new method for the K-armed dueling bandit problem, a variation on the regular K-armed bandit problem that offers only relative feedback about pairs of arms.

Information Retrieval Retrieval

Exploiting Agent and Type Independence in Collaborative Graphical Bayesian Games

no code implementations • 1 Aug 2011 • Frans A. Oliehoek, Shimon Whiteson, Matthijs T. J. Spaan

Such problems can be modeled as collaborative Bayesian games in which each agent receives private information in the form of its type.

Decision Making Vocal Bursts Type Prediction
