Search Results for author: Jakob Foerster

Found 72 papers, 41 papers with code

“Other-Play” for Zero-Shot Coordination

no code implementations • ICML 2020 • Hengyuan Hu, Alexander Peysakhovich, Adam Lerer, Jakob Foerster

We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans).

Multi-agent Reinforcement Learning • Reinforcement Learning (RL)
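
To make the idea concrete, here is a toy sketch (illustrative, not the authors' code) on the lever coordination game described in the paper: training against symmetry-relabelled partners scores arbitrary conventions at chance, while symmetry-respecting policies survive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Lever game from the paper: ten levers, nine identical ones pay 1.0 on a
# match, one distinct lever pays 0.9. Levers 0-8 are interchangeable.
payoff = np.zeros((10, 10))
np.fill_diagonal(payoff, 1.0)
payoff[9, 9] = 0.9

def other_play_return(policy_a, policy_b, n_samples=10_000):
    """Other-play objective (sketch): expected payoff when partner B's
    strategy is relabelled by a random symmetry of the game before play."""
    total = 0.0
    for _ in range(n_samples):
        perm = np.append(rng.permutation(9), 9)  # permute only levers 0-8
        a = rng.choice(10, p=policy_a)
        b = perm[rng.choice(10, p=policy_b)]
        total += payoff[a, b]
    return total / n_samples

convention = np.eye(10)[3]  # arbitrary self-play convention: "always lever 3"
odd_lever = np.eye(10)[9]   # symmetry-respecting policy: the distinct lever
print(other_play_return(convention, convention))  # ~1/9: convention collapses
print(other_play_return(odd_lever, odd_lever))    # 0.9: robust to relabelling
```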

Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection

2 code implementations • 10 Apr 2024 • Linas Nasvytis, Kai Sandbrink, Jakob Foerster, Tim Franzmeyer, Christian Schroeder de Witt

In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments.

Out-of-Distribution Detection • Out of Distribution (OOD) Detection • +2
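
As background, a common baseline for this kind of detector (generic, not the method proposed in the paper) fits a Gaussian to feature embeddings of training-time states and flags test-time states by Mahalanobis distance:

```python
import numpy as np

class FeatureGaussianDetector:
    """Generic OOD baseline (not the paper's method): fit a Gaussian to
    feature embeddings of training-time states; flag test states whose
    Mahalanobis distance to the training distribution is unusually large."""

    def fit(self, feats, quantile=0.99):
        self.mu = feats.mean(axis=0)
        cov = np.cov(feats, rowvar=False)
        self.prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        self.threshold = np.quantile(self._dist(feats), quantile)

    def _dist(self, x):
        d = x - self.mu
        return np.sqrt(np.einsum("ij,jk,ik->i", d, self.prec, d))

    def is_ood(self, x):
        return self._dist(np.atleast_2d(x)) > self.threshold

rng = np.random.default_rng(0)
det = FeatureGaussianDetector()
det.fit(rng.normal(size=(5000, 8)))           # in-distribution features
print(det.is_ood(rng.normal(size=8)))         # [False] (typically)
print(det.is_ood(rng.normal(size=8) + 10.0))  # [True], far from training data
```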

Policy-Guided Diffusion

1 code implementation • 9 Apr 2024 • Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Foerster

Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.
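
A rough sketch of the guidance idea, using a generic annealed-Langevin loop in place of the paper's exact sampler; `behaviour_score` and `policy_score` are hypothetical stand-ins for the learned trajectory score and the target policy's action log-probability gradient:

```python
import numpy as np

def sample_trajectory(behaviour_score, policy_score, shape, n_steps=50,
                      guidance=1.0, step_size=0.01, seed=0):
    """Policy-guided sampling sketch. `behaviour_score(x, t)` plays the role
    of the learned score of the offline-data trajectory distribution;
    `policy_score(x, t)` the gradient of the target policy's log-probability
    of the trajectory's actions. Their weighted sum tilts samples from the
    behaviour distribution toward the target policy's distribution."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure noise
    for t in reversed(range(n_steps)):
        score = behaviour_score(x, t) + guidance * policy_score(x, t)
        # Langevin update: drift along the guided score plus fresh noise.
        x = x + step_size * score + np.sqrt(2 * step_size) * rng.standard_normal(shape)
    return x
```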

JaxUED: A simple and useable UED library in Jax

1 code implementation • 19 Mar 2024 • Samuel Coward, Michael Beukman, Jakob Foerster

We present JaxUED, an open-source library providing minimal dependency implementations of modern Unsupervised Environment Design (UED) algorithms in Jax.

Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

1 code implementation • 26 Feb 2024 • Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster

Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen.

NetHack • reinforcement-learning • +1

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.

Question Answering

Refining Minimax Regret for Unsupervised Environment Design

1 code implementation • 19 Feb 2024 • Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael Dennis, Jakob Foerster

In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation.
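
For reference, the standard minimax regret UED objective that BLP refines (the refinement itself is the paper's contribution): the adversary proposes levels θ while the agent minimises worst-case regret against per-level optimal policies π*_θ.

```latex
\min_{\pi} \max_{\theta} \; \mathrm{Regret}_{\theta}(\pi)
  \;=\; \min_{\pi} \max_{\theta} \; \big[ V_{\theta}(\pi^{*}_{\theta}) - V_{\theta}(\pi) \big]
```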

Revisiting Recurrent Reinforcement Learning with Memory Monoids

1 code implementation • 15 Feb 2024 • Steven Morad, Chris Lu, Ryan Kortvelesy, Stephan Liwicki, Jakob Foerster, Amanda Prorok

Memory models such as Recurrent Neural Networks (RNNs) and Transformers address Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent Markov states.

reinforcement-learning
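
To illustrate the monoid view of memory (a toy sketch, not the paper's code): a linear recurrence step h_t = a_t·h_{t-1} + b_t is an affine map, and composing affine maps is associative, so a whole trajectory can be folded (or parallel-scanned) instead of unrolled step by step.

```python
from functools import reduce
import numpy as np

def combine(f, g):
    """Compose affine maps h -> a*h + b: apply f first, then g (associative)."""
    a_f, b_f = f
    a_g, b_g = g
    return (a_g * a_f, a_g * b_f + b_g)

identity = (1.0, 0.0)  # the monoid identity: h -> h

# Example: exponential moving average h = 0.9*h + 0.1*x over observations x.
xs = np.array([1.0, 2.0, 3.0, 4.0])
elements = [(0.9, 0.1 * x) for x in xs]

a, b = reduce(combine, elements, identity)
h_T = a * 0.0 + b  # final memory state from initial state h_0 = 0
print(h_T)  # 0.9049..., identical to sequentially applying h = 0.9*h + 0.1*x
```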

Symmetry-Breaking Augmentations for Ad Hoc Teamwork

no code implementations • 15 Feb 2024 • Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid

In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies.

Mixtures of Experts Unlock Parameter Scaling for Deep RL

no code implementations • 13 Feb 2024 • Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size.

reinforcement-learning • Self-Supervised Learning
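
A minimal soft mixture-of-experts layer in the spirit of the paper (generic sketch; the paper's networks and Soft MoE routing differ in detail): a learned router softly weights several small MLP experts, so parameters scale with the number of experts.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class SoftMoELayer:
    """Minimal soft mixture-of-experts sketch (generic, not the paper's exact
    architecture): the layer returns the router-weighted combination of the
    expert outputs."""

    def __init__(self, dim, hidden, n_experts=4):
        self.router = rng.standard_normal((dim, n_experts)) * 0.1
        self.experts = [
            (rng.standard_normal((dim, hidden)) * 0.1,
             rng.standard_normal((hidden, dim)) * 0.1)
            for _ in range(n_experts)
        ]

    def __call__(self, x):  # x: (batch, dim)
        weights = softmax(x @ self.router)                     # (batch, E)
        outs = np.stack([np.maximum(x @ w1, 0.0) @ w2          # ReLU MLPs
                         for w1, w2 in self.experts], axis=1)  # (batch, E, dim)
        return (weights[..., None] * outs).sum(axis=1)         # (batch, dim)

layer = SoftMoELayer(dim=8, hidden=16)
print(layer(rng.standard_normal((2, 8))).shape)  # (2, 8)
```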

Analysing the Sample Complexity of Opponent Shaping

no code implementations • 8 Feb 2024 • Kitty Fung, Qizhen Zhang, Chris Lu, Jia Wan, Timon Willi, Jakob Foerster

Providing theoretical guarantees for M-FOS is hard because (a) there is little literature on theoretical sample complexity bounds for meta-reinforcement learning, and (b) M-FOS operates in continuous state and action spaces, so theoretical analysis is challenging.

Meta Reinforcement Learning

Scaling Opponent Shaping to High Dimensional Games

no code implementations • 19 Dec 2023 • Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes.

Meta-Learning

Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network

no code implementations • 23 Aug 2023 • Peer Nagy, Sascha Frey, Silvia Sapora, Kang Li, Anisoara Calinescu, Stefan Zohren, Jakob Foerster

Overall, our results invite the use and extension of the model in the direction of autoregressive large financial models for the generation of high-frequency financial data, and we commit to open-sourcing our code to facilitate future research.

Learning Multi-Agent Communication with Contrastive Learning

no code implementations • 3 Jul 2023 • Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch

By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory.

Contrastive Learning
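
The contrastive objective here is InfoNCE-style; a minimal sketch (illustrative, not the authors' code), where paired messages from the same trajectory are positives and other batch entries serve as negatives:

```python
import numpy as np

def logsumexp(z):
    m = z.max(axis=-1, keepdims=True)
    return m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True))

def message_infonce(m_a, m_b, temperature=0.1):
    """InfoNCE-style loss sketch: m_a[i] and m_b[i] are messages from the
    same trajectory (positive pairs); the rest of the batch acts as
    negatives. Minimising the loss maximises a lower bound on the mutual
    information between messages of a trajectory."""
    m_a = m_a / np.linalg.norm(m_a, axis=-1, keepdims=True)
    m_b = m_b / np.linalg.norm(m_b, axis=-1, keepdims=True)
    logits = m_a @ m_b.T / temperature       # (batch, batch) similarities
    log_probs = logits - logsumexp(logits)   # row-wise softmax over candidates
    return -np.mean(np.diag(log_probs))      # positive pairs on the diagonal
```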

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem

no code implementations • 26 May 2023 • Paul Barde, Jakob Foerster, Derek Nowrouzezahrai, Amy Zhang

Training multiple agents to coordinate is an essential problem with applications in robotics, game theory, economics, and social sciences.

Multi-agent Reinforcement Learning

Arbitrary Order Meta-Learning with Simple Population-Based Evolution

no code implementations • 16 Mar 2023 • Chris Lu, Sebastian Towers, Jakob Foerster

Meta-learning, the notion of learning to learn, enables learning systems to quickly and flexibly solve new tasks.

Meta-Learning • Time Series • +1

Structured State Space Models for In-Context Reinforcement Learning

2 code implementations • NeurIPS 2023 • Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani

We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks.

Continuous Control • Meta-Learning • +1
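
Written on the SSM recurrence, the reset trick amounts to a one-line change (simplified sketch; notation assumed rather than quoted from the paper): an episode-start indicator d_t ∈ {0, 1} gates the previous hidden state, and the recurrence remains linear, so hidden states can still be computed with a parallel scan.

```latex
x_t = (1 - d_t)\,\bar{A}\,x_{t-1} + \bar{B}\,u_t, \qquad y_t = C\,x_t
```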

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

no code implementations • 6 Mar 2023 • Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.

Continuous Control • Multi-agent Reinforcement Learning • +2

Adversarial Cheap Talk

1 code implementation • 20 Nov 2022 • Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster

More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features.

Meta-Learning • Reinforcement Learning (RL)

Perfectly Secure Steganography Using Minimum Entropy Coupling

1 code implementation • 24 Oct 2022 • Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier

Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning.
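
The coupling primitive behind such schemes can be approximated greedily; a minimal sketch of the greedy minimum-entropy-coupling heuristic (the paper builds a full stegosystem on top of such couplings):

```python
import numpy as np

def greedy_min_entropy_coupling(p, q):
    """Greedy approximate minimum-entropy coupling (sketch): repeatedly pair
    the largest remaining probability masses of the two marginals. The
    resulting joint matrix has marginals p and q with (approximately) minimal
    joint entropy -- the primitive that lets a stegosystem couple the message
    distribution to the covertext distribution without disturbing either."""
    p, q = p.astype(float).copy(), q.astype(float).copy()
    joint = np.zeros((len(p), len(q)))
    while p.sum() > 1e-12:
        i, j = int(np.argmax(p)), int(np.argmax(q))
        m = min(p[i], q[j])
        joint[i, j] += m
        p[i] -= m
        q[j] -= m
    return joint

p = np.array([0.5, 0.3, 0.2])  # e.g. message distribution
q = np.array([0.6, 0.4])       # e.g. covertext distribution
C = greedy_min_entropy_coupling(p, q)
print(C, C.sum(axis=1), C.sum(axis=0))  # marginals recover p and q
```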

Equivariant Networks for Zero-Shot Coordination

1 code implementation • 21 Oct 2022 • Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner.

Human-AI Coordination via Human-Regularized Search and Learning

no code implementations • 11 Oct 2022 • Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown

First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams.

An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar

Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.

Meta-Learning • Reinforcement Learning (RL)

Grounding Aleatoric Uncertainty for Unsupervised Environment Design

1 code implementation • 11 Jul 2022 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution.

Reinforcement Learning (RL)

Generalized Beliefs for Cooperative AI

no code implementations • 26 Jun 2022 • Darius Muglich, Luisa Zintgraf, Christian Schroeder de Witt, Shimon Whiteson, Jakob Foerster

Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings.

Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world

1 code implementation • 20 Jun 2022 • Eugene Vinitsky, Nathan Lichtlé, Xiaomeng Yang, Brandon Amos, Jakob Foerster

We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability.

Imitation Learning

Model-Free Opponent Shaping

2 code implementations • 3 May 2022 • Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD).
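
For readers unfamiliar with the IPD stage game, the standard payoffs make defection dominant for each player even though mutual defection is collectively worse (conventional payoffs; exact values vary by presentation):

```python
# One-shot prisoner's dilemma stage game. Defection strictly dominates
# cooperation for each player, yet mutual defection (-2, -2) is worse
# than mutual cooperation (-1, -1) -- hence the defect-defect outcome.
payoffs = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
           ("D", "C"): (0, -3),  ("D", "D"): (-2, -2)}

for my_action in "CD":
    row = [payoffs[(my_action, opp)][0] for opp in "CD"]
    print(my_action, row)
# C [-1, -3]
# D [0, -2]   -> D beats C against either opponent action.
```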

COLA: Consistent Learning with Opponent-Learning Awareness

1 code implementation • 8 Mar 2022 • Timon Willi, Alistair Letcher, Johannes Treutlein, Jakob Foerster

Finally, in an empirical evaluation on a set of general-sum games, we find that COLA finds prosocial solutions and that it converges under a wider range of learning rates than HOLA and LOLA.

CoLA

Evolving Curricula with Regret-Based Environment Design

3 code implementations • 2 Mar 2022 • Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.

Reinforcement Learning (RL)

Learning Intuitive Policies Using Action Features

no code implementations • 29 Jan 2022 • Mingwei Ma, Jizhou Liu, Samuel Sokota, Max Kleiman-Weiner, Jakob Foerster

An unaddressed challenge in multi-agent coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations.

Inductive Bias

Mirror Learning: A Unifying Framework of Policy Optimisation

1 code implementation • 7 Jan 2022 • Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster

In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO.

Reinforcement Learning (RL)
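
In simplified form (a sketch; the paper states the general operator-theoretic version), the mirror learning update maximises expected advantage minus a "drift" penalty that keeps the new policy near the old one, with TRPO and PPO recovered by particular choices of drift 𝔇 and neighbourhood 𝒩:

```latex
\pi_{k+1} \in \arg\max_{\pi \in \mathcal{N}(\pi_k)}
  \mathbb{E}_{s}\Big[ \mathbb{E}_{a \sim \pi}\big[ A^{\pi_k}(s, a) \big]
  - \nu \, \mathfrak{D}_{\pi_k}(\pi)(s) \Big]
```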

Lyapunov Exponents for Diversity in Differentiable Games

no code implementations • 24 Dec 2021 • Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster

We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method - denoted Generalized Ridge Rider (GRR) - for finding arbitrary bifurcation points.

Neural Pseudo-Label Optimism for the Bank Loan Problem

no code implementations • NeurIPS 2021 • Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob Foerster

The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions.

Decision Making • Pseudo Label

Replay-Guided Adversarial Environment Design

4 code implementations • NeurIPS 2021 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.

Reinforcement Learning (RL)

Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers

no code implementations • 26 Jul 2021 • Danielle Rothermel, Margaret Li, Tim Rocktäschel, Jakob Foerster

After carefully redesigning the empirical setup, we find that when tuning learning rates properly, pretrained transformers do outperform or match training from scratch in all of our tasks, but only as long as the entire model is finetuned.

Communicating via Markov Decision Processes

1 code implementation • 17 Jul 2021 • Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Martin Strohmeier, J. Zico Kolter, Shimon Whiteson, Jakob Foerster

We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME.

Multi-agent Reinforcement Learning

Centralized Model and Exploration Policy for Multi-Agent RL

1 code implementation • 14 Jul 2021 • Qizhen Zhang, Chris Lu, Animesh Garg, Jakob Foerster

We also learn a centralized exploration policy within our model that learns to collect additional data in state-action regions with high model uncertainty.

Reinforcement Learning (RL)

Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings

no code implementations • 16 Jun 2021 • Hengyuan Hu, Adam Lerer, Noam Brown, Jakob Foerster

Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games.

counterfactual

A New Formalism, Method and Open Issues for Zero-Shot Coordination

1 code implementation • 11 Jun 2021 • Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster

We introduce an extension of the algorithm, other-play with tie-breaking, and prove that it is optimal in the LFC problem and an equilibrium in the LFC game.

Multi-agent Reinforcement Learning

Quasi-Equivalence Discovery for Zero-Shot Emergent Communication

no code implementations • 14 Mar 2021 • Kalesha Bullard, Douwe Kiela, Franziska Meier, Joelle Pineau, Jakob Foerster

In contrast, in this work, we present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm that allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents.

Off-Belief Learning

5 code implementations • 6 Mar 2021 • Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster

Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time.

Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster

In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs).

BIG-bench Machine Learning
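
A toy sketch of the ridge-riding idea (illustrative; the paper's algorithm adds branching schedules and convergence criteria): at a saddle, follow eigenvectors of the Hessian rather than the gradient to obtain diverse descent paths.

```python
import numpy as np

def ridge_rider_branches(hess_fn, theta, alpha=0.1, k=1):
    """Ridge Rider sketch: branch along the Hessian eigenvectors ("ridges")
    with the smallest eigenvalues, in both directions, producing diverse
    descent paths instead of a single SGD trajectory. `hess_fn` is a
    stand-in for the loss Hessian at the parameters theta."""
    _, eigvecs = np.linalg.eigh(hess_fn(theta))  # eigenvalues ascending
    branches = []
    for idx in range(k):                         # the k lowest-curvature ridges
        e = eigvecs[:, idx]
        branches += [theta + alpha * e, theta - alpha * e]  # both directions
    return branches

# Toy saddle f(x, y) = x^2 - y^2: the negative-curvature ridge is the y-axis,
# giving two distinct descent branches from the saddle point at the origin.
hess = lambda th: np.diag([2.0, -2.0])
print(ridge_rider_branches(hess, np.zeros(2)))  # [[0, 0.1], [0, -0.1]]
```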

Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations

no code implementations • 29 Oct 2020 • Kalesha Bullard, Franziska Meier, Douwe Kiela, Joelle Pineau, Jakob Foerster

Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels.

The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets

1 code implementation • 23 Sep 2020 • Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

For neural models to garner widespread public trust and ensure fairness, we must have human-intelligible explanations for their predictions.

Decision Making • Fairness

Compositionality and Capacity in Emergent Languages

no code implementations • WS 2020 • Abhinav Gupta, Cinjon Resnick, Jakob Foerster, Andrew Dai, Kyunghyun Cho

Our hypothesis is that there should be a specific range of model capacity and channel bandwidth that induces compositional structure in the resulting language and consequently encourages systematic generalization.

Open-Ended Question Answering • Systematic Generalization

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

1 code implementation • 19 Mar 2020 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.

reinforcement-learning • Reinforcement Learning (RL) • +2

"Other-Play" for Zero-Shot Coordination

2 code implementations • 6 Mar 2020 • Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster

We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans).

Multi-agent Reinforcement Learning

On the interaction between supervision and self-play in emergent communication

1 code implementation • ICLR 2020 • Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau

A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training.

Improving Policies via Search in Cooperative Partially Observable Games

10 code implementations • 5 Dec 2019 • Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown

The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy.

Game of Hanabi

Seeded self-play for language learning

no code implementations • WS 2019 • Abhinav Gupta, Ryan Lowe, Jakob Foerster, Douwe Kiela, Joelle Pineau

Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language.

Imitation Learning • Meta-Learning

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

2 code implementations • 4 Oct 2019 • Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

We aim for this framework to provide a publicly available, off-the-shelf evaluation when the feature-selection perspective on explanations is needed.

feature selection

Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning

no code implementations • 25 Sep 2019 • Christoph Aymanns, Matthias Weber, Co-Pierre Georg, Jakob Foerster

We incorporate fake news into the model by adding an adversarial agent, the attacker, that either provides biased private signals to or takes over a subset of agents.

Multi-agent Reinforcement Learning • Q-Learning • +2

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

1 code implementation • 23 Sep 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.

Continuous Control • Meta Reinforcement Learning • +2

A Survey of Reinforcement Learning Informed by Natural Language

no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.

Decision Making • Instruction Following • +5

Differentiable Game Mechanics

1 code implementation • 13 May 2019 • Alistair Letcher, David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel

The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games.

Stable Opponent Shaping in Differentiable Games

no code implementations • ICLR 2019 • Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson

A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL.

DiCE: The Infinitely Differentiable Monte Carlo Estimator

1 code implementation • ICML 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

Meta-Learning
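
The core of the estimator is the MagicBox operator from the paper, with ⊥ the stop-gradient operator and 𝒲 the set of stochastic nodes that influence the objective:

```latex
\square(\mathcal{W}) = \exp\big( \tau - \perp(\tau) \big), \qquad
\tau = \sum_{w \in \mathcal{W}} \log p(w; \theta)
```

It evaluates to 1 in the forward pass, while ∇_θ □(𝒲) = □(𝒲) ∇_θ τ, so repeated differentiation yields correct score-function estimators of derivatives of any order.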

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

16 code implementations • ICML 2018 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.

Multi-agent Reinforcement Learning • reinforcement-learning • +4
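
QMIX's key structural constraint, from the paper: the joint value Q_tot must be monotone in each per-agent utility Q_a,

```latex
\frac{\partial Q_{tot}}{\partial Q_{a}} \;\ge\; 0 \qquad \forall a
```

so the joint greedy action factorises into per-agent greedy actions; in practice this is enforced by keeping the mixing network's (hypernetwork-generated) weights non-negative.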

The Mechanics of n-Player Differentiable Games

1 code implementation • ICML 2018 • David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel

The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
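
The decomposition in question, with ξ the simultaneous gradient of the players' losses and J = ∇ξ the "game Jacobian" (as in the paper):

```latex
J = S + A, \qquad S = \tfrac{1}{2}\big(J + J^{\top}\big), \qquad
A = \tfrac{1}{2}\big(J - J^{\top}\big)
```

A = 0 recovers a potential game and S = 0 a Hamiltonian game; SGA then follows the adjusted gradient ξ_λ = ξ + λ A^⊤ ξ to find stable fixed points.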

DiCE: The Infinitely Differentiable Monte-Carlo Estimator

5 code implementations • 14 Feb 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

Meta-Learning

Fake News in Social Networks

no code implementations • 21 Aug 2017 • Christoph Aymanns, Jakob Foerster, Co-Pierre Georg

We model the spread of news as a social learning game on a network.

Counterfactual Multi-Agent Policy Gradients

6 code implementations • 24 May 2017 • Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.

Autonomous Vehicles • counterfactual • +2
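
The counterfactual baseline that gives COMA its name, as defined in the paper: the centralised critic marginalises out agent a's action while the other agents' joint action u^{-a} is held fixed, yielding the per-agent advantage

```latex
A^{a}(s, \mathbf{u}) \;=\; Q(s, \mathbf{u})
  \;-\; \sum_{u'^{a}} \pi^{a}\big(u'^{a} \mid \tau^{a}\big)\,
        Q\big(s, (\mathbf{u}^{-a}, u'^{a})\big)
```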
