Search Results for author: Jakob Foerster

Found 72 papers, 41 papers with code

“Other-Play” for Zero-Shot Coordination

no code implementations • ICML 2020 • Hengyuan Hu, Alexander Peysakhovich, Adam Lerer, Jakob Foerster

We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans).

Multi-agent Reinforcement Learning • Reinforcement Learning (RL)
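
To make the idea concrete, here is a toy sketch (illustrative, not the authors' code) on the lever coordination game described in the paper: training against symmetry-relabelled partners scores arbitrary conventions at chance, while symmetry-respecting policies survive.

```python
import numpy as np

rng = np.random.default_rng(0)

# Lever game from the paper: ten levers, nine identical ones pay 1.0 on a
# match, one distinct lever pays 0.9. Levers 0-8 are interchangeable.
payoff = np.zeros((10, 10))
np.fill_diagonal(payoff, 1.0)
payoff[9, 9] = 0.9

def other_play_return(policy_a, policy_b, n_samples=10_000):
    """Other-play objective (sketch): expected payoff when partner B's
    strategy is relabelled by a random symmetry of the game before play."""
    total = 0.0
    for _ in range(n_samples):
        perm = np.append(rng.permutation(9), 9)  # permute only levers 0-8
        a = rng.choice(10, p=policy_a)
        b = perm[rng.choice(10, p=policy_b)]
        total += payoff[a, b]
    return total / n_samples

convention = np.eye(10)[3]  # arbitrary self-play convention: "always lever 3"
odd_lever = np.eye(10)[9]   # symmetry-respecting policy: the distinct lever
print(other_play_return(convention, convention))  # ~1/9: convention collapses
print(other_play_return(odd_lever, odd_lever))    # 0.9: robust to relabelling
```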

Rethinking Out-of-Distribution Detection for Reinforcement Learning: Advancing Methods for Evaluation and Detection

2 code implementations • 10 Apr 2024 • Linas Nasvytis, Kai Sandbrink, Jakob Foerster, Tim Franzmeyer, Christian Schroeder de Witt

In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments.

Out-of-Distribution Detection • Out of Distribution (OOD) Detection • +2
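
As background, a common baseline for this kind of detector (generic, not the method proposed in the paper) fits a Gaussian to feature embeddings of training-time states and flags test-time states by Mahalanobis distance:

```python
import numpy as np

class FeatureGaussianDetector:
    """Generic OOD baseline (not the paper's method): fit a Gaussian to
    feature embeddings of training-time states; flag test states whose
    Mahalanobis distance to the training distribution is unusually large."""

    def fit(self, feats, quantile=0.99):
        self.mu = feats.mean(axis=0)
        cov = np.cov(feats, rowvar=False)
        self.prec = np.linalg.inv(cov + 1e-6 * np.eye(cov.shape[0]))
        self.threshold = np.quantile(self._dist(feats), quantile)

    def _dist(self, x):
        d = x - self.mu
        return np.sqrt(np.einsum("ij,jk,ik->i", d, self.prec, d))

    def is_ood(self, x):
        return self._dist(np.atleast_2d(x)) > self.threshold

rng = np.random.default_rng(0)
det = FeatureGaussianDetector()
det.fit(rng.normal(size=(5000, 8)))           # in-distribution features
print(det.is_ood(rng.normal(size=8)))         # [False] (typically)
print(det.is_ood(rng.normal(size=8) + 10.0))  # [True], far from training data
```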

Policy-Guided Diffusion

1 code implementation • 9 Apr 2024 • Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Foerster

Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.
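
A rough sketch of the guidance idea, using a generic annealed-Langevin loop in place of the paper's exact sampler; `behaviour_score` and `policy_score` are hypothetical stand-ins for the learned trajectory score and the target policy's action log-probability gradient:

```python
import numpy as np

def sample_trajectory(behaviour_score, policy_score, shape, n_steps=50,
                      guidance=1.0, step_size=0.01, seed=0):
    """Policy-guided sampling sketch. `behaviour_score(x, t)` plays the role
    of the learned score of the offline-data trajectory distribution;
    `policy_score(x, t)` the gradient of the target policy's log-probability
    of the trajectory's actions. Their weighted sum tilts samples from the
    behaviour distribution toward the target policy's distribution."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)  # start from pure noise
    for t in reversed(range(n_steps)):
        score = behaviour_score(x, t) + guidance * policy_score(x, t)
        # Langevin update: drift along the guided score plus fresh noise.
        x = x + step_size * score + np.sqrt(2 * step_size) * rng.standard_normal(shape)
    return x
```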

JaxUED: A simple and useable UED library in Jax

1 code implementation • 19 Mar 2024 • Samuel Coward, Michael Beukman, Jakob Foerster

We present JaxUED, an open-source library providing minimal dependency implementations of modern Unsupervised Environment Design (UED) algorithms in Jax.

Craftax: A Lightning-Fast Benchmark for Open-Ended Reinforcement Learning

1 code implementation • 26 Feb 2024 • Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster

Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen.

NetHack • reinforcement-learning • +1

Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts

no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu

As large language models (LLMs) become increasingly prevalent across many real-world applications, understanding and enhancing their robustness to user inputs is of paramount importance.

Question Answering

Refining Minimax Regret for Unsupervised Environment Design

1 code implementation • 19 Feb 2024 • Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael Dennis, Jakob Foerster

In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation.
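
For reference, the standard minimax regret UED objective that BLP refines (the refinement itself is the paper's contribution): the adversary proposes levels θ while the agent minimises worst-case regret against per-level optimal policies π*_θ.

```latex
\min_{\pi} \max_{\theta} \; \mathrm{Regret}_{\theta}(\pi)
  \;=\; \min_{\pi} \max_{\theta} \; \big[ V_{\theta}(\pi^{*}_{\theta}) - V_{\theta}(\pi) \big]
```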

Revisiting Recurrent Reinforcement Learning with Memory Monoids

1 code implementation • 15 Feb 2024 • Steven Morad, Chris Lu, Ryan Kortvelesy, Stephan Liwicki, Jakob Foerster, Amanda Prorok

Memory models such as Recurrent Neural Networks (RNNs) and Transformers address Partially Observable Markov Decision Processes (POMDPs) by mapping trajectories to latent Markov states.

reinforcement-learning
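
To illustrate the monoid view of memory (a toy sketch, not the paper's code): a linear recurrence step h_t = a_t·h_{t-1} + b_t is an affine map, and composing affine maps is associative, so a whole trajectory can be folded (or parallel-scanned) instead of unrolled step by step.

```python
from functools import reduce
import numpy as np

def combine(f, g):
    """Compose affine maps h -> a*h + b: apply f first, then g (associative)."""
    a_f, b_f = f
    a_g, b_g = g
    return (a_g * a_f, a_g * b_f + b_g)

identity = (1.0, 0.0)  # the monoid identity: h -> h

# Example: exponential moving average h = 0.9*h + 0.1*x over observations x.
xs = np.array([1.0, 2.0, 3.0, 4.0])
elements = [(0.9, 0.1 * x) for x in xs]

a, b = reduce(combine, elements, identity)
h_T = a * 0.0 + b  # final memory state from initial state h_0 = 0
print(h_T)  # 0.9049..., identical to sequentially applying h = 0.9*h + 0.1*x
```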

Symmetry-Breaking Augmentations for Ad Hoc Teamwork

no code implementations • 15 Feb 2024 • Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid

In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies.

Mixtures of Experts Unlock Parameter Scaling for Deep RL

no code implementations • 13 Feb 2024 • Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size.

reinforcement-learning • Self-Supervised Learning
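
A minimal soft mixture-of-experts layer in the spirit of the paper (generic sketch; the paper's networks and Soft MoE routing differ in detail): a learned router softly weights several small MLP experts, so parameters scale with the number of experts.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class SoftMoELayer:
    """Minimal soft mixture-of-experts sketch (generic, not the paper's exact
    architecture): the layer returns the router-weighted combination of the
    expert outputs."""

    def __init__(self, dim, hidden, n_experts=4):
        self.router = rng.standard_normal((dim, n_experts)) * 0.1
        self.experts = [
            (rng.standard_normal((dim, hidden)) * 0.1,
             rng.standard_normal((hidden, dim)) * 0.1)
            for _ in range(n_experts)
        ]

    def __call__(self, x):  # x: (batch, dim)
        weights = softmax(x @ self.router)                     # (batch, E)
        outs = np.stack([np.maximum(x @ w1, 0.0) @ w2          # ReLU MLPs
                         for w1, w2 in self.experts], axis=1)  # (batch, E, dim)
        return (weights[..., None] * outs).sum(axis=1)         # (batch, dim)

layer = SoftMoELayer(dim=8, hidden=16)
print(layer(rng.standard_normal((2, 8))).shape)  # (2, 8)
```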

Analysing the Sample Complexity of Opponent Shaping

no code implementations • 8 Feb 2024 • Kitty Fung, Qizhen Zhang, Chris Lu, Jia Wan, Timon Willi, Jakob Foerster

Providing theoretical guarantees for M-FOS is hard because (a) there is little literature on theoretical sample complexity bounds for meta-reinforcement learning, and (b) M-FOS operates in continuous state and action spaces, so theoretical analysis is challenging.

Meta Reinforcement Learning

Scaling Opponent Shaping to High Dimensional Games

no code implementations • 19 Dec 2023 • Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes.

Meta-Learning

Generative AI for End-to-End Limit Order Book Modelling: A Token-Level Autoregressive Generative Model of Message Flow Using a Deep State Space Network

no code implementations • 23 Aug 2023 • Peer Nagy, Sascha Frey, Silvia Sapora, Kang Li, Anisoara Calinescu, Stefan Zohren, Jakob Foerster

Overall, our results invite the use and extension of the model in the direction of autoregressive large financial models for the generation of high-frequency financial data, and we commit to open-sourcing our code to facilitate future research.

Learning Multi-Agent Communication with Contrastive Learning

no code implementations • 3 Jul 2023 • Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch

By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory.

Contrastive Learning
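
The contrastive objective here is InfoNCE-style; a minimal sketch (illustrative, not the authors' code), where paired messages from the same trajectory are positives and other batch entries serve as negatives:

```python
import numpy as np

def logsumexp(z):
    m = z.max(axis=-1, keepdims=True)
    return m + np.log(np.exp(z - m).sum(axis=-1, keepdims=True))

def message_infonce(m_a, m_b, temperature=0.1):
    """InfoNCE-style loss sketch: m_a[i] and m_b[i] are messages from the
    same trajectory (positive pairs); the rest of the batch acts as
    negatives. Minimising the loss maximises a lower bound on the mutual
    information between messages of a trajectory."""
    m_a = m_a / np.linalg.norm(m_a, axis=-1, keepdims=True)
    m_b = m_b / np.linalg.norm(m_b, axis=-1, keepdims=True)
    logits = m_a @ m_b.T / temperature       # (batch, batch) similarities
    log_probs = logits - logsumexp(logits)   # row-wise softmax over candidates
    return -np.mean(np.diag(log_probs))      # positive pairs on the diagonal
```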

A Model-Based Solution to the Offline Multi-Agent Reinforcement Learning Coordination Problem

no code implementations • 26 May 2023 • Paul Barde, Jakob Foerster, Derek Nowrouzezahrai, Amy Zhang

Training multiple agents to coordinate is an essential problem with applications in robotics, game theory, economics, and social sciences.

Multi-agent Reinforcement Learning

Arbitrary Order Meta-Learning with Simple Population-Based Evolution

no code implementations • 16 Mar 2023 • Chris Lu, Sebastian Towers, Jakob Foerster

Meta-learning, the notion of learning to learn, enables learning systems to quickly and flexibly solve new tasks.

Meta-Learning • Time Series • +1

Structured State Space Models for In-Context Reinforcement Learning

2 code implementations • NeurIPS 2023 • Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani

We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks.

Continuous Control • Meta-Learning • +1
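
Written on the SSM recurrence, the reset trick amounts to a one-line change (simplified sketch; notation assumed rather than quoted from the paper): an episode-start indicator d_t ∈ {0, 1} gates the previous hidden state, and the recurrence remains linear, so hidden states can still be computed with a parallel scan.

```latex
x_t = (1 - d_t)\,\bar{A}\,x_{t-1} + \bar{B}\,u_t, \qquad y_t = C\,x_t
```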

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

no code implementations • 6 Mar 2023 • Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel

Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.

Continuous Control • Multi-agent Reinforcement Learning • +2

Adversarial Cheap Talk

1 code implementation • 20 Nov 2022 • Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster

More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features.

Meta-Learning • Reinforcement Learning (RL)

Perfectly Secure Steganography Using Minimum Entropy Coupling

1 code implementation • 24 Oct 2022 • Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier

Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning.
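
The coupling primitive behind such schemes can be approximated greedily; a minimal sketch of the greedy minimum-entropy-coupling heuristic (the paper builds a full stegosystem on top of such couplings):

```python
import numpy as np

def greedy_min_entropy_coupling(p, q):
    """Greedy approximate minimum-entropy coupling (sketch): repeatedly pair
    the largest remaining probability masses of the two marginals. The
    resulting joint matrix has marginals p and q with (approximately) minimal
    joint entropy -- the primitive that lets a stegosystem couple the message
    distribution to the covertext distribution without disturbing either."""
    p, q = p.astype(float).copy(), q.astype(float).copy()
    joint = np.zeros((len(p), len(q)))
    while p.sum() > 1e-12:
        i, j = int(np.argmax(p)), int(np.argmax(q))
        m = min(p[i], q[j])
        joint[i, j] += m
        p[i] -= m
        q[j] -= m
    return joint

p = np.array([0.5, 0.3, 0.2])  # e.g. message distribution
q = np.array([0.6, 0.4])       # e.g. covertext distribution
C = greedy_min_entropy_coupling(p, q)
print(C, C.sum(axis=1), C.sum(axis=0))  # marginals recover p and q
```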

Equivariant Networks for Zero-Shot Coordination

1 code implementation • 21 Oct 2022 • Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster

Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner.

Human-AI Coordination via Human-Regularized Search and Learning

no code implementations • 11 Oct 2022 • Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown

First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams.

An Investigation of the Bias-Variance Tradeoff in Meta-Gradients

1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar

Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.

Meta-Learning • Reinforcement Learning (RL)

Grounding Aleatoric Uncertainty for Unsupervised Environment Design

1 code implementation • 11 Jul 2022 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster

Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution.

Reinforcement Learning (RL)

Generalized Beliefs for Cooperative AI

no code implementations • 26 Jun 2022 • Darius Muglich, Luisa Zintgraf, Christian Schroeder de Witt, Shimon Whiteson, Jakob Foerster

Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings.

Nocturne: a scalable driving benchmark for bringing multi-agent learning one step closer to the real world

1 code implementation • 20 Jun 2022 • Eugene Vinitsky, Nathan Lichtlé, Xiaomeng Yang, Brandon Amos, Jakob Foerster

We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability.

Imitation Learning

Model-Free Opponent Shaping

2 code implementations • 3 May 2022 • Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster

In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD).
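
For readers unfamiliar with the IPD stage game, the standard payoffs make defection dominant for each player even though mutual defection is collectively worse (conventional payoffs; exact values vary by presentation):

```python
# One-shot prisoner's dilemma stage game. Defection strictly dominates
# cooperation for each player, yet mutual defection (-2, -2) is worse
# than mutual cooperation (-1, -1) -- hence the defect-defect outcome.
payoffs = {("C", "C"): (-1, -1), ("C", "D"): (-3, 0),
           ("D", "C"): (0, -3),  ("D", "D"): (-2, -2)}

for my_action in "CD":
    row = [payoffs[(my_action, opp)][0] for opp in "CD"]
    print(my_action, row)
# C [-1, -3]
# D [0, -2]   -> D beats C against either opponent action.
```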

COLA: Consistent Learning with Opponent-Learning Awareness

1 code implementation • 8 Mar 2022 • Timon Willi, Alistair Letcher, Johannes Treutlein, Jakob Foerster

Finally, in an empirical evaluation on a set of general-sum games, we find that COLA finds prosocial solutions and that it converges under a wider range of learning rates than HOLA and LOLA.

CoLA

Evolving Curricula with Regret-Based Environment Design

3 code implementations • 2 Mar 2022 • Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.

Reinforcement Learning (RL)

Learning Intuitive Policies Using Action Features

no code implementations • 29 Jan 2022 • Mingwei Ma, Jizhou Liu, Samuel Sokota, Max Kleiman-Weiner, Jakob Foerster

An unaddressed challenge in multi-agent coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations.

Inductive Bias

Mirror Learning: A Unifying Framework of Policy Optimisation

1 code implementation • 7 Jan 2022 • Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster

In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO.

Reinforcement Learning (RL)
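
In simplified form (a sketch; the paper states the general operator-theoretic version), the mirror learning update maximises expected advantage minus a "drift" penalty that keeps the new policy near the old one, with TRPO and PPO recovered by particular choices of drift 𝔇 and neighbourhood 𝒩:

```latex
\pi_{k+1} \in \arg\max_{\pi \in \mathcal{N}(\pi_k)}
  \mathbb{E}_{s}\Big[ \mathbb{E}_{a \sim \pi}\big[ A^{\pi_k}(s, a) \big]
  - \nu \, \mathfrak{D}_{\pi_k}(\pi)(s) \Big]
```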

Lyapunov Exponents for Diversity in Differentiable Games

no code implementations • 24 Dec 2021 • Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster

We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method - denoted Generalized Ridge Rider (GRR) - for finding arbitrary bifurcation points.

Neural Pseudo-Label Optimism for the Bank Loan Problem

no code implementations • NeurIPS 2021 • Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob Foerster

The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions.

Decision Making • Pseudo Label

Replay-Guided Adversarial Environment Design

4 code implementations • NeurIPS 2021 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel

Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.

Reinforcement Learning (RL)

Don't Sweep your Learning Rate under the Rug: A Closer Look at Cross-modal Transfer of Pretrained Transformers

no code implementations • 26 Jul 2021 • Danielle Rothermel, Margaret Li, Tim Rocktäschel, Jakob Foerster

After carefully redesigning the empirical setup, we find that when tuning learning rates properly, pretrained transformers do outperform or match training from scratch in all of our tasks, but only as long as the entire model is finetuned.

Communicating via Markov Decision Processes

1 code implementation • 17 Jul 2021 • Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Martin Strohmeier, J. Zico Kolter, Shimon Whiteson, Jakob Foerster

We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME.

Multi-agent Reinforcement Learning

Centralized Model and Exploration Policy for Multi-Agent RL

1 code implementation • 14 Jul 2021 • Qizhen Zhang, Chris Lu, Animesh Garg, Jakob Foerster

We also learn a centralized exploration policy within our model that learns to collect additional data in state-action regions with high model uncertainty.

Reinforcement Learning (RL)

Learned Belief Search: Efficiently Improving Policies in Partially Observable Settings

no code implementations • 16 Jun 2021 • Hengyuan Hu, Adam Lerer, Noam Brown, Jakob Foerster

Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games.

counterfactual

A New Formalism, Method and Open Issues for Zero-Shot Coordination

1 code implementation • 11 Jun 2021 • Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster

We introduce an extension of the algorithm, other-play with tie-breaking, and prove that it is optimal in the LFC problem and an equilibrium in the LFC game.

Multi-agent Reinforcement Learning

Quasi-Equivalence Discovery for Zero-Shot Emergent Communication

no code implementations • 14 Mar 2021 • Kalesha Bullard, Douwe Kiela, Franziska Meier, Joelle Pineau, Jakob Foerster

In contrast, in this work, we present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm that allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents.

Off-Belief Learning

5 code implementations • 6 Mar 2021 • Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster

Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time.

Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster

In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs).

BIG-bench Machine Learning
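
A toy sketch of the ridge-riding idea (illustrative; the paper's algorithm adds branching schedules and convergence criteria): at a saddle, follow eigenvectors of the Hessian rather than the gradient to obtain diverse descent paths.

```python
import numpy as np

def ridge_rider_branches(hess_fn, theta, alpha=0.1, k=1):
    """Ridge Rider sketch: branch along the Hessian eigenvectors ("ridges")
    with the smallest eigenvalues, in both directions, producing diverse
    descent paths instead of a single SGD trajectory. `hess_fn` is a
    stand-in for the loss Hessian at the parameters theta."""
    _, eigvecs = np.linalg.eigh(hess_fn(theta))  # eigenvalues ascending
    branches = []
    for idx in range(k):                         # the k lowest-curvature ridges
        e = eigvecs[:, idx]
        branches += [theta + alpha * e, theta - alpha * e]  # both directions
    return branches

# Toy saddle f(x, y) = x^2 - y^2: the negative-curvature ridge is the y-axis,
# giving two distinct descent branches from the saddle point at the origin.
hess = lambda th: np.diag([2.0, -2.0])
print(ridge_rider_branches(hess, np.zeros(2)))  # [[0, 0.1], [0, -0.1]]
```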

Exploring Zero-Shot Emergent Communication in Embodied Multi-Agent Populations

no code implementations • 29 Oct 2020 • Kalesha Bullard, Franziska Meier, Douwe Kiela, Joelle Pineau, Jakob Foerster

Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels.

The Struggles of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets

1 code implementation • 23 Sep 2020 • Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

For neural models to garner widespread public trust and ensure fairness, we must have human-intelligible explanations for their predictions.

Decision Making • Fairness

Compositionality and Capacity in Emergent Languages

no code implementations • WS 2020 • Abhinav Gupta, Cinjon Resnick, Jakob Foerster, Andrew Dai, Kyunghyun Cho

Our hypothesis is that there should be a specific range of model capacity and channel bandwidth that induces compositional structure in the resulting language and consequently encourages systematic generalization.

Open-Ended Question Answering • Systematic Generalization

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

1 code implementation • 19 Mar 2020 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.

reinforcement-learning • Reinforcement Learning (RL) • +2

"Other-Play" for Zero-Shot Coordination

2 code implementations • 6 Mar 2020 • Hengyuan Hu, Adam Lerer, Alex Peysakhovich, Jakob Foerster

We consider the problem of zero-shot coordination - constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans).

Multi-agent Reinforcement Learning

On the interaction between supervision and self-play in emergent communication

1 code implementation • ICLR 2020 • Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau

A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training.

Improving Policies via Search in Cooperative Partially Observable Games

10 code implementations • 5 Dec 2019 • Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown

The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy.

Game of Hanabi

Seeded self-play for language learning

no code implementations • WS 2019 • Abhinav Gupta, Ryan Lowe, Jakob Foerster, Douwe Kiela, Joelle Pineau

Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language.

Imitation Learning • Meta-Learning

Can I Trust the Explainer? Verifying Post-hoc Explanatory Methods

2 code implementations • 4 Oct 2019 • Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom

We aim for this framework to provide a publicly available, off-the-shelf evaluation when the feature-selection perspective on explanations is needed.

feature selection

Modeling Fake News in Social Networks with Deep Multi-Agent Reinforcement Learning

no code implementations • 25 Sep 2019 • Christoph Aymanns, Matthias Weber, Co-Pierre Georg, Jakob Foerster

We incorporate fake news into the model by adding an adversarial agent, the attacker, that either provides biased private signals to or takes over a subset of agents.

Multi-agent Reinforcement Learning • Q-Learning • +2

Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning

1 code implementation • 23 Sep 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster

Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.

Continuous Control • Meta Reinforcement Learning • +2

A Survey of Reinforcement Learning Informed by Natural Language

no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel

To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.

Decision Making • Instruction Following • +5

Differentiable Game Mechanics

1 code implementation • 13 May 2019 • Alistair Letcher, David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel

The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games.

Stable Opponent Shaping in Differentiable Games

no code implementations • ICLR 2019 • Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson

A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL.

DiCE: The Infinitely Differentiable Monte Carlo Estimator

1 code implementation • ICML 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

Meta-Learning
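
The core of the estimator is the MagicBox operator from the paper, with ⊥ the stop-gradient operator and 𝒲 the set of stochastic nodes that influence the objective:

```latex
\square(\mathcal{W}) = \exp\big( \tau - \perp(\tau) \big), \qquad
\tau = \sum_{w \in \mathcal{W}} \log p(w; \theta)
```

It evaluates to 1 in the forward pass, while ∇_θ □(𝒲) = □(𝒲) ∇_θ τ, so repeated differentiation yields correct score-function estimators of derivatives of any order.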

QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning

16 code implementations • ICML 2018 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson

At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.

Multi-agent Reinforcement Learning • reinforcement-learning • +4
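
QMIX's key structural constraint, from the paper: the joint value Q_tot must be monotone in each per-agent utility Q_a,

```latex
\frac{\partial Q_{tot}}{\partial Q_{a}} \;\ge\; 0 \qquad \forall a
```

so the joint greedy action factorises into per-agent greedy actions; in practice this is enforced by keeping the mixing network's (hypernetwork-generated) weights non-negative.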

The Mechanics of n-Player Differentiable Games

1 code implementation • ICML 2018 • David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel

The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
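
The decomposition in question, with ξ the simultaneous gradient of the players' losses and J = ∇ξ the "game Jacobian" (as in the paper):

```latex
J = S + A, \qquad S = \tfrac{1}{2}\big(J + J^{\top}\big), \qquad
A = \tfrac{1}{2}\big(J - J^{\top}\big)
```

A = 0 recovers a potential game and S = 0 a Hamiltonian game; SGA then follows the adjusted gradient ξ_λ = ξ + λ A^⊤ ξ to find stable fixed points.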

DiCE: The Infinitely Differentiable Monte-Carlo Estimator

5 code implementations • 14 Feb 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric P. Xing, Shimon Whiteson

Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.

Meta-Learning

Fake News in Social Networks

no code implementations • 21 Aug 2017 • Christoph Aymanns, Jakob Foerster, Co-Pierre Georg

We model the spread of news as a social learning game on a network.

Counterfactual Multi-Agent Policy Gradients

6 code implementations • 24 May 2017 • Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson

COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.

Autonomous Vehicles • counterfactual • +2
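
The counterfactual baseline that gives COMA its name, as defined in the paper: the centralised critic marginalises out agent a's action while the other agents' joint action u^{-a} is held fixed, yielding the per-agent advantage

```latex
A^{a}(s, \mathbf{u}) \;=\; Q(s, \mathbf{u})
  \;-\; \sum_{u'^{a}} \pi^{a}\big(u'^{a} \mid \tau^{a}\big)\,
        Q\big(s, (\mathbf{u}^{-a}, u'^{a})\big)
```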
