no code implementations • ICML 2020 • Hengyuan Hu, Alexander Peysakhovich, Adam Lerer, Jakob Foerster
We consider the problem of zero-shot coordination: constructing AI agents that can coordinate with novel partners they have not seen before (e.g. humans).
Multi-agent Reinforcement Learning
Reinforcement Learning (RL)
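The paper's motivating "lever game" makes the zero-shot coordination problem concrete (numbers follow the published example; the calculation below is an illustration):

```python
# Ten levers: nine pay 1.0 if both players pull the same one, and a single
# distinctive lever pays 0.9. Self-play happily breaks the symmetry by
# settling on any one of the nine identical levers, but two independently
# trained agents then match only by chance; a symmetry-invariant
# (other-play-style) choice is the unique 0.9 lever.
payoffs = [1.0] * 9 + [0.9]

# Cross-play value when each agent independently picked one of the nine
# interchangeable 1.0 levers uniformly at random: they match w.p. 1/9.
selfplay_crossplay = (1 / 9) * 1.0

# Cross-play value when both agents pick the unique, relabelling-invariant lever.
otherplay_crossplay = 0.9
```

The symmetry-invariant choice dominates in cross-play (0.9 vs. roughly 0.11), even though it is suboptimal in self-play.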
no code implementations • 22 Dec 2024 • Benjamin Ellis, Matthew T. Jackson, Andrei Lupu, Alexander D. Goldie, Mattie Fellows, Shimon Whiteson, Jakob Foerster
In this paper, we take a different approach and instead address the effect of nonstationarity by adapting the widely used Adam optimiser.
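As a rough illustration of where nonstationarity bites, here is a scalar Adam step in pure Python; the reset rule at the end is a hedged assumption about the kind of adaptation meant, not the authors' exact method:

```python
# One Adam step on a scalar parameter. The moment estimates m, v and the
# timestep t are exactly the state that becomes stale when the RL
# optimisation target shifts (e.g. after a target-network update).
def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad          # first-moment EMA
    v = b2 * v + (1 - b2) * grad * grad   # second-moment EMA
    m_hat = m / (1 - b1 ** t)             # bias correction
    v_hat = v / (1 - b2 ** t)
    theta -= lr * m_hat / (v_hat ** 0.5 + eps)
    return theta, (m, v, t)

def reset_moments():
    # Hypothesized remedy (an assumption, not the paper's exact method):
    # discard the stale m, v and timestep when the target changes.
    return (0.0, 0.0, 0)

theta, state = 1.0, (0.0, 0.0, 0)
theta, state = adam_step(theta, 4.0, state, lr=0.1)
# After bias correction, the first step moves ~lr in the gradient's direction.
```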
1 code implementation • 13 Dec 2024 • Branton DeMoss, Silvia Sapora, Jakob Foerster, Nick Hawes, Ingmar Posner
We investigate the phenomenon of generalization through the lens of compression.
1 code implementation • 7 Nov 2024 • Usman Anwar, Ashish Pandian, Jia Wan, David Krueger, Jakob Foerster
We show that with NZSC training, RL agents can be trained to coordinate well with novel partners even when the (exact) problem setting of the coordination is not common knowledge.
1 code implementation • 30 Oct 2024 • Michael Matthews, Michael Beukman, Chris Lu, Jakob Foerster
While large models trained with self-supervised learning on offline datasets have shown remarkable capabilities in text and image domains, achieving the same generalisation for agents that act in sequential decision problems remains an open challenge.
General Reinforcement Learning
Reinforcement Learning (RL)
1 code implementation • 28 Oct 2024 • Lize Alberts, Benjamin Ellis, Andrei Lupu, Jakob Foerster
We introduce a multi-turn benchmark for evaluating personalised alignment in LLM-based AI assistants, focusing on their ability to handle user-provided safety-critical contexts.
no code implementations • 4 Oct 2024 • Jonathan Cook, Tim Rocktäschel, Jakob Foerster, Dennis Aumiller, Alex Wang
We then show that STICK (Self-TICK) can be used to improve generation quality across multiple benchmarks via self-refinement and Best-of-N selection.
no code implementations • 12 Sep 2024 • Alisia Lupidi, Carlos Gemmell, Nicola Cancedda, Jane Dwivedi-Yu, Jason Weston, Jakob Foerster, Roberta Raileanu, Maria Lomeli
Our method improves performance by 25.51% for TQA on WikiSQL and 22.57% for MHQA on HotPotQA compared to the fine-tuned baselines.
1 code implementation • 1 Sep 2024 • Chris Lu, Michael Beukman, Michael Matthews, Jakob Foerster
Towards this, we present JaxLife: an artificial life simulator in which embodied agents, parameterized by deep neural networks, must learn to survive in an expressive world containing programmable systems.
1 code implementation • 27 Aug 2024 • Alexander Rutherford, Michael Beukman, Timon Willi, Bruno Lacerda, Nick Hawes, Jakob Foerster
What data or environments to use for training to improve downstream performance is a longstanding and very topical question in reinforcement learning.
no code implementations • 15 Aug 2024 • Qizhen Zhang, Nikolas Gritsch, Dwaraknath Gnaneshwar, Simon Guo, David Cairuz, Bharat Venkitesh, Jakob Foerster, Phil Blunsom, Sebastian Ruder, Ahmet Ustun, Acyr Locatelli
BAM makes full use of specialized dense models by not only using their FFN to initialize the MoE layers but also leveraging experts' attention parameters fully by initializing them into a soft-variant of Mixture of Attention (MoA) layers.
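A hypothetical sketch of that initialisation scheme (the dict layout and names below are illustrative assumptions, not the paper's code):

```python
# Each specialised dense model seeds one expert of the upcycled MoE,
# contributing both its FFN weights and its attention parameters
# (the latter forming the per-expert slots of a soft mixture of attention).
def init_moe_from_dense(dense_models):
    return {
        "ffn_experts": [m["ffn"] for m in dense_models],
        "attn_experts": [m["attn"] for m in dense_models],
    }

moe = init_moe_from_dense([
    {"ffn": "ffn-code", "attn": "attn-code"},   # stand-ins for weight tensors
    {"ffn": "ffn-math", "attn": "attn-math"},
])
```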
2 code implementations • 12 Aug 2024 • Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, David Ha
This approach signifies the beginning of a new era in scientific discovery in machine learning: bringing the transformative benefits of AI agents to the entire research process of AI itself, and taking us closer to a world where endless affordable creativity and innovation can be unleashed on the world's most challenging problems.
no code implementations • 26 Jun 2024 • Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro
Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity.
1 code implementation • 21 Jun 2024 • Andrei Lupu, Chris Lu, Jarek Liesen, Robert Tjarko Lange, Jakob Foerster
Filling the gap, we formalize behaviour distillation, a setting that aims to discover and then condense the information required for training an expert policy into a synthetic dataset of state-action pairs, without access to expert data.
3 code implementations • 12 Jun 2024 • Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange
Specifically, we iteratively prompt an LLM to propose and implement new preference optimization loss functions based on previously-evaluated performance metrics.
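A hypothetical sketch of that propose-and-evaluate loop (the function names and the stubbed "LLM" below are illustrative assumptions, not the paper's API):

```python
# An LLM proposes candidate loss functions, each is scored, and the scored
# history conditions the next round of proposals.
def evaluate(loss_fn):
    # Stand-in for fine-tuning with `loss_fn` and measuring validation score.
    preds, targets = [0.2, 0.8], [0.0, 1.0]
    return -sum(loss_fn(p, t) for p, t in zip(preds, targets))

def propose(history):
    # Stand-in for prompting the LLM with previously evaluated candidates.
    return [("mse", lambda p, t: (p - t) ** 2),
            ("mae", lambda p, t: abs(p - t))]

history = []
for _ in range(3):  # discovery rounds
    for name, cand in propose(history):
        history.append((evaluate(cand), name))
best_score, best_name = max(history)
```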
1 code implementation • 1 Jun 2024 • Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster
Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history.
no code implementations • 14 May 2024 • Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Aaron Purewal, Csaba Botos, Fabro Steibel, Fazel Keshtkar, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Imperial, Juan Arturo Nolazco, Lori Landay, Matthew Jackson, Philip H. S. Torr, Trevor Darrell, Yong Lee, Jakob Foerster
Applications of Generative AI (Gen AI) are expected to revolutionize a number of different areas, ranging from science & medicine to education.
1 code implementation • 13 May 2024 • Ziyang Zhang, Qizhen Zhang, Jakob Foerster
A promising approach is to use the LLM itself as the safeguard.
no code implementations • 6 May 2024 • Tim Franzmeyer, Edith Elkind, Philip Torr, Jakob Foerster, Joao Henriques
Desired characteristics for an AI agent can be expressed by assigning desirability scores, which we assume are not assigned to individual behaviors but to collective trajectories.
no code implementations • 25 Apr 2024 • Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder de Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Jackson, Paul Röttger, Philip H. S. Torr, Trevor Darrell, Yong Suk Lee, Jakob Foerster
In the next few years, applications of Generative AI are expected to revolutionize a number of different areas, ranging from science & medicine to education.
1 code implementation • 15 Apr 2024 • Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Zhang, Ruiqi Zhong, Seán Ó hÉigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwan, Yoshua Bengio, Danqi Chen, Philip H. S. Torr, Samuel Albanie, Tegan Maharaj, Jakob Foerster, Florian Tramer, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger
This work identifies 18 foundational challenges in assuring the alignment and safety of large language models (LLMs).
1 code implementation • 10 Apr 2024 • Linas Nasvytis, Kai Sandbrink, Jakob Foerster, Tim Franzmeyer, Christian Schroeder de Witt
In this paper, we study the problem of out-of-distribution (OOD) detection in RL, which focuses on identifying situations at test time that RL agents have not encountered in their training environments.
Out-of-Distribution Detection
1 code implementation • 9 Apr 2024 • Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Foerster
Our approach provides an effective alternative to autoregressive offline world models, opening the door to the controllable generation of synthetic training data.
1 code implementation • 19 Mar 2024 • Samuel Coward, Michael Beukman, Jakob Foerster
We present JaxUED, an open-source library providing minimal dependency implementations of modern Unsupervised Environment Design (UED) algorithms in Jax.
1 code implementation • 26 Feb 2024 • Michael Matthews, Michael Beukman, Benjamin Ellis, Mikayel Samvelyan, Matthew Jackson, Samuel Coward, Jakob Foerster
Either they are too slow for meaningful research to be performed without enormous computational resources, like Crafter, NetHack and Minecraft, or they are not complex enough to pose a significant challenge, like Minigrid and Procgen.
no code implementations • 26 Feb 2024 • Mikayel Samvelyan, Sharath Chandra Raparthy, Andrei Lupu, Eric Hambro, Aram H. Markosyan, Manish Bhatt, Yuning Mao, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Tim Rocktäschel, Roberta Raileanu
Rainbow Teaming casts adversarial prompt generation as a quality-diversity problem and uses open-ended search to generate prompts that are both effective and diverse.
1 code implementation • 19 Feb 2024 • Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael Dennis, Jakob Foerster
In this work, we introduce Bayesian level-perfect MMR (BLP), a refinement of the minimax regret objective that overcomes this limitation.
2 code implementations • 19 Feb 2024 • Anya Sims, Cong Lu, Jakob Foerster, Yee Whye Teh
We show that this truncation of rollouts results in a set of edge-of-reach states at which we are effectively "bootstrapping from the void."
Model-based Reinforcement Learning
reinforcement-learning
1 code implementation • 15 Feb 2024 • Steven Morad, Chris Lu, Ryan Kortvelesy, Stephan Liwicki, Jakob Foerster, Amanda Prorok
We leverage memoroids to propose a batching method that improves sample efficiency, increases the return, and simplifies the implementation of recurrent loss functions in reinforcement learning.
no code implementations • 15 Feb 2024 • Ravi Hammond, Dustin Craggs, Mingyu Guo, Jakob Foerster, Ian Reid
In many collaborative settings, artificial intelligence (AI) agents must be able to adapt to new teammates that use unknown or previously unobserved strategies.
1 code implementation • 13 Feb 2024 • Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size.
no code implementations • 8 Feb 2024 • Kitty Fung, Qizhen Zhang, Chris Lu, Jia Wan, Timon Willi, Jakob Foerster
Providing theoretical guarantees for M-FOS is hard because (a) there is little literature on theoretical sample-complexity bounds for meta-reinforcement learning, and (b) M-FOS operates in continuous state and action spaces, making theoretical analysis challenging.
no code implementations • 19 Dec 2023 • Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes.
no code implementations • 25 Aug 2023 • Sascha Frey, Kang Li, Peer Nagy, Silvia Sapora, Chris Lu, Stefan Zohren, Jakob Foerster, Anisoara Calinescu
Financial exchanges across the world use limit order books (LOBs) to process orders and match trades.
1 code implementation • 23 Aug 2023 • Peer Nagy, Sascha Frey, Silvia Sapora, Kang Li, Anisoara Calinescu, Stefan Zohren, Jakob Foerster
Overall, our results invite the use and extension of the model in the direction of autoregressive large financial models for the generation of high-frequency financial data, and we commit to open-sourcing our code to facilitate future research.
no code implementations • 15 Aug 2023 • Elena Gal, Shaun Singh, Aldo Pacchiano, Ben Walker, Terry Lyons, Jakob Foerster
We introduce adversarial optimism (AdOpt) to directly address bias in the training set using adversarial domain adaptation.
no code implementations • 3 Jul 2023 • Yat Long Lo, Biswa Sengupta, Jakob Foerster, Michael Noukhovitch
By examining the relationship between messages sent and received, we propose to learn to communicate using contrastive learning to maximize the mutual information between messages of a given trajectory.
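A minimal InfoNCE-style sketch of that contrastive objective (an illustration of the idea, not the paper's exact loss): a message is pulled toward another message from the same trajectory (positive) and pushed away from messages sampled from other trajectories (negatives).

```python
import math

def info_nce(anchor, positive, negatives, temp=1.0):
    def sim(a, b):  # dot-product similarity
        return sum(x * y for x, y in zip(a, b)) / temp
    logits = [sim(anchor, positive)] + [sim(anchor, n) for n in negatives]
    log_denom = math.log(sum(math.exp(l) for l in logits))
    return log_denom - logits[0]  # -log softmax prob of the positive pair

# Identical anchor/positive, orthogonal negative: small but nonzero loss.
loss = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
```

Minimising this loss maximises a lower bound on the mutual information between the paired messages.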
no code implementations • 26 May 2023 • Paul Barde, Jakob Foerster, Derek Nowrouzezahrai, Amy Zhang
Training multiple agents to coordinate is an essential problem with applications in robotics, game theory, economics, and social sciences.
no code implementations • 16 Mar 2023 • Chris Lu, Sebastian Towers, Jakob Foerster
Meta-learning, the notion of learning to learn, enables learning systems to quickly and flexibly solve new tasks.
2 code implementations • NeurIPS 2023 • Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, Feryal Behbahani
We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel, allowing us to tackle reinforcement learning tasks.
no code implementations • 6 Mar 2023 • Mikayel Samvelyan, Akbir Khan, Michael Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Foerster, Roberta Raileanu, Tim Rocktäschel
Open-ended learning methods that automatically generate a curriculum of increasingly challenging tasks serve as a promising avenue toward generally capable reinforcement learning agents.
1 code implementation • 20 Nov 2022 • Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster
More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features.
2 code implementations • 24 Oct 2022 • Christian Schroeder de Witt, Samuel Sokota, J. Zico Kolter, Jakob Foerster, Martin Strohmeier
Steganography is the practice of encoding secret information into innocuous content in such a manner that an adversarial third party would not realize that there is hidden meaning.
1 code implementation • 21 Oct 2022 • Darius Muglich, Christian Schroeder de Witt, Elise van der Pol, Shimon Whiteson, Jakob Foerster
Successful coordination in Dec-POMDPs requires agents to adopt robust strategies and interpretable styles of play for their partner.
no code implementations • 11 Oct 2022 • Hengyuan Hu, David J Wu, Adam Lerer, Jakob Foerster, Noam Brown
First, we show that our method outperforms experts when playing with a group of diverse human players in ad-hoc teams.
1 code implementation • 11 Oct 2022 • Chris Lu, Jakub Grudzien Kuba, Alistair Letcher, Luke Metz, Christian Schroeder de Witt, Jakob Foerster
We refer to the immediate result as Learnt Policy Optimisation (LPO).
1 code implementation • 22 Sep 2022 • Risto Vuorio, Jacob Beck, Shimon Whiteson, Jakob Foerster, Gregory Farquhar
Meta-gradients provide a general approach for optimizing the meta-parameters of reinforcement learning (RL) algorithms.
1 code implementation • 11 Jul 2022 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
Problematically, in partially-observable or stochastic settings, optimal policies may depend on the ground-truth distribution over aleatoric parameters of the environment in the intended deployment setting, while curriculum learning necessarily shifts the training distribution.
no code implementations • 26 Jun 2022 • Darius Muglich, Luisa Zintgraf, Christian Schroeder de Witt, Shimon Whiteson, Jakob Foerster
Self-play is a common paradigm for constructing solutions in Markov games that can yield optimal policies in collaborative settings.
1 code implementation • 20 Jun 2022 • Eugene Vinitsky, Nathan Lichtlé, Xiaomeng Yang, Brandon Amos, Jakob Foerster
We introduce Nocturne, a new 2D driving simulator for investigating multi-agent coordination under partial observability.
2 code implementations • 3 May 2022 • Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster
In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD).
1 code implementation • 8 Mar 2022 • Timon Willi, Alistair Letcher, Johannes Treutlein, Jakob Foerster
Finally, in an empirical evaluation on a set of general-sum games, we find that COLA finds prosocial solutions and that it converges under a wider range of learning rates than HOLA and LOLA.
3 code implementations • 2 Mar 2022 • Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of an agent's capabilities, resulting in curricula that start simple but become increasingly complex.
no code implementations • 29 Jan 2022 • Mingwei Ma, Jizhou Liu, Samuel Sokota, Max Kleiman-Weiner, Jakob Foerster
An unaddressed challenge in multi-agent coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations.
1 code implementation • 7 Jan 2022 • Jakub Grudzien Kuba, Christian Schroeder de Witt, Jakob Foerster
In contrast, in this paper we introduce a novel theoretical framework, named Mirror Learning, which provides theoretical guarantees to a large class of algorithms, including TRPO and PPO.
no code implementations • 24 Dec 2021 • Jonathan Lorraine, Paul Vicol, Jack Parker-Holder, Tal Kachman, Luke Metz, Jakob Foerster
We generalize this idea to non-conservative, multi-agent gradient systems by proposing a method - denoted Generalized Ridge Rider (GRR) - for finding arbitrary bifurcation points.
no code implementations • NeurIPS 2021 • Aldo Pacchiano, Shaun Singh, Edward Chou, Alexander C. Berg, Jakob Foerster
The lender only observes whether a customer will repay a loan if the loan is issued to begin with, and thus modeled decisions affect what data is available to the lender for future decisions.
4 code implementations • NeurIPS 2021 • Minqi Jiang, Michael Dennis, Jack Parker-Holder, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria.
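A simplified sketch of that curation rule (an illustration, not the exact algorithm): replay high-regret curated levels with gradient updates, but on fresh uncurated levels only record a regret score, with no policy update.

```python
import random

def plr_step(buffer, sample_new_level, regret_score, rng, p_replay=0.5):
    if buffer and rng.random() < p_replay:
        level = max(buffer, key=regret_score)   # curated, high-regret level
        return level, True                      # train on it
    level = sample_new_level()
    buffer.append(level)                        # score now, maybe replay later
    return level, False                         # no policy update

rng = random.Random(0)
buffer = []
level, update = plr_step(buffer, lambda: "lvl-0", lambda l: 0.0, rng)
# The very first level is uncurated, so it is scored but not trained on.
```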
no code implementations • 26 Jul 2021 • Danielle Rothermel, Margaret Li, Tim Rocktäschel, Jakob Foerster
After carefully redesigning the empirical setup, we find that when tuning learning rates properly, pretrained transformers do outperform or match training from scratch in all of our tasks, but only as long as the entire model is finetuned.
1 code implementation • 17 Jul 2021 • Samuel Sokota, Christian Schroeder de Witt, Maximilian Igl, Luisa Zintgraf, Philip Torr, Martin Strohmeier, J. Zico Kolter, Shimon Whiteson, Jakob Foerster
We contribute a theoretically grounded approach to MCGs based on maximum entropy reinforcement learning and minimum entropy coupling that we call MEME.
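The minimum-entropy-coupling building block can be sketched with the common greedy heuristic (an approximation for illustration, not necessarily the paper's exact construction): repeatedly pair the largest remaining probability masses of the two marginals.

```python
def greedy_coupling(p, q, tol=1e-12):
    p, q = list(p), list(q)
    pairs = []
    while True:
        i = max(range(len(p)), key=p.__getitem__)  # largest remaining mass in p
        j = max(range(len(q)), key=q.__getitem__)  # largest remaining mass in q
        mass = min(p[i], q[j])
        if mass <= tol:
            return pairs
        pairs.append((i, j, mass))  # joint mass assigned to outcome (i, j)
        p[i] -= mass
        q[j] -= mass

pairs = greedy_coupling([0.6, 0.4], [0.5, 0.5])
```

By construction the result is a valid coupling (its row and column sums recover the marginals), and the greedy pairing keeps the joint entropy low.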
1 code implementation • 14 Jul 2021 • Qizhen Zhang, Chris Lu, Animesh Garg, Jakob Foerster
We also learn a centralized exploration policy within our model that learns to collect additional data in state-action regions with high model uncertainty.
no code implementations • 16 Jun 2021 • Hengyuan Hu, Adam Lerer, Noam Brown, Jakob Foerster
Search is an important tool for computing effective policies in single- and multi-agent environments, and has been crucial for achieving superhuman performance in several benchmark fully and partially observable games.
1 code implementation • 11 Jun 2021 • Johannes Treutlein, Michael Dennis, Caspar Oesterheld, Jakob Foerster
We introduce an extension of the algorithm, other-play with tie-breaking, and prove that it is optimal in the LFC problem and an equilibrium in the LFC game.
no code implementations • 14 Mar 2021 • Kalesha Bullard, Douwe Kiela, Franziska Meier, Joelle Pineau, Jakob Foerster
In contrast, in this work, we present a novel problem setting and the Quasi-Equivalence Discovery (QED) algorithm that allows for zero-shot coordination (ZSC), i.e., discovering protocols that can generalize to independently trained agents.
5 code implementations • 6 Mar 2021 • Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster
Policies learned through self-play may adopt arbitrary conventions and implicitly rely on multi-step reasoning based on fragile assumptions about other agents' actions and thus fail when paired with humans or independently trained agents at test time.
no code implementations • NeurIPS 2020 • Jack Parker-Holder, Luke Metz, Cinjon Resnick, Hengyuan Hu, Adam Lerer, Alistair Letcher, Alex Peysakhovich, Aldo Pacchiano, Jakob Foerster
In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs).
no code implementations • 29 Oct 2020 • Kalesha Bullard, Franziska Meier, Douwe Kiela, Joelle Pineau, Jakob Foerster
Indeed, emergent communication is now a vibrant field of research, with common settings involving discrete cheap-talk channels.
1 code implementation • 23 Sep 2020 • Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom
For neural models to garner widespread public trust and ensure fairness, we must have human-intelligible explanations for their predictions.
no code implementations • WS 2020 • Abhinav Gupta, Cinjon Resnick, Jakob Foerster, Andrew Dai, Kyunghyun Cho
Our hypothesis is that there should be a specific range of model capacity and channel bandwidth that induces compositional structure in the resulting language and consequently encourages systematic generalization.
1 code implementation • 19 Mar 2020 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
At the same time, it is often possible to train the agents in a centralised fashion where global state information is available and communication constraints are lifted.
Ranked #6 on SMAC (6h_vs_8z)
1 code implementation • ICLR 2020 • Ryan Lowe, Abhinav Gupta, Jakob Foerster, Douwe Kiela, Joelle Pineau
A promising approach for teaching artificial agents to use natural language involves using human-in-the-loop training.
10 code implementations • 5 Dec 2019 • Adam Lerer, Hengyuan Hu, Jakob Foerster, Noam Brown
The first one, single-agent search, effectively converts the problem into a single agent setting by making all but one of the agents play according to the agreed-upon policy.
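A simplified sketch of single-agent search: with every partner locked to the agreed-upon blueprint policy, the searching agent can score each of its own candidate actions by rollouts in what is now effectively a single-agent problem. The `rollout` function here is a stub standing in for a real game simulator.

```python
def single_agent_search(state, my_actions, blueprint, rollout):
    # All other agents follow the fixed blueprint convention inside `rollout`;
    # only this agent's first action varies across candidates.
    scores = {a: rollout(state, a, blueprint) for a in my_actions}
    return max(scores, key=scores.get)

# Stub rollout values: pretend hinting leads to the best outcome.
values = {"play": 1.0, "discard": 0.5, "hint": 2.0}
best = single_agent_search("s0", list(values),
                           lambda s: "blueprint-act",
                           lambda s, a, bp: values[a])
```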
1 code implementation • NeurIPS 2019 • Gregory Farquhar, Shimon Whiteson, Jakob Foerster
Gradient-based methods for optimisation of objectives in stochastic settings with unknown or intractable dynamics require estimators of derivatives.
no code implementations • WS 2019 • Abhinav Gupta, Ryan Lowe, Jakob Foerster, Douwe Kiela, Joelle Pineau
Once the meta-learning agent is able to quickly adapt to each population of agents, it can be deployed in new populations, including populations speaking human language.
1 code implementation • 24 Oct 2019 • Cinjon Resnick, Abhinav Gupta, Jakob Foerster, Andrew M. Dai, Kyunghyun Cho
In this paper, we investigate the learning biases that affect the efficacy and compositionality of emergent languages.
2 code implementations • 4 Oct 2019 • Oana-Maria Camburu, Eleonora Giunchiglia, Jakob Foerster, Thomas Lukasiewicz, Phil Blunsom
We aim for this framework to provide a publicly available, off-the-shelf evaluation when the feature-selection perspective on explanations is needed.
no code implementations • 25 Sep 2019 • Christoph Aymanns, Matthias Weber, Co-Pierre Georg, Jakob Foerster
We incorporate fake news into the model by adding an adversarial agent, the attacker, that either provides biased private signals to or takes over a subset of agents.
no code implementations • 10 Jun 2019 • Jelena Luketina, Nantas Nardelli, Gregory Farquhar, Jakob Foerster, Jacob Andreas, Edward Grefenstette, Shimon Whiteson, Tim Rocktäschel
To be successful in real-world tasks, Reinforcement Learning (RL) needs to exploit the compositional, relational, and hierarchical structure of the world, and learn to transfer it to the task at hand.
1 code implementation • 13 May 2019 • Alistair Letcher, David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel
The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games.
1 code implementation • 12 Mar 2019 • Ryan Lowe, Jakob Foerster, Y-Lan Boureau, Joelle Pineau, Yann Dauphin
How do we know if communication is emerging in a multi-agent system?
22 code implementations • 11 Feb 2019 • Mikayel Samvelyan, Tabish Rashid, Christian Schroeder de Witt, Gregory Farquhar, Nantas Nardelli, Tim G. J. Rudner, Chia-Man Hung, Philip H. S. Torr, Jakob Foerster, Shimon Whiteson
In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap.
Ranked #6 on SMAC (6h_vs_8z)
no code implementations • ICLR 2019 • Alistair Letcher, Jakob Foerster, David Balduzzi, Tim Rocktäschel, Shimon Whiteson
A growing number of learning methods are actually differentiable games whose players optimise multiple, interdependent objectives in parallel -- from GANs and intrinsic curiosity to multi-agent RL.
no code implementations • 27 Sep 2018 • Jingkai Mao, Jakob Foerster, Tim Rocktäschel, Gregory Farquhar, Maruan Al-Shedivat, Shimon Whiteson
To improve the sample efficiency of DiCE, we propose a new baseline term for higher order gradient estimation.
1 code implementation • ICML 2018 • Jakob Foerster, Gregory Farquhar, Maruan Al-Shedivat, Tim Rocktäschel, Eric Xing, Shimon Whiteson
Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for estimators of higher-order derivatives.
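DiCE's published remedy for this flaw is the MagicBox operator, which evaluates to 1 in the forward pass while reproducing the full score-function dependency under repeated differentiation:

```latex
\operatorname{MagicBox}(\mathcal{W}) = \exp\big(\tau - \perp(\tau)\big),
\qquad \tau = \sum_{w \in \mathcal{W}} \log p(w;\theta),
```

where $\perp$ denotes the stop-gradient operator ($\perp(x) = x$, $\nabla_\theta \perp(x) = 0$), so that $\operatorname{MagicBox}(\mathcal{W}) \to 1$ in evaluation while $\nabla_\theta \operatorname{MagicBox}(\mathcal{W}) = \operatorname{MagicBox}(\mathcal{W})\,\nabla_\theta \tau$, yielding correct estimators at all orders of differentiation.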
17 code implementations • ICML 2018 • Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted.
Ranked #1 on SMAC+ (Off_Near_parallel)
Multi-agent Reinforcement Learning
reinforcement-learning
1 code implementation • ICML 2018 • David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel
The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
no code implementations • 21 Aug 2017 • Christoph Aymanns, Jakob Foerster, Co-Pierre Georg
We model the spread of news as a social learning game on a network.
6 code implementations • 24 May 2017 • Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, Shimon Whiteson
COMA uses a centralised critic to estimate the Q-function and decentralised actors to optimise the agents' policies.
Ranked #1 on SMAC+ (Off_Superhard_parallel)
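COMA's counterfactual baseline can be sketched for a single agent and a single step (a simplified illustration of the published advantage): marginalise out that agent's own action under its current policy while holding the other agents' actions fixed.

```python
def counterfactual_advantage(q_values, policy_probs, action):
    # q_values[a]: centralised Q if this agent takes action a (others fixed)
    # policy_probs[a]: the agent's current policy over its own actions
    baseline = sum(p * q for p, q in zip(policy_probs, q_values))
    return q_values[action] - baseline

# Two actions, uniform policy: the advantage of the better action is
# its Q-value minus the policy-weighted average.
adv = counterfactual_advantage([1.0, 3.0], [0.5, 0.5], 1)
```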
5 code implementations • ICML 2017 • Jakob Foerster, Nantas Nardelli, Gregory Farquhar, Triantafyllos Afouras, Philip H. S. Torr, Pushmeet Kohli, Shimon Whiteson
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems.