no code implementations • 13 Feb 2024 • Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size.
no code implementations • 8 Feb 2024 • Kitty Fung, Qizhen Zhang, Chris Lu, Jia Wan, Timon Willi, Jakob Foerster
Providing theoretical guarantees for M-FOS is hard because A) there is little literature on theoretical sample complexity bounds for meta-reinforcement learning B) M-FOS operates in continuous state and action spaces, so theoretical analysis is challenging.
no code implementations • 19 Dec 2023 • Akbir Khan, Timon Willi, Newton Kwan, Andrea Tacchetti, Chris Lu, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes.
no code implementations • 19 Dec 2023 • Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel
We evaluate on over 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibrium with better global welfare than naive learning.
2 code implementations • 16 Nov 2023 • Alexander Rutherford, Benjamin Ellis, Matteo Gallici, Jonathan Cook, Andrei Lupu, Gardar Ingvarsson, Timon Willi, Akbir Khan, Christian Schroeder de Witt, Alexandra Souly, Saptarashmi Bandyopadhyay, Mikayel Samvelyan, Minqi Jiang, Robert Tjarko Lange, Shimon Whiteson, Bruno Lacerda, Nick Hawes, Tim Rocktaschel, Chris Lu, Jakob Nicolaus Foerster
This not only enables GPU acceleration, but also provides a more flexible MARL environment, unlocking the potential for self-play, meta-learning, and other future applications in MARL.
1 code implementation • 20 Nov 2022 • Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster
More specifically, we show that an ACT Adversary is capable of harming performance by interfering with the learner's function approximation, or instead helping the Victim's performance by outputting useful features.
2 code implementations • 3 May 2022 • Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster
In general-sum games, the interaction of self-interested learning agents commonly leads to collectively worst-case outcomes, such as defect-defect in the iterated prisoner's dilemma (IPD).
1 code implementation • 8 Mar 2022 • Timon Willi, Alistair Letcher, Johannes Treutlein, Jakob Foerster
Finally, in an empirical evaluation on a set of general-sum games, we find that COLA finds prosocial solutions and that it converges under a wider range of learning rates than HOLA and LOLA.
2 code implementations • 13 Jun 2019 • Timon Willi, Jonathan Masci, Jürgen Schmidhuber, Christian Osendorfer
We extend Neural Processes (NPs) to sequential data through Recurrent NPs or RNPs, a family of conditional state space models.