no code implementations • 26 Dec 2024 • Joel Z. Leibo, Alexander Sasha Vezhnevets, Manfred Diaz, John P. Agapiou, William A. Cunningham, Peter Sunehag, Julia Haas, Raphael Koster, Edgar A. Duéñez-Guzmán, William S. Isaac, Georgios Piliouras, Stanley M. Bileschi, Iyad Rahwan, Simon Osindero
Humans navigate a multi-scale mosaic of interlocking notions of what is appropriate for different situations.
no code implementations • 8 Dec 2023 • Yali Du, Joel Z. Leibo, Usman Islam, Richard Willis, Peter Sunehag
Cooperation in multi-agent learning (MAL) is a topic at the intersection of numerous disciplines, including game theory, economics, social sciences, and evolutionary biology.
no code implementations • 2 Feb 2023 • Peter Sunehag, Alexander Sasha Vezhnevets, Edgar Duéñez-Guzmán, Igor Mordatch, Joel Z. Leibo
The algorithm we propose consists of two parts: an agent architecture and a learning rule.
3 code implementations • 24 Nov 2022 • John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo
Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence; it provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios.
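The protocol can be pictured as scoring a focal population against held-out background populations inside a test scenario. Below is a minimal sketch of that loop under stated assumptions: `build_scenario`, the agent objects, and their methods are hypothetical stand-ins, not the actual meltingpot API; only the per-capita-return metric reflects the protocol's spirit.

```python
# Hedged sketch of Melting Pot-style evaluation: focal agents are scored
# against a scenario that bundles the substrate with pretrained background
# bots. All names here (build_scenario, agent.step, ...) are illustrative.
import numpy as np

def evaluate_focal_population(build_scenario, focal_agents, num_episodes=100):
    """Mean per-capita return of the focal agents across test episodes."""
    per_capita_returns = []
    for _ in range(num_episodes):
        env = build_scenario()              # substrate + background population
        timestep = env.reset()
        episode_return = np.zeros(len(focal_agents))
        while not timestep.last():
            actions = [agent.step(obs) for agent, obs
                       in zip(focal_agents, timestep.observation)]
            timestep = env.step(actions)
            episode_return += np.asarray(timestep.reward)
        per_capita_returns.append(episode_return.mean())
    return float(np.mean(per_capita_returns))
```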
no code implementations • 14 Jul 2021 • Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel
Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks).
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning +2
2 code implementations • NeurIPS 2020 • Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, Hongyuan Zha
The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years.
no code implementations • 17 Dec 2018 • Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel
Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation.
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning +2
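The core Malthusian idea in the entry above, fitness-linked population size, can be sketched very simply: subpopulations that earn more reward field more copies next generation, which changes the game everyone faces. The specific update rule below is illustrative, not the paper's exact dynamics.

```python
# Hedged sketch of fitness-linked population dynamics: reallocate a fixed
# total population in proportion to each subpopulation's collected reward.
import numpy as np

def malthusian_update(pop_sizes, mean_returns, total_capacity):
    """New subpopulation sizes, proportional to fitness (clipped positive)."""
    fitness = pop_sizes * np.maximum(mean_returns, 1e-8)  # guard nonpositive returns
    shares = fitness / fitness.sum()
    return np.maximum(1, np.round(shares * total_capacity).astype(int))
```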
10 code implementations • 16 Jun 2017 • Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel
We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal.
Ranked #1 on SMAC+ (Off_Superhard_parallel)
Tasks: Multi-agent Reinforcement Learning, reinforcement-learning +3
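The entry above is the value-decomposition networks (VDN) paper: the joint action-value is represented as a sum of per-agent values, each conditioned only on that agent's local observation, and the sum is trained against the single team reward. A minimal PyTorch sketch of the decomposition; network sizes are illustrative and the training loop is omitted.

```python
# Additive value decomposition: Q_tot(s, a) = sum_i Q_i(o_i, a_i).
import torch
import torch.nn as nn

class VDN(nn.Module):
    def __init__(self, obs_dim, n_actions, n_agents, hidden=64):
        super().__init__()
        self.agent_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, n_actions))
            for _ in range(n_agents))

    def forward(self, observations):            # [n_agents, batch, obs_dim]
        return [net(o) for net, o in zip(self.agent_nets, observations)]

    def q_total(self, observations, actions):   # actions: [n_agents, batch]
        per_agent_q = self.forward(observations)
        chosen = [q.gather(1, a.unsqueeze(1)).squeeze(1)
                  for q, a in zip(per_agent_q, actions)]
        return torch.stack(chosen).sum(dim=0)   # trained on the joint reward
```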
2 code implementations • 24 Dec 2015 • Gabriel Dulac-Arnold, Richard Evans, Hado van Hasselt, Peter Sunehag, Timothy Lillicrap, Jonathan Hunt, Timothy Mann, Theophane Weber, Thomas Degris, Ben Coppin
Being able to reason in an environment with a large number of discrete actions is essential to bringing reinforcement learning to a larger class of problems.
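This paper's approach embeds the discrete actions in a continuous space, has an actor output a "proto-action" in that space, retrieves the k nearest discrete neighbours, and lets a critic pick the best of those k, so only k of the N actions are ever evaluated. A hedged sketch; `actor` and `critic` are placeholder callables, not the paper's networks.

```python
# Nearest-neighbour action selection over an action-embedding table.
import numpy as np

def select_action(state, actor, critic, action_embeddings, k=10):
    proto = actor(state)                               # point in embedding space
    dists = np.linalg.norm(action_embeddings - proto, axis=1)
    candidates = np.argpartition(dists, k)[:k]         # k nearest discrete actions
    q_values = [critic(state, a) for a in candidates]  # refine with the critic
    return candidates[int(np.argmax(q_values))]
```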
no code implementations • 3 Dec 2015 • Peter Sunehag, Richard Evans, Gabriel Dulac-Arnold, Yori Zwols, Daniel Visentin, Ben Coppin
Further, we use deep deterministic policy gradients to learn a policy that, for each position of the slate, guides attention towards the part of the action space in which the value is highest; we then evaluate actions only in this area.
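A minimal sketch of that slate-construction step, under stated assumptions: the policy emits one proto-action per slate position and only the items near each proto-action are scored. `policy` and `value_fn` are hypothetical callables, not the paper's networks.

```python
# Per-position attention: score only the k items nearest each proto-action.
import numpy as np

def build_slate(state, policy, value_fn, item_embeddings, slate_size, k=20):
    slate = []
    protos = policy(state)                        # [slate_size, embed_dim]
    for position in range(slate_size):
        dists = np.linalg.norm(item_embeddings - protos[position], axis=1)
        nearby = np.argpartition(dists, k)[:k]    # restrict attention locally
        nearby = [i for i in nearby if i not in slate]  # no repeated items
        scores = [value_fn(state, slate, i) for i in nearby]
        slate.append(nearby[int(np.argmax(scores))])
    return slate
```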
no code implementations • 22 Aug 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We present a new algorithm for general reinforcement learning where the true environment is known to belong to a finite class of N arbitrary models.
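One way to picture the setting: track which of the N candidate models remain consistent with the observed history and plan with any survivor, since the true environment is never eliminated. The sketch below is purely illustrative of that flavour; `plan`, `prediction_error`, and the threshold test are hypothetical, not the paper's algorithm or its statistical test.

```python
# Illustrative model-elimination loop over a finite class of candidate models.
def act_with_model_class(models, env, horizon, tol=0.1):
    alive = set(range(len(models)))            # candidate models still consistent
    obs, total_reward = env.reset(), 0.0
    for _ in range(horizon):
        m = min(alive)                         # plan with any surviving model
        action = models[m].plan(obs)
        next_obs, reward = env.step(action)
        total_reward += reward
        # discard models whose predictions disagree too much with reality
        alive = {i for i in alive
                 if models[i].prediction_error(obs, action, next_obs, reward) < tol}
        obs = next_obs
    return total_reward
```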
no code implementations • 12 Jul 2013 • Hadi Mohasel Afshar, Peter Sunehag
For us, "(objective) background knowledge" is restricted to information that can be expressed as probability events.
no code implementations • 29 Jun 2013 • Tor Lattimore, Marcus Hutter, Peter Sunehag
We prove tight high-probability bounds on the cumulative error, which is measured in terms of the Kullback-Leibler (KL) divergence.
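For reference, the error measure is the standard KL divergence, and the cumulative error it induces for a predictor against the true distribution takes the form below; the symbols $\mu$ (truth), $\rho$ (predictor), and horizon $n$ are my notation, and the paper's actual bound is not reproduced here.

```latex
% KL divergence and the cumulative prediction error it induces.
\mathrm{KL}(P \,\|\, Q) = \sum_{x} P(x) \log \frac{P(x)}{Q(x)}, \qquad
\sum_{t=1}^{n} \mathrm{KL}\bigl(\mu(\cdot \mid x_{<t}) \,\|\, \rho(\cdot \mid x_{<t})\bigr).
```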