no code implementations • 8 Mar 2021 • Kevin R. McKee, Edward Hughes, Tina O. Zhu, Martin J. Chadwick, Raphael Koster, Antonio Garcia Castaneda, Charlie Beattie, Thore Graepel, Matt Botvinick, Joel Z. Leibo
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate.
Such systems have local incentives for individuals, whose behavior has an impact on the global outcome for the group.
We also show that higher-order belief models outperform agents with lower-order models.
We see opportunity to more explicitly focus on the problem of cooperation, to construct unified theory and vocabulary, and to build bridges with adjacent communities working on cooperation, including in the natural, social, and behavioural sciences.
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals.
The challenge of developing powerful and general Reinforcement Learning (RL) agents has received increasing attention in recent years.
Here we argue that a systematic study of many-player zero-sum games is a crucial element of artificial intelligence research.
Recent research on reinforcement learning in pure-conflict and pure-common-interest games has emphasized the importance of population heterogeneity.
With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact.
no code implementations • Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Si-Qi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO).
13 code implementations • 26 Aug 2019 • Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.
We analyse the resulting policies to show that the reciprocating agents are strongly influenced by their co-players' behavior.
Evolution has produced a multi-scale mosaic of interacting adaptive units.
1 code implementation • 1 Feb 2019 • Nolan Bard, Jakob N. Foerster, Sarath Chandar, Neil Burch, Marc Lanctot, H. Francis Song, Emilio Parisotto, Vincent Dumoulin, Subhodeep Moitra, Edward Hughes, Iain Dunning, Shibl Mourad, Hugo Larochelle, Marc G. Bellemare, Michael Bowling
From the early days of computing, games have been important testbeds for studying how well machines can do sophisticated decision making.
Discovering and exploiting the causal structure in the environment is a crucial challenge for intelligent agents.
Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation.
Multi-agent cooperation is an important feature of the natural world.
We present the Bayesian action decoder (BAD), a new multi-agent learning method that uses an approximate Bayesian update to obtain a public belief that conditions on the actions taken by all agents in the environment.
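As a rough illustration of the idea of a public belief that conditions on observed actions, the sketch below performs an exact Bayes update over a small discrete set of private states. It is not the approximate update used by BAD: the function name `update_public_belief`, the tabular `policy`, and the toy numbers are all assumptions made for this example.

```python
def update_public_belief(belief, policy, action):
    """Exact Bayes update of a belief over private states after seeing an action.

    belief[h]    : prior probability of private state h (hypothetical toy input)
    policy[h][a] : probability that the acting agent plays a when its state is h
    action       : index of the action that every agent observed
    """
    # Posterior is proportional to prior times likelihood of the observed action.
    unnormalised = [belief[h] * policy[h][action] for h in range(len(belief))]
    total = sum(unnormalised)
    if total == 0.0:
        raise ValueError("observed action has zero probability under the belief")
    return [u / total for u in unnormalised]


if __name__ == "__main__":
    prior = [0.5, 0.5]
    # Assumed toy policy: state 0 mostly plays action 0, state 1 mostly action 1.
    policy = [[0.9, 0.1], [0.3, 0.7]]
    print(update_public_belief(prior, policy, action=0))
```

Because the policy is public knowledge in this toy setting, every observer can run the same update and arrive at the same belief.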
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.
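One way to make "causal influence over other agents' actions" concrete is to compare another agent's action distribution given the action actually taken against the distribution obtained by averaging over the influencer's alternative actions. The sketch below scores that gap with a KL divergence. The KL form, the function names, and the tabular inputs are assumptions for illustration and are not taken from the text above.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def influence_reward(own_policy, other_conditional, own_action):
    """Score how much `own_action` shifts another agent's action distribution.

    own_policy[a]           : probability that the influencer picks action a
    other_conditional[a][b] : probability the other agent picks b given a
    own_action              : the action the influencer actually took
    """
    n_other = len(other_conditional[0])
    # Average over the influencer's counterfactual actions to get a baseline.
    marginal = [
        sum(own_policy[a] * other_conditional[a][b] for a in range(len(own_policy)))
        for b in range(n_other)
    ]
    return kl_divergence(other_conditional[own_action], marginal)


if __name__ == "__main__":
    # The other agent copies the influencer exactly, so the influence is large.
    print(influence_reward([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]], own_action=0))
    # The other agent ignores the influencer, so the influence is zero.
    print(influence_reward([0.5, 0.5], [[0.3, 0.7], [0.3, 0.7]], own_action=1))
```

In a training loop this score would typically be added to the environment reward with some weight, which is again an assumption of this sketch.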
Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.
2 code implementations • Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas.