no code implementations • ICML 2020 • Remi Munos, Julien Perolat, Jean-Baptiste Lespiau, Mark Rowland, Bart De Vylder, Marc Lanctot, Finbarr Timbers, Daniel Hennes, Shayegan Omidshafiei, Audrunas Gruslys, Mohammad Gheshlaghi Azar, Edward Lockhart, Karl Tuyls
We introduce and analyze a class of algorithms, called Mirror Ascent against an Improved Opponent (MAIO), for computing Nash equilibria in two-player zero-sum games, both in normal form and in sequential imperfect information form.
no code implementations • 1 May 2023 • Udari Madhushani, Kevin R. McKee, John P. Agapiou, Joel Z. Leibo, Richard Everett, Thomas Anthony, Edward Hughes, Karl Tuyls, Edgar A. Duéñez-Guzmán
In social psychology, Social Value Orientation (SVO) describes an individual's propensity to allocate resources between themself and others.
no code implementations • 18 Jan 2023 • Adaptive Agent Team, Jakob Bauer, Kate Baumli, Satinder Baveja, Feryal Behbahani, Avishkar Bhoopchand, Nathalie Bradley-Schmieg, Michael Chang, Natalie Clay, Adrian Collister, Vibhavari Dasagi, Lucy Gonzalez, Karol Gregor, Edward Hughes, Sheleem Kashem, Maria Loks-Thompson, Hannah Openshaw, Jack Parker-Holder, Shreya Pathak, Nicolas Perez-Nieves, Nemanja Rakicevic, Tim Rocktäschel, Yannick Schroecker, Jakub Sygnowski, Karl Tuyls, Sarah York, Alexander Zacherl, Lei Zhang
Foundation models have shown impressive adaptation and scalability in supervised and self-supervised learning problems, but so far these successes have not fully translated to reinforcement learning (RL).
no code implementations • 11 Jan 2023 • Mark Rowland, Rémi Munos, Mohammad Gheshlaghi Azar, Yunhao Tang, Georg Ostrovski, Anna Harutyunyan, Karl Tuyls, Marc G. Bellemare, Will Dabney
We analyse quantile temporal-difference learning (QTD), a distributional reinforcement learning algorithm that has proven to be a key component in several successful large-scale applications of reinforcement learning.
Tasks: Distributional Reinforcement Learning, Reinforcement Learning
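As a concrete reference point, the tabular QTD update for policy evaluation can be sketched as below; `theta` holds per-state quantile estimates, and all names are illustrative rather than taken from the paper's code.

```python
import numpy as np

def qtd_update(theta, s, r, s_next, gamma=0.99, lr=0.05):
    """One quantile TD update at state s after observing (r, s_next)."""
    m = theta.shape[1]                         # number of quantiles per state
    taus = (2 * np.arange(m) + 1) / (2 * m)    # quantile midpoints tau_i
    targets = r + gamma * theta[s_next]        # sample Bellman targets
    for i in range(m):
        # Quantile-regression step: raise theta[s, i] with weight tau_i,
        # lower it otherwise, judged against the empirical targets.
        below = (targets < theta[s, i]).mean()
        theta[s, i] += lr * (taus[i] - below)
    return theta
```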
no code implementations • 17 Oct 2022 • Luke Marris, Ian Gemp, Thomas Anthony, Andrea Tacchetti, SiQi Liu, Karl Tuyls
We argue that such a network (a neural equilibrium solver) is a powerful component for many possible multiagent algorithms.
no code implementations • 5 Oct 2022 • Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen McAleer, Jerome Connor, Karl Tuyls, Thore Graepel
Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting.
no code implementations • 22 Sep 2022 • Ian Gemp, Thomas Anthony, Yoram Bachrach, Avishkar Bhoopchand, Kalesha Bullard, Jerome Connor, Vibhavari Dasagi, Bart De Vylder, Edgar Duenez-Guzman, Romuald Elie, Richard Everett, Daniel Hennes, Edward Hughes, Mina Khan, Marc Lanctot, Kate Larson, Guy Lever, SiQi Liu, Luke Marris, Kevin R. McKee, Paul Muller, Julien Perolat, Florian Strub, Andrea Tacchetti, Eugene Tarassov, Zhe Wang, Karl Tuyls
The Game Theory & Multi-Agent team at DeepMind studies several aspects of multi-agent learning, ranging from computing approximations to fundamental concepts in game theory, to simulating social dilemmas in rich spatial environments, to training 3D humanoids in difficult team coordination tasks.
no code implementations • 22 Aug 2022 • Paul Muller, Romuald Elie, Mark Rowland, Mathieu Lauriere, Julien Perolat, Sarah Perrin, Matthieu Geist, Georgios Piliouras, Olivier Pietquin, Karl Tuyls
The designs of many large-scale systems today, from traffic routing environments to smart grids, rely on game-theoretic equilibrium concepts.
1 code implementation • 30 Jun 2022 • Julien Perolat, Bart De Vylder, Daniel Hennes, Eugene Tarassov, Florian Strub, Vincent de Boer, Paul Muller, Jerome T. Connor, Neil Burch, Thomas Anthony, Stephen McAleer, Romuald Elie, Sarah H. Cen, Zhe Wang, Audrunas Gruslys, Aleksandra Malysheva, Mina Khan, Sherjil Ozair, Finbarr Timbers, Toby Pohlen, Tom Eccles, Mark Rowland, Marc Lanctot, Jean-Baptiste Lespiau, Bilal Piot, Shayegan Omidshafiei, Edward Lockhart, Laurent Sifre, Nathalie Beauguerlange, Remi Munos, David Silver, Satinder Singh, Demis Hassabis, Karl Tuyls
Stratego has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes).
no code implementations • 21 Oct 2021 • Edgar A. Duéñez-Guzmán, Kevin R. McKee, Yiran Mao, Ben Coppin, Silvia Chiappa, Alexander Sasha Vezhnevets, Michiel A. Bakker, Yoram Bachrach, Suzanne Sadedin, William Isaac, Karl Tuyls, Joel Z. Leibo
Undesired bias afflicts both human and algorithmic decision making, and may be especially prevalent when information processing trade-offs incentivize the use of heuristics.
no code implementations • 28 Jun 2021 • Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie, Daniel Hennes, Jerome Connor, Karl Tuyls
Importantly, $\Phi$-regret enables learning agents to consider deviations from and to mixed strategies, generalizing several existing notions of regret such as external, internal, and swap regret, and thus broadening the insights gained from regret-based analysis of learning algorithms.
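For reference, the standard definition that these notions specialize can be written as follows (generic formulation with illustrative notation: $u$ is the agent's payoff, $\pi_t$ its strategy at round $t$, and $\Phi$ a set of strategy transformations):

```latex
\mathrm{Reg}_{\Phi}(T) \;=\; \max_{\phi \in \Phi} \sum_{t=1}^{T}
  \Big[ u\big(\phi(\pi_t),\, \pi_{-t}\big) \;-\; u\big(\pi_t,\, \pi_{-t}\big) \Big]
```

Taking $\Phi$ to be the constant transformations recovers external regret, while taking it to be the set of action-swap maps recovers internal and swap regret.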
1 code implementation • 17 Jun 2021 • Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel
Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting.
no code implementations • 8 Jun 2021 • Shayegan Omidshafiei, Daniel Hennes, Marta Garnelo, Eugene Tarassov, Zhe Wang, Romuald Elie, Jerome T. Connor, Paul Muller, Ian Graham, William Spearman, Karl Tuyls
In multiagent environments, several decision-making individuals interact while adhering to the dynamics constraints imposed by the environment.
1 code implementation • 25 May 2021 • SiQi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess
In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds.
1 code implementation • 28 Feb 2021 • Julien Perolat, Sarah Perrin, Romuald Elie, Mathieu Laurière, Georgios Piliouras, Matthieu Geist, Karl Tuyls, Olivier Pietquin
We address scaling up equilibrium computation in Mean Field Games (MFGs) using Online Mirror Descent (OMD).
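For orientation, one OMD iteration under the entropic mirror map (the softmax form typically used in this setting) can be sketched as follows; the mean-field evaluation step that produces `q_now` is abstracted away, and all names are illustrative.

```python
import numpy as np

def omd_step(cum_q, q_now, lr=0.1):
    """Accumulate Q-values, then return the softmax policy of the sum."""
    cum_q = cum_q + lr * q_now                        # mirror-descent state
    z = cum_q - cum_q.max(axis=-1, keepdims=True)     # stabilized logits
    policy = np.exp(z)
    policy /= policy.sum(axis=-1, keepdims=True)      # softmax over actions
    return cum_q, policy
```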
1 code implementation • 18 Nov 2020 • Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis.
no code implementations • 27 Aug 2020 • Audrūnas Gruslys, Marc Lanctot, Rémi Munos, Finbarr Timbers, Martin Schmid, Julien Perolat, Dustin Morrill, Vinicius Zambaldi, Jean-Baptiste Lespiau, John Schultz, Mohammad Gheshlaghi Azar, Michael Bowling, Karl Tuyls
In this paper, we describe a general model-free RL method for no-regret learning based on repeated reconsideration of past behavior.
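As background, the classic regret-matching rule that regret-based methods of this kind build on turns cumulative regrets into a policy; a minimal sketch (illustrative names, not the paper's code):

```python
import numpy as np

def regret_matching_policy(cum_regrets):
    """Play actions in proportion to positive cumulative regret."""
    positive = np.maximum(cum_regrets, 0.0)
    total = positive.sum()
    if total > 0.0:
        return positive / total
    return np.full_like(cum_regrets, 1.0 / cum_regrets.size)  # else uniform
```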
no code implementations • 4 May 2020 • Shayegan Omidshafiei, Karl Tuyls, Wojciech M. Czarnecki, Francisco C. Santos, Mark Rowland, Jerome Connor, Daniel Hennes, Paul Muller, Julien Perolat, Bart De Vylder, Audrunas Gruslys, Remi Munos
Multiplayer games have long been used as testbeds in artificial intelligence research, aptly referred to as the Drosophila of artificial intelligence.
1 code implementation • NeurIPS 2020 • Wojciech Marian Czarnecki, Gauthier Gidel, Brendan Tracey, Karl Tuyls, Shayegan Omidshafiei, David Balduzzi, Max Jaderberg
This paper investigates the geometrical properties of real-world games (e.g., Tic-Tac-Toe, Go, StarCraft II).
no code implementations • 21 Feb 2020 • Gregory Palmer, Benjamin Schnieders, Rahul Savani, Karl Tuyls, Joscha-David Fossel, Harry Flore
We train 3D-ConvNets to predict the likelihood that 20-frame video samples contain anomalies.
no code implementations • 19 Feb 2020 • Julien Perolat, Remi Munos, Jean-Baptiste Lespiau, Shayegan Omidshafiei, Mark Rowland, Pedro Ortega, Neil Burch, Thomas Anthony, David Balduzzi, Bart De Vylder, Georgios Piliouras, Marc Lanctot, Karl Tuyls
In this paper we investigate the Follow the Regularized Leader dynamics in sequential imperfect information games (IIG).
1 code implementation • ICLR 2020 • Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, SiQi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Space Response Oracles (PSRO).
1 code implementation • NeurIPS 2019 • Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos
This paper investigates the evaluation of learned multiagent strategies in the incomplete information setting, which plays a critical role in ranking and training of agents.
15 code implementations • 26 Aug 2019 • Marc Lanctot, Edward Lockhart, Jean-Baptiste Lespiau, Vinicius Zambaldi, Satyaki Upadhyay, Julien Pérolat, Sriram Srinivasan, Finbarr Timbers, Karl Tuyls, Shayegan Omidshafiei, Daniel Hennes, Dustin Morrill, Paul Muller, Timo Ewalds, Ryan Faulkner, János Kramár, Bart De Vylder, Brennan Saeta, James Bradbury, David Ding, Sebastian Borgeaud, Matthew Lai, Julian Schrittwieser, Thomas Anthony, Edward Hughes, Ivo Danihelka, Jonah Ryan-Davis
OpenSpiel is a collection of environments and algorithms for research in general reinforcement learning and search/planning in games.
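A minimal usage example, assuming OpenSpiel is installed (`pip install open_spiel`): rolling out one random episode of Kuhn poker through the public Python API.

```python
import random
import pyspiel

game = pyspiel.load_game("kuhn_poker")
state = game.new_initial_state()
while not state.is_terminal():
    if state.is_chance_node():
        # Sample card deals according to the chance distribution.
        outcomes, probs = zip(*state.chance_outcomes())
        action = random.choices(outcomes, weights=probs)[0]
    else:
        action = random.choice(state.legal_actions())
    state.apply_action(action)
print(state.returns())  # terminal payoff for each player
```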
1 code implementation • 1 Jun 2019 • Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, Remi Munos, Julien Perolat, Marc Lanctot, Audrunas Gruslys, Jean-Baptiste Lespiau, Paavo Parmas, Edgar Duenez-Guzman, Karl Tuyls
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning.
1 code implementation • 13 May 2019 • Alistair Letcher, David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel
The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games.
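A sketch of the SGA update on the simplest adversarial game, with losses f1(x, y) = x*y and f2(x, y) = -x*y, where the Jacobian of the simultaneous gradient is known in closed form; `lam` and all names are illustrative.

```python
import numpy as np

def sga_step(x, y, lr=0.01, lam=1.0):
    xi = np.array([y, -x])             # simultaneous gradient (df1/dx, df2/dy)
    H = np.array([[0.0, 1.0],
                  [-1.0, 0.0]])        # Jacobian of xi for this game
    A = 0.5 * (H - H.T)                # antisymmetric part of the Jacobian
    adjusted = xi + lam * (A.T @ xi)   # symplectic adjustment
    x, y = np.array([x, y]) - lr * adjusted
    return float(x), float(y)
```

Plain simultaneous gradient descent cycles around the fixed point of this game; the adjustment term adds the inward pull that makes the iterates converge.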
no code implementations • ICLR 2019 • Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia
We introduce an approach for augmenting model-free deep reinforcement learning agents with a mechanism for relational reasoning over structured representations, which improves performance, learning efficiency, generalization, and interpretability.
no code implementations • 12 Apr 2019 • James Butterworth, Rahul Savani, Karl Tuyls
Simultaneous Localisation and Mapping (SLAM) algorithms are expensive to run on smaller robotic platforms such as Micro-Aerial Vehicles.
no code implementations • 13 Mar 2019 • Edward Lockhart, Marc Lanctot, Julien Pérolat, Jean-Baptiste Lespiau, Dustin Morrill, Finbarr Timbers, Karl Tuyls
In this paper, we present exploitability descent, a new algorithm to compute approximate equilibria in two-player zero-sum extensive-form games with imperfect information, by direct policy optimization against worst-case opponents.
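On a zero-sum matrix game, one exploitability-descent step can be sketched as follows, using a softmax policy parameterization for brevity; `A` is the row player's payoff matrix and all names are illustrative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def ed_step(logits_x, logits_y, A, lr=0.5):
    x, y = softmax(logits_x), softmax(logits_y)
    br_col = np.argmin(x @ A)      # column best response to x
    br_row = np.argmax(A @ y)      # row best response to y
    grad_x = A[:, br_col]          # gradient of row value vs. best response
    grad_y = -A[br_row, :]         # column player minimizes A[br_row] . y
    # Policy-gradient ascent through the softmax Jacobian.
    logits_x = logits_x + lr * x * (grad_x - x @ grad_x)
    logits_y = logits_y + lr * y * (grad_y - y @ grad_y)
    return logits_x, logits_y
```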
1 code implementation • 4 Mar 2019 • Shayegan Omidshafiei, Christos Papadimitriou, Georgios Piliouras, Karl Tuyls, Mark Rowland, Jean-Baptiste Lespiau, Wojciech M. Czarnecki, Marc Lanctot, Julien Perolat, Remi Munos
We introduce $\alpha$-Rank, a principled evolutionary dynamics methodology for the evaluation and ranking of agents in large-scale multi-agent interactions, grounded in a novel dynamical game-theoretic solution concept called Markov-Conley chains (MCCs).
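The final ranking step can be sketched as below, assuming the response-graph transition matrix `C` over joint strategy profiles has already been constructed from the payoff tables (that construction, the heart of the method, is omitted here).

```python
import numpy as np

def alpha_rank_scores(C):
    """Rank profiles by the stationary distribution of the response graph."""
    # C[i, j] is the transition probability from profile i to profile j.
    evals, evecs = np.linalg.eig(C.T)
    k = np.argmin(np.abs(evals - 1.0))     # eigenvalue closest to 1
    pi = np.real(evecs[:, k])
    return np.abs(pi) / np.abs(pi).sum()   # stationary mass = ranking score
```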
no code implementations • 23 Jan 2019 • Richard Klima, Daan Bloembergen, Michael Kaisers, Karl Tuyls
We prove convergence of the operator to the optimal robust Q-function with respect to the model using the theory of Generalized Markov Decision Processes.
1 code implementation • NeurIPS 2018 • Sriram Srinivasan, Marc Lanctot, Vinicius Zambaldi, Julien Perolat, Karl Tuyls, Remi Munos, Michael Bowling
Optimization of parameterized policies for reinforcement learning (RL) is an important and challenging problem in artificial intelligence.
no code implementations • 18 Sep 2018 • Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Zhiyong Feng, Wanli Xue, Rong Chen
Although many reinforcement learning methods have been proposed for learning the optimal solutions in single-agent continuous-action domains, multiagent coordination domains with continuous actions have received relatively little investigation.
1 code implementation • 13 Sep 2018 • Gregory Palmer, Rahul Savani, Karl Tuyls
For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable to misleading stochastic rewards.
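For context, hysteretic Q-learning uses two learning rates, a fast one for positive TD errors and a slow one for negative errors, which makes agents optimistic about teammates' exploratory mistakes; a minimal sketch with illustrative names:

```python
import numpy as np

def hysteretic_update(Q, s, a, r, s_next, alpha=0.1, beta=0.01, gamma=0.99):
    """Q is an |S| x |A| array; beta < alpha yields optimistic updates."""
    td = r + gamma * Q[s_next].max() - Q[s, a]
    Q[s, a] += (alpha if td >= 0 else beta) * td
    return Q
```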
no code implementations • 13 Aug 2018 • Benjamin Schnieders, Karl Tuyls
Compared to state-of-the-art task weighting methods, the improvement is 24.5% in convergence and 15.8% on the estimated pickup rate.
2 code implementations • NeurIPS 2018 • David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel
Progress in machine learning is measured by careful evaluation on problems of outstanding common interest.
7 code implementations • 5 Jun 2018 • Vinicius Zambaldi, David Raposo, Adam Santoro, Victor Bapst, Yujia Li, Igor Babuschkin, Karl Tuyls, David Reichert, Timothy Lillicrap, Edward Lockhart, Murray Shanahan, Victoria Langston, Razvan Pascanu, Matthew Botvinick, Oriol Vinyals, Peter Battaglia
We introduce an approach for deep reinforcement learning (RL) that improves upon the efficiency, generalization capacity, and interpretability of conventional approaches through structured perception and relational reasoning.
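The core relational operation in this family of agents is dot-product self-attention over entity embeddings; a single-head NumPy sketch follows (shapes and names illustrative; the full architecture adds multiple heads, shared MLPs, and iterated blocks).

```python
import numpy as np

def relational_block(E, Wq, Wk, Wv):
    """E: (num_entities, d) embeddings; Wq/Wk/Wv: (d, d_k) projections."""
    Q, K, V = E @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # pairwise relations
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)             # softmax per entity
    return w @ V                                   # relation-weighted update
```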
1 code implementation • ICLR 2018 • Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z. Leibo, Karl Tuyls, Stephen Clark
We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.
1 code implementation • ICLR 2018 • Angeliki Lazaridou, Karl Moritz Hermann, Karl Tuyls, Stephen Clark
The ability of algorithms to evolve or learn (compositional) communication protocols has traditionally been studied in the language evolution literature through the use of emergent communication tasks.
3 code implementations • NeurIPS 2018 • Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas.
no code implementations • 8 Mar 2018 • Chengwei Zhang, Xiaohong Li, Jianye Hao, Siqi Chen, Karl Tuyls, Wanli Xue
In multiagent environments, the capability of learning is important for an agent to behave appropriately in the face of unknown opponents and a dynamic environment.
1 code implementation • ICML 2018 • David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel
The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
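The conservation law is easy to verify numerically in the simplest Hamiltonian game, with losses f1(x, y) = x*y and f2(x, y) = -x*y; the values below are illustrative.

```python
# Simultaneous gradient flow is dx/dt = -y, dy/dt = x: a rotation that
# conserves H = (x**2 + y**2) / 2. Small Euler steps approximate this.
x, y, dt = 1.0, 0.0, 1e-3
for _ in range(10_000):
    x, y = x - dt * y, y + dt * x
print(x * x + y * y)  # stays close to 1.0 (exactly 1.0 in continuous time)
```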
1 code implementation • NeurIPS 2017 • Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel
To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL).
4 code implementations • NeurIPS 2017 • Julien Perolat, Joel Z. Leibo, Vinicius Zambaldi, Charles Beattie, Karl Tuyls, Thore Graepel
Here we show that deep reinforcement learning can be used instead.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning
1 code implementation • 14 Jul 2017 • Gregory Palmer, Karl Tuyls, Daan Bloembergen, Rahul Savani
We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning
7 code implementations • 16 Jun 2017 • Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel
We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal.
Ranked #1 on SMAC+ (Off_Superhard_parallel)
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning
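The value-decomposition idea studied here is that the joint action-value is a sum of per-agent utilities, so greedy joint action selection decomposes agent by agent; a minimal sketch, with `q_heads` as illustrative stand-ins for per-agent network outputs.

```python
import numpy as np

def vdn_joint_q(q_heads, actions):
    """Joint Q(s, a_1..a_n) = sum of per-agent Q_i(o_i, a_i)."""
    return sum(q[a] for q, a in zip(q_heads, actions))

def greedy_joint_action(q_heads):
    # The argmax of a sum of per-agent terms decomposes per agent.
    return [int(np.argmax(q)) for q in q_heads]
```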