1 code implementation • 20 Aug 2024 • Yan Wu, Esther Wershof, Sebastian M Schmon, Marcel Nassar, Błażej Osiński, Ridvan Eksi, Kun Zhang, Thore Graepel
We present a comprehensive framework for predicting the effects of perturbations in single cells, designed to standardize benchmarking in this rapidly evolving field.
no code implementations • 5 Oct 2022 • Luke Marris, Marc Lanctot, Ian Gemp, Shayegan Omidshafiei, Stephen Mcaleer, Jerome Connor, Karl Tuyls, Thore Graepel
Rating strategies in a game is an important area of research in game theory and artificial intelligence, and can be applied to any real-world competitive or cooperative setting.
no code implementations • ICLR 2022 • SiQi Liu, Luke Marris, Daniel Hennes, Josh Merel, Nicolas Heess, Thore Graepel
Learning in strategy games (e. g. StarCraft, poker) requires the discovery of diverse policies.
no code implementations • 5 Jan 2022 • Kavya Kopparapu, Edgar A. Duéñez-Guzmán, Jayd Matyas, Alexander Sasha Vezhnevets, John P. Agapiou, Kevin R. McKee, Richard Everett, Janusz Marecki, Joel Z. Leibo, Thore Graepel
A key challenge in the study of multiagent cooperation is the need for individual agents not only to cooperate effectively, but to decide with whom to cooperate.
no code implementations • 28 Sep 2021 • Thore Graepel, Ralf Herbrich
Abstract We present PAC-Bayesian bounds for the generalisation error of the K-nearest-neighbour classifier (K-NN).
no code implementations • 14 Jul 2021 • Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel
Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks).
Multi-agent Reinforcement Learning reinforcement-learning +2
1 code implementation • 17 Jun 2021 • Luke Marris, Paul Muller, Marc Lanctot, Karl Tuyls, Thore Graepel
Two-player, constant-sum games are well studied in the literature, but there has been limited progress outside of this setting.
1 code implementation • 25 May 2021 • SiQi Liu, Guy Lever, Zhe Wang, Josh Merel, S. M. Ali Eslami, Daniel Hennes, Wojciech M. Czarnecki, Yuval Tassa, Shayegan Omidshafiei, Abbas Abdolmaleki, Noah Y. Siegel, Leonard Hasenclever, Luke Marris, Saran Tunyasuvunakool, H. Francis Song, Markus Wulfmeier, Paul Muller, Tuomas Haarnoja, Brendan D. Tracey, Karl Tuyls, Thore Graepel, Nicolas Heess
In a sequence of stages, players first learn to control a fully articulated body to perform realistic, human-like movements such as running and turning; they then acquire mid-level football skills such as dribbling and shooting; finally, they develop awareness of others and play as a team, bridging the gap between low-level motor control at a timescale of milliseconds, and coordinated goal-directed behaviour as a team at the timescale of tens of seconds.
no code implementations • 8 Mar 2021 • Kevin R. McKee, Edward Hughes, Tina O. Zhu, Martin J. Chadwick, Raphael Koster, Antonio Garcia Castaneda, Charlie Beattie, Thore Graepel, Matt Botvinick, Joel Z. Leibo
Collective action demands that individuals efficiently coordinate how much, where, and when to cooperate.
Multi-agent Reinforcement Learning reinforcement-learning +2
1 code implementation • ICLR 2022 • Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel
We build on the recently proposed EigenGame that views eigendecomposition as a competitive game.
no code implementations • 15 Dec 2020 • Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel
We see opportunity to more explicitly focus on the problem of cooperation, to construct unified theory and vocabulary, and to build bridges with adjacent communities working on cooperation, including in the natural, social, and behavioural sciences.
1 code implementation • 18 Nov 2020 • Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adria Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Perolat, Bart De Vylder, Ali Eslami, Mark Rowland, Andrew Jaegle, Remi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis
The rapid progress in artificial intelligence (AI) and machine learning has opened unprecedented analytics possibilities in various team and individual sports, including baseball, basketball, and tennis.
no code implementations • ICLR 2019 • Yoram Bachrach, Richard Everett, Edward Hughes, Angeliki Lazaridou, Joel Z. Leibo, Marc Lanctot, Michael Johanson, Wojciech M. Czarnecki, Thore Graepel
When autonomous agents interact in the same environment, they must often cooperate to achieve their goals.
2 code implementations • ICLR 2021 • Ian Gemp, Brian McWilliams, Claire Vernade, Thore Graepel
We present a novel view on principal component analysis (PCA) as a competitive game in which each approximate eigenvector is controlled by a player whose goal is to maximize their own utility function.
2 code implementations • NeurIPS 2020 • Thomas Anthony, Tom Eccles, Andrea Tacchetti, János Kramár, Ian Gemp, Thomas C. Hudson, Nicolas Porcel, Marc Lanctot, Julien Pérolat, Richard Everett, Roman Werpachowski, Satinder Singh, Thore Graepel, Yoram Bachrach
It also features a large combinatorial action space and simultaneous moves, which are challenging for RL algorithms.
no code implementations • ICLR 2020 • David Balduzzi, Wojciech M. Czarnecki, Thomas W. Anthony, Ian M Gemp, Edward Hughes, Joel Z. Leibo, Georgios Piliouras, Thore Graepel
With the success of modern machine learning, it is becoming increasingly important to understand and control how learning algorithms interact.
no code implementations • NeurIPS 2019 • Tom Eccles, Yoram Bachrach, Guy Lever, Angeliki Lazaridou, Thore Graepel
We study the problem of emergent communication, in which language arises because speakers and listeners must communicate information in order to solve tasks.
Multi-agent Reinforcement Learning reinforcement-learning +2
18 code implementations • 19 Nov 2019 • Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent SIfre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel, Timothy Lillicrap, David Silver
When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
Ranked #1 on Atari Games on Atari 2600 Boxing
1 code implementation • ICLR 2020 • Paul Muller, Shayegan Omidshafiei, Mark Rowland, Karl Tuyls, Julien Perolat, Si-Qi Liu, Daniel Hennes, Luke Marris, Marc Lanctot, Edward Hughes, Zhe Wang, Guy Lever, Nicolas Heess, Thore Graepel, Remi Munos
This paper investigates a population-based training regime based on game-theoretic principles called Policy-Spaced Response Oracles (PSRO).
no code implementations • 25 Sep 2019 • Yoram Bachrach, Tor Lattimore, Marta Garnelo, Julien Perolat, David Balduzzi, Thomas Anthony, Satinder Singh, Thore Graepel
We show that MARL converges to the desired outcome if the rewards are designed so that exerting effort is the iterated dominance solution, but fails if it is merely a Nash equilibrium.
no code implementations • 11 Jul 2019 • Andrea Tacchetti, DJ Strouse, Marta Garnelo, Thore Graepel, Yoram Bachrach
From social networks to supply chains, more and more aspects of how humans, firms and organizations interact is mediated by artificial learning agents.
1 code implementation • 13 May 2019 • Alistair Letcher, David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel
The decomposition motivates Symplectic Gradient Adjustment (SGA), a new algorithm for finding stable fixed points in differentiable games.
no code implementations • 2 Mar 2019 • Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel
Evolution has produced a multi-scale mosaic of interacting adaptive units.
no code implementations • ICLR 2019 • Si-Qi Liu, Guy Lever, Josh Merel, Saran Tunyasuvunakool, Nicolas Heess, Thore Graepel
We study the emergence of cooperative behaviors in reinforcement learning agents by introducing a challenging competitive multi-agent soccer environment with continuous simulated physics.
no code implementations • 23 Jan 2019 • David Balduzzi, Marta Garnelo, Yoram Bachrach, Wojciech M. Czarnecki, Julien Perolat, Max Jaderberg, Thore Graepel
Zero-sum games such as chess and poker are, abstractly, functions that evaluate pairs of agents, for example labeling them `winner' and `loser'.
no code implementations • 17 Dec 2018 • Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel
Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation.
Multi-agent Reinforcement Learning reinforcement-learning +2
no code implementations • ICLR 2019 • Andrea Tacchetti, H. Francis Song, Pedro A. M. Mediano, Vinicius Zambaldi, Neil C. Rabinowitz, Thore Graepel, Matthew Botvinick, Peter W. Battaglia
The behavioral dynamics of multi-agent systems have a rich and orderly structure, which can be leveraged to understand these systems, and to improve how artificial agents learn to operate in them.
no code implementations • 3 Jul 2018 • Max Jaderberg, Wojciech M. Czarnecki, Iain Dunning, Luke Marris, Guy Lever, Antonio Garcia Castaneda, Charles Beattie, Neil C. Rabinowitz, Ari S. Morcos, Avraham Ruderman, Nicolas Sonnerat, Tim Green, Louise Deason, Joel Z. Leibo, David Silver, Demis Hassabis, Koray Kavukcuoglu, Thore Graepel
Recent progress in artificial intelligence through reinforcement learning (RL) has shown great success on increasingly complex single-agent environments and two-player turn-based games.
5 code implementations • 11 Jun 2018 • Tobias Baumann, Thore Graepel, John Shawe-Taylor
In the future, artificial learning agents are likely to become increasingly widespread in our society.
2 code implementations • NeurIPS 2018 • David Balduzzi, Karl Tuyls, Julien Perolat, Thore Graepel
Progress in machine learning is measured by careful evaluation on problems of outstanding common interest.
3 code implementations • NeurIPS 2018 • Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel
Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas.
1 code implementation • ICML 2018 • David Balduzzi, Sebastien Racaniere, James Martens, Jakob Foerster, Karl Tuyls, Thore Graepel
The first is related to potential games, which reduce to gradient descent on an implicit function; the second relates to Hamiltonian games, a new class of games that obey a conservation law, akin to conservation laws in classical mechanical systems.
60 code implementations • 5 Dec 2017 • David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent SIfre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis
The game of chess is the most widely-studied domain in the history of artificial intelligence.
Ranked #1 on Game of Go on ELO Ratings
2 code implementations • NIPS 2007 • Pierre Dangauthier, Ralf Herbrich, Tom Minka, Thore Graepel
We extend the Bayesian skill rating system TrueSkill to infer entire time series of skills of players by smoothing through time instead of filtering.
1 code implementation • NeurIPS 2017 • Marc Lanctot, Vinicius Zambaldi, Audrunas Gruslys, Angeliki Lazaridou, Karl Tuyls, Julien Perolat, David Silver, Thore Graepel
To achieve general intelligence, agents must learn how to interact with others in a shared environment: this is the challenge of multiagent reinforcement learning (MARL).
4 code implementations • NeurIPS 2017 • Julien Perolat, Joel Z. Leibo, Vinicius Zambaldi, Charles Beattie, Karl Tuyls, Thore Graepel
Here we show that deep reinforcement learning can be used instead.
Multi-agent Reinforcement Learning reinforcement-learning +2
9 code implementations • 16 Jun 2017 • Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinicius Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel
We study the problem of cooperative multi-agent reinforcement learning with a single joint reward signal.
Ranked #1 on SMAC+ on Off_Superhard_parallel
Multi-agent Reinforcement Learning reinforcement-learning +3
4 code implementations • 10 Feb 2017 • Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel
We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.
Multi-agent Reinforcement Learning reinforcement-learning +2
no code implementations • 7 Mar 2016 • Diana Borsa, Thore Graepel, John Shawe-Taylor
We investigate a paradigm in multi-task reinforcement learning (MT-RL) in which an agent is placed in an environment and needs to learn to perform a series of tasks, within this space.
no code implementations • 9 Jun 2015 • Diana Borsa, Thore Graepel, Andrew Gordon
We consider the problem of modelling noisy but highly symmetric shapes that can be viewed as hierarchies of whole-part relationships in which higher level objects are composed of transformed collections of lower level objects.
no code implementations • 31 Oct 2014 • Aaron Defazio, Thore Graepel
Reinforcement learning agents have traditionally been evaluated on small toy problems.
1 code implementation • 19 Jul 2012 • Simon Lacoste-Julien, Konstantina Palla, Alex Davies, Gjergji Kasneci, Thore Graepel, Zoubin Ghahramani
The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information.