Search Results for author: Joel Z. Leibo

Found 57 papers, 22 papers with code

OPtions as REsponses: Grounding behavioural hierarchies in multi-agent reinforcement learning

no code implementations ICML 2020 Alexander Vezhnevets, Yuhuai Wu, Maria Eckstein, Rémi Leblond, Joel Z. Leibo

This paper investigates generalisation in multi-agent games, where the generality of the agent can be evaluated by playing against opponents it hasn't seen during training.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL) +1

Neural Population Learning beyond Symmetric Zero-sum Games

no code implementations 10 Jan 2024 SiQi Liu, Luke Marris, Marc Lanctot, Georgios Piliouras, Joel Z. Leibo, Nicolas Heess

We then introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.

Transfer Learning

A Review of Cooperation in Multi-agent Learning

no code implementations 8 Dec 2023 Yali Du, Joel Z. Leibo, Usman Islam, Richard Willis, Peter Sunehag

Cooperation in multi-agent learning (MAL) is a topic at the intersection of numerous disciplines, including game theory, economics, social sciences, and evolutionary biology.

Decision Making

Framework-Based Qualitative Analysis of Free Responses of Large Language Models: Algorithmic Fidelity

no code implementations 6 Sep 2023 Aliya Amirova, Theodora Fteropoulli, Nafiso Ahmed, Martin R. Cowie, Joel Z. Leibo

Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative methods aiming to produce insights that could generalize to real human populations.

Melting Pot 2.0

2 code implementations 24 Nov 2022 John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo

Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios.

Artificial Life · Navigate

The emergence of division of labor through decentralized social sanctioning

no code implementations 10 Aug 2022 Anil Yaman, Joel Z. Leibo, Giovanni Iacca, Sang Wan Lee

Here we show that by introducing a model of social norms, which we regard as emergent patterns of decentralized social sanctioning, it becomes possible for groups of self-interested individuals to learn a productive division of labor involving all critical roles.

Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria

no code implementations 5 Jan 2022 Kavya Kopparapu, Edgar A. Duéñez-Guzmán, Jayd Matyas, Alexander Sasha Vezhnevets, John P. Agapiou, Kevin R. McKee, Richard Everett, Janusz Marecki, Joel Z. Leibo, Thore Graepel

A key challenge in the study of multiagent cooperation is the need for individual agents not only to cooperate effectively, but to decide with whom to cooperate.

Statistical discrimination in learning agents

no code implementations 21 Oct 2021 Edgar A. Duéñez-Guzmán, Kevin R. McKee, Yiran Mao, Ben Coppin, Silvia Chiappa, Alexander Sasha Vezhnevets, Michiel A. Bakker, Yoram Bachrach, Suzanne Sadedin, William Isaac, Karl Tuyls, Joel Z. Leibo

Undesired bias afflicts both human and algorithmic decision making, and may be especially prevalent when information processing trade-offs incentivize the use of heuristics.

Decision Making · Multi-agent Reinforcement Learning

Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot

no code implementations 14 Jul 2021 Joel Z. Leibo, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel

Existing evaluation suites for multi-agent reinforcement learning (MARL) do not assess generalization to novel situations as their primary objective (unlike supervised-learning benchmarks).

Multi-agent Reinforcement Learning · Reinforcement Learning (RL) +1

Meta-control of social learning strategies

1 code implementation 18 Jun 2021 Anil Yaman, Nicolas Bredeche, Onur Çaylak, Joel Z. Leibo, Sang Wan Lee

Based on these findings, we hypothesized that meta-control of individual and social learning strategies provides effective and sample-efficient learning in volatile and uncertain environments.

Open Problems in Cooperative AI

no code implementations 15 Dec 2020 Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z. Leibo, Kate Larson, Thore Graepel

We see opportunity to more explicitly focus on the problem of cooperation, to construct unified theory and vocabulary, and to build bridges with adjacent communities working on cooperation, including in the natural, social, and behavioural sciences.

Scheduling

DeepMind Lab2D

1 code implementation 13 Nov 2020 Charles Beattie, Thomas Köppe, Edgar A. Duéñez-Guzmán, Joel Z. Leibo

We present DeepMind Lab2D, a scalable environment simulator for artificial intelligence research that facilitates researcher-led experimentation with environment design.

Reinforcement Learning (RL)

Social diversity and social preferences in mixed-motive reinforcement learning

no code implementations 6 Feb 2020 Kevin R. McKee, Ian Gemp, Brian McWilliams, Edgar A. Duéñez-Guzmán, Edward Hughes, Joel Z. Leibo

Recent research on reinforcement learning in pure-conflict and pure-common interest games has emphasized the importance of population heterogeneity.

Reinforcement Learning (RL)

Generalization of Reinforcement Learners with Working and Episodic Memory

1 code implementation NeurIPS 2019 Meire Fortunato, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charlie Deck, Joel Z. Leibo, Charles Blundell

In this paper, we aim to develop a comprehensive methodology for testing different kinds of memory in an agent, assessing how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions we suggest are relevant for evaluating memory-specific generalization.

Holdout Set

Options as responses: Grounding behavioural hierarchies in multi-agent RL

no code implementations 4 Jun 2019 Alexander Sasha Vezhnevets, Yuhuai Wu, Remi Leblond, Joel Z. Leibo

This paper investigates generalisation in multi-agent games, where the generality of the agent can be evaluated by playing against opponents it hasn't seen during training.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL)

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

no code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.

Counterfactual Reasoning +2

Learning Reciprocity in Complex Sequential Social Dilemmas

no code implementations 19 Mar 2019 Tom Eccles, Edward Hughes, János Kramár, Steven Wheelwright, Joel Z. Leibo

We analyse the resulting policies to show that the reciprocating agents are strongly influenced by their co-players' behavior.

Malthusian Reinforcement Learning

no code implementations 17 Dec 2018 Joel Z. Leibo, Julien Perolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel

Here we explore a new algorithmic framework for multi-agent reinforcement learning, called Malthusian reinforcement learning, which extends self-play to include fitness-linked population size dynamics that drive ongoing innovation.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL) +1

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

3 code implementations ICLR 2019 Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.

Counterfactual Reasoning +3
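
The influence reward described in this abstract is counterfactual: it compares the other agent's action distribution conditioned on the influencer's chosen action against the marginal distribution obtained by averaging over the actions the influencer could have taken. A minimal sketch of that computation, assuming discrete actions and access to the other agent's conditional policy (function names and the toy distributions are illustrative, not taken from the paper):

```python
import numpy as np

def influence_reward(cond_policy, prior_over_actions):
    """Intrinsic reward for agent A: how much each of A's actions shifts
    agent B's policy, measured against B's counterfactual marginal policy.

    cond_policy: dict mapping each action of A to B's resulting action
        distribution (a 1-D numpy array summing to 1).
    prior_over_actions: A's own action distribution, used to marginalize.
    Returns a dict: A's action -> KL(p(b|a) || p(b)) in nats.
    """
    actions = list(cond_policy.keys())
    # Counterfactual marginal: average B's policy over A's possible actions.
    marginal = sum(prior_over_actions[a] * cond_policy[a] for a in actions)
    rewards = {}
    for a in actions:
        p = cond_policy[a]
        # KL divergence between conditional and marginal (eps for stability).
        rewards[a] = float(np.sum(p * np.log((p + 1e-12) / (marginal + 1e-12))))
    return rewards

# Toy example: B reacts more strongly to A's action 1 than to action 0,
# so action 1 earns the larger influence reward.
cond = {0: np.array([0.5, 0.5]), 1: np.array([0.9, 0.1])}
prior = {0: 0.5, 1: 0.5}
r = influence_reward(cond, prior)
```

In the paper this intrinsic term is added (with a weight) to the environment reward; here only the counterfactual KL computation is shown.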

Emergent Communication through Negotiation

1 code implementation ICLR 2018 Kris Cao, Angeliki Lazaridou, Marc Lanctot, Joel Z. Leibo, Karl Tuyls, Stephen Clark

We also study communication behaviour in a setting where one agent interacts with agents in a community with different levels of prosociality and show how agent identifiability can aid negotiation.

Multi-agent Reinforcement Learning

Kickstarting Deep Reinforcement Learning

no code implementations 10 Mar 2018 Simon Schmitt, Jonathan J. Hudson, Augustin Zidek, Simon Osindero, Carl Doersch, Wojciech M. Czarnecki, Joel Z. Leibo, Heinrich Kuttler, Andrew Zisserman, Karen Simonyan, S. M. Ali Eslami

Our method places no constraints on the architecture of the teacher or student agents, and it regulates itself to allow the students to surpass their teachers in performance.

Reinforcement Learning (RL)

Deep Q-learning from Demonstrations

5 code implementations 12 Apr 2017 Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

We present Deep Q-learning from Demonstrations (DQfD), an algorithm that leverages even relatively small amounts of demonstration data to massively accelerate learning, and that automatically assesses the necessary ratio of demonstration data during training thanks to a prioritized replay mechanism.

Imitation Learning · Q-Learning +1
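
One ingredient of DQfD is a supervised large-margin loss on demonstration transitions, which forces the demonstrator's action to score at least a margin above every alternative. A hedged sketch of just that term, assuming tabular Q-values (function name and numbers are illustrative; the full DQfD objective also combines 1-step and n-step TD losses and L2 regularization):

```python
import numpy as np

def margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin classification loss on a demonstration transition:
    max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E), where the margin l is 0 for
    the expert's action a_E and a positive constant otherwise. The loss is
    zero only when the expert's action beats every other action by at
    least `margin`.
    """
    penalties = np.full_like(q_values, margin, dtype=float)
    penalties[expert_action] = 0.0
    return float(np.max(q_values + penalties) - q_values[expert_action])

# Expert action 1 clearly dominates -> no loss.
loss_a = margin_loss(np.array([1.0, 2.0, 1.0]), expert_action=1)  # 0.0
# Action 2 is within the margin of the expert's action -> positive loss.
loss_b = margin_loss(np.array([1.0, 2.0, 1.5]), expert_action=1)  # ≈ 0.3
```

This term is what grounds the pre-training phase in the demonstrations before the agent ever interacts with the environment.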

Multi-agent Reinforcement Learning in Sequential Social Dilemmas

4 code implementations 10 Feb 2017 Joel Z. Leibo, Vinicius Zambaldi, Marc Lanctot, Janusz Marecki, Thore Graepel

We introduce sequential social dilemmas that share the mixed incentive structure of matrix game social dilemmas but also require agents to learn policies that implement their strategic intentions.

Multi-agent Reinforcement Learning · Reinforcement Learning (RL) +1

Learning to reinforcement learn

8 code implementations 17 Nov 2016 Jane X. Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z. Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, Matt Botvinick

We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL.

Meta-Learning · Meta Reinforcement Learning +2

Reinforcement Learning with Unsupervised Auxiliary Tasks

3 code implementations 16 Nov 2016 Max Jaderberg, Volodymyr Mnih, Wojciech Marian Czarnecki, Tom Schaul, Joel Z. Leibo, David Silver, Koray Kavukcuoglu

We also introduce a novel mechanism for focusing this representation upon extrinsic rewards, so that learning can rapidly adapt to the most relevant aspects of the actual task.

Reinforcement Learning (RL)

Using Fast Weights to Attend to the Recent Past

3 code implementations NeurIPS 2016 Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Z. Leibo, Catalin Ionescu

Until recently, research on artificial neural networks was largely restricted to systems with only two types of variable: Neural activities that represent the current or recent input and weights that learn to capture regularities among inputs, outputs and payoffs.
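
The third kind of variable this paper introduces is a fast-weight matrix that decays every step and is incremented by an outer product of recent hidden states, acting as a short-term associative memory. A minimal sketch of that update rule and the retrieval it enables (layer normalization and the slow-weight RNN from the paper are omitted, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def update_fast_weights(A, h, lam=0.95, eta=0.5):
    """Hebbian fast-weight update: A <- lam * A + eta * h h^T.
    A temporarily stores recent hidden states h as an associative memory."""
    return lam * A + eta * np.outer(h, h)

def attend_to_recent_past(A, h, steps=1):
    """Applying A to the current hidden state retrieves a weighted mixture
    of recent hidden states (the model's inner loop, without layer norm)."""
    for _ in range(steps):
        h = A @ h
    return h

# Store two random (unit-norm) hidden states, then query with the first.
d = 8
A = np.zeros((d, d))
h1, h2 = rng.standard_normal(d), rng.standard_normal(d)
for h in (h1, h2):
    A = update_fast_weights(A, h / np.linalg.norm(h))

query = h1 / np.linalg.norm(h1)
retrieved = attend_to_recent_past(A, query)
# The retrieved vector is more aligned with h1 than with h2.
```

The decay `lam` is what limits the memory to the recent past: older outer products are geometrically down-weighted on every update.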

Model-Free Episodic Control

3 code implementations 14 Jun 2016 Charles Blundell, Benigno Uria, Alexander Pritzel, Yazhe Li, Avraham Ruderman, Joel Z. Leibo, Jack Rae, Daan Wierstra, Demis Hassabis

State of the art deep reinforcement learning algorithms take many millions of interactions to attain human-level performance.

Decision Making · Hippocampus +2
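
The episodic controller gains its sample efficiency by tabulating the highest return ever obtained from each state and falling back to k-nearest-neighbour averaging for states it has never seen. A toy sketch of that mechanism, assuming a single-action value table (the random-projection state embedding and per-action tables from the paper are omitted; all names are illustrative):

```python
import numpy as np

class EpisodicControl:
    """Minimal sketch of a model-free episodic control value table."""

    def __init__(self, k=3):
        self.k = k
        self.keys, self.values = [], []

    def update(self, state, ret):
        # Store the *highest* return ever obtained from this state.
        state = np.asarray(state, dtype=float)
        for i, s in enumerate(self.keys):
            if np.array_equal(s, state):
                self.values[i] = max(self.values[i], ret)
                return
        self.keys.append(state)
        self.values.append(ret)

    def estimate(self, state):
        # Exact hit -> stored return; otherwise average the k nearest entries.
        state = np.asarray(state, dtype=float)
        for i, s in enumerate(self.keys):
            if np.array_equal(s, state):
                return self.values[i]
        dists = [np.linalg.norm(s - state) for s in self.keys]
        nearest = np.argsort(dists)[:self.k]
        return float(np.mean([self.values[i] for i in nearest]))

ec = EpisodicControl(k=2)
ec.update([0.0, 0.0], 1.0)
ec.update([0.0, 0.0], 3.0)   # max-update: stored value becomes 3.0
ec.update([1.0, 1.0], 5.0)
ec.estimate([0.0, 0.0])      # exact hit -> 3.0
ec.estimate([0.1, 0.1])      # average of the 2 nearest values -> 4.0
```

Because values are written in a single episode rather than bootstrapped over many, the table can act on rewarding experience immediately, which is the source of the speed-up over parametric deep RL that the abstract contrasts against.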

View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation

no code implementations 5 Jun 2016 Joel Z. Leibo, Qianli Liao, Winrich Freiwald, Fabio Anselmi, Tomaso Poggio

The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving transformations like depth-rotations.

Face Recognition · Object +1

Approximate Hubel-Wiesel Modules and the Data Structures of Neural Computation

no code implementations 28 Dec 2015 Joel Z. Leibo, Julien Cornebise, Sergio Gómez, Demis Hassabis

This paper describes a framework for modeling the interface between perception and memory on the algorithmic level of analysis.

Hippocampus

How Important is Weight Symmetry in Backpropagation?

2 code implementations 17 Oct 2015 Qianli Liao, Joel Z. Leibo, Tomaso Poggio

Gradient backpropagation (BP) requires symmetric feedforward and feedback connections -- the same weights must be used for forward and backward passes.

Ranked #1 on Handwritten Digit Recognition on MNIST (percentage-error metric)

Handwritten Digit Recognition · Image Classification
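
One relaxation this paper studies keeps only the *signs* of the forward weights in the feedback path, rather than the full transpose. A small numerical sketch comparing the backprop hidden-layer error under exact-transpose versus sign-concordant feedback (names and shapes are illustrative, not from the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(1)

def backward_hidden_delta(W, delta_out, feedback="symmetric"):
    """Hidden-layer error signal. Standard backprop propagates the output
    error through the transpose of the forward weights W; the 'sign'
    variant keeps only the signs of those weights."""
    if feedback == "symmetric":
        B = W.T                   # exact symmetry (standard BP)
    elif feedback == "sign":
        B = np.sign(W).T          # sign-concordant feedback
    else:
        raise ValueError(feedback)
    return B @ delta_out

# Random forward weights (64 outputs, 128 hidden units) and output error.
W = rng.standard_normal((64, 128))
delta = rng.standard_normal(64)

exact = backward_hidden_delta(W, delta, "symmetric")
signed = backward_hidden_delta(W, delta, "sign")

# Cosine similarity: sign-concordant feedback points in a broadly similar
# direction to the true backprop error, which suggests why learning with
# it can still succeed.
cos = exact @ signed / (np.linalg.norm(exact) * np.linalg.norm(signed))
```

The sketch only measures direction agreement of a single error signal; the paper evaluates full training runs under these feedback variants.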

Unsupervised learning of clutter-resistant visual representations from natural videos

no code implementations 12 Sep 2014 Qianli Liao, Joel Z. Leibo, Tomaso Poggio

Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, viewing angle [1, 2, 3].

Face Recognition

Learning invariant representations and applications to face verification

no code implementations NeurIPS 2013 Qianli Liao, Joel Z. Leibo, Tomaso Poggio

Next, we apply the model to non-affine transformations: as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction.

Face Verification · Object Recognition

Unsupervised Learning of Invariant Representations in Hierarchical Architectures

no code implementations 17 Nov 2013 Fabio Anselmi, Joel Z. Leibo, Lorenzo Rosasco, Jim Mutch, Andrea Tacchetti, Tomaso Poggio

It also suggests that the main computational goal of the ventral stream of visual cortex is to provide a hierarchical representation of new objects/images which is invariant to transformations, stable, and discriminative for recognition -- and that this representation may be continuously learned in an unsupervised way during development and visual experience.

Object Recognition · Speech Recognition +1

Can a biologically-plausible hierarchy effectively replace face detection, alignment, and recognition pipelines?

no code implementations 16 Nov 2013 Qianli Liao, Joel Z. Leibo, Youssef Mroueh, Tomaso Poggio

The standard approach to unconstrained face recognition in natural photographs is via a detection, alignment, recognition pipeline.

Face Detection · Face Recognition

Why The Brain Separates Face Recognition From Object Recognition

no code implementations NeurIPS 2011 Joel Z. Leibo, Jim Mutch, Tomaso Poggio

Many studies have uncovered evidence that visual cortex contains specialized regions involved in processing faces but not other object classes.

Face Identification · Face Recognition +2
