1 code implementation • 24 Nov 2022 • John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo
Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios.
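To make the protocol concrete, here is a hedged sketch of how such an evaluation can be organized; the `Scenario` callable, agent factory, and scoring below are illustrative placeholders, not the actual Melting Pot API.

```python
import statistics
from typing import Callable, Dict, Sequence

# Hypothetical placeholders -- not the real Melting Pot API. A "scenario" here
# is a function that drops a list of focal agents into an episode against a
# fixed, held-out background population and returns the focal per-capita return.
Scenario = Callable[[Sequence[object]], float]

def evaluate_focal_population(make_agent: Callable[[], object],
                              scenarios: Dict[str, Scenario],
                              n_focal: int = 4,
                              episodes: int = 10) -> Dict[str, float]:
    """Mean focal per-capita return in each canonical test scenario."""
    scores = {}
    for name, run_episode in scenarios.items():
        returns = [run_episode([make_agent() for _ in range(n_focal)])
                   for _ in range(episodes)]
        scores[name] = statistics.mean(returns)
    return scores
```

The key design point is that the background co-players in each scenario are never seen during training, so the per-scenario score measures generalization to novel social partners rather than self-play performance.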
no code implementations • 25 Oct 2022 • Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih
We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.
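A minimal sketch of the data setup AD describes: record whole training histories of a source RL algorithm, then fit a causal (autoregressive) sequence model to predict each action from the preceding history. The toy epsilon-greedy bandit learner and field layout here are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def collect_learning_history(n_arms=5, n_steps=200):
    """Run a simple epsilon-greedy bandit learner (the 'source RL algorithm')
    and record its whole training history as (action, reward) tokens."""
    true_means = rng.normal(size=n_arms)
    counts, values = np.zeros(n_arms), np.zeros(n_arms)
    history = []
    for t in range(n_steps):
        eps = max(0.05, 1.0 - t / n_steps)          # the learner improves over time
        a = rng.integers(n_arms) if rng.random() < eps else int(values.argmax())
        r = rng.normal(true_means[a], 1.0)
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]
        history.append((a, r))
    return history

# A dataset of many histories from independent tasks; a causal sequence model
# (e.g. a decoder-only transformer) would be trained to predict each action
# from the full preceding cross-episode history (a_0, r_0, ..., r_{t-1}).
histories = [collect_learning_history() for _ in range(100)]
contexts = [h[:50] for h in histories]      # conditioning history
targets = [h[50][0] for h in histories]     # next action to imitate
print(len(contexts), targets[:5])
```

Because the targets come from progressively better behavior within each history, a model that imitates them accurately reproduces the learning progress itself, not just a fixed policy.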
no code implementations • 8 Apr 2022 • Allison C. Tam, Neil C. Rabinowitz, Andrew K. Lampinen, Nicholas A. Roy, Stephanie C. Y. Chan, DJ Strouse, Jane X. Wang, Andrea Banino, Felix Hill
We show that these pretrained representations drive meaningful, task-relevant exploration and improve performance on 3D simulated environments.
1 code implementation • NeurIPS 2021 • DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, Richard Everett
Here, we study the problem of how to train agents that collaborate well with human partners without using human data.
no code implementations • ICLR 2022 • DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen
However, an inherent exploration problem remains: when the agent encounters a genuinely novel state, the discriminator has not yet seen enough training data to produce an accurate, confident skill classification, so the intrinsic reward is low and the very exploration needed to maximize the objective is effectively penalized.
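To make the failure mode concrete, here is a hedged sketch of the standard discriminator-based skill-discovery reward (a DIAYN-style objective): the intrinsic reward is log q(z|s) - log p(z), so states the discriminator cannot yet classify receive near-zero reward even though they are exactly the novel states worth visiting. The toy discriminator logits below are stand-ins, not the paper's architecture.

```python
import numpy as np

def skill_discovery_reward(disc_logits: np.ndarray, skill: int, n_skills: int) -> float:
    """Intrinsic reward log q(z|s) - log p(z) for skill `skill`, given the
    discriminator's logits over skills at the current state."""
    log_q = disc_logits - np.log(np.exp(disc_logits).sum())   # log-softmax
    log_p = -np.log(n_skills)                                  # uniform prior over skills
    return float(log_q[skill] - log_p)

n_skills = 8
# Familiar, well-classified state: confident logits -> high reward.
print(skill_discovery_reward(np.array([5.0] + [0.0] * 7), skill=0, n_skills=n_skills))
# Novel state the discriminator has not learned: near-uniform logits -> ~0 reward,
# effectively penalizing the exploration needed to maximize the objective.
print(skill_discovery_reward(np.zeros(n_skills), skill=0, n_skills=n_skills))
```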
no code implementations • 11 Jul 2019 • Andrea Tacchetti, DJ Strouse, Marta Garnelo, Thore Graepel, Yoram Bachrach
From social networks to supply chains, more and more aspects of how humans, firms and organizations interact are mediated by artificial learning agents.
no code implementations • ICLR 2019 • Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine
In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.
no code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.
3 code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning, +1
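A minimal sketch of a counterfactual influence reward of the kind described above: agent A's influence on agent B is measured as the divergence between B's action distribution given A's actual action and B's marginal distribution with A's action counterfactually averaged out. The tabular policies below are toy placeholders, not the paper's model.

```python
import numpy as np

def influence_reward(p_b_given_a: np.ndarray, p_a: np.ndarray, a_taken: int) -> float:
    """KL( p(b | a_taken) || sum_a p(a) p(b | a) ): how much A's chosen action
    shifts B's action distribution away from B's counterfactual marginal."""
    conditional = p_b_given_a[a_taken]        # B's policy given A's actual action
    marginal = p_a @ p_b_given_a              # marginal over A's counterfactual actions
    return float(np.sum(conditional * np.log(conditional / marginal)))

# Toy example: A has 2 actions, B has 3. Rows are p(b | a).
p_b_given_a = np.array([[0.8, 0.1, 0.1],
                        [0.2, 0.4, 0.4]])
p_a = np.array([0.5, 0.5])                    # A's own action distribution
print(influence_reward(p_b_given_a, p_a, a_taken=0))   # positive: A influenced B
```

When B's behavior barely depends on what A does, the conditional and marginal coincide and the reward goes to zero, so only genuinely influential actions are reinforced.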
1 code implementation • NeurIPS 2018 • DJ Strouse, Max Kleiman-Weiner, Josh Tenenbaum, Matt Botvinick, David Schwab
We show how to optimize these regularizers in a way that is easy to integrate with policy gradient reinforcement learning.
Tasks: Multi-agent Reinforcement Learning, Reinforcement Learning, +1
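A hedged sketch of how an information regularizer of this kind can slot into policy-gradient training: a per-step term measuring how much the goal-conditioned policy diverges from a goal-marginal policy is added to the reward (to share intentions) or subtracted from it (to hide them) before computing the usual return. The specific form log pi(a|s,g) - log pi0(a|s) and the sign convention are illustrative assumptions, not the paper's exact estimator.

```python
import numpy as np

def regularized_returns(rewards, log_pi_goal, log_pi_marginal, beta=0.1, gamma=0.99):
    """Fold a per-step information term into the return used by policy gradients.

    beta > 0 rewards actions that reveal the goal (sharing); beta < 0 penalizes
    them (hiding). All arguments are per-timestep sequences.
    """
    info_term = np.asarray(log_pi_goal) - np.asarray(log_pi_marginal)
    shaped = np.asarray(rewards) + beta * info_term
    returns = np.zeros_like(shaped)
    running = 0.0
    for t in reversed(range(len(shaped))):
        running = shaped[t] + gamma * running
        returns[t] = running
    return returns

# Toy trajectory of length 3.
print(regularized_returns(rewards=[0.0, 0.0, 1.0],
                          log_pi_goal=[-0.1, -0.2, -0.1],
                          log_pi_marginal=[-0.7, -0.9, -0.8]))
```

Because the extra term is just another reward component, any standard policy-gradient implementation can use it without modification.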
1 code implementation • 27 Dec 2017 • DJ Strouse, David J. Schwab
The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X, Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999).
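For reference, the standard IB Lagrangian this describes, in the same notation, with $\beta \geq 0$ setting the trade-off between compressing $X$ and preserving information about $Y$:
$$\min_{q(t \mid x)} \; I(X;T) - \beta\, I(T;Y).$$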
2 code implementations • 1 Apr 2016 • DJ Strouse, David J. Schwab
Here, we introduce an alternative formulation, the deterministic information bottleneck (DIB), which replaces mutual information with entropy and, we argue, better captures this notion of compression.
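Concretely, the DIB swaps the compression cost $I(X;T)$ of the IB Lagrangian for the entropy $H(T)$, giving
$$\min_{q(t \mid x)} \; H(T) - \beta\, I(T;Y),$$
whose optimizing encoder $q(t \mid x)$ is a deterministic (hard) assignment of each $x$ to a single cluster, hence the name.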