Search Results for author: DJ Strouse

Found 14 papers, 9 papers with code

Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs

1 code implementation • 22 Feb 2024 • Aaditya K. Singh, DJ Strouse

Tokenization, the division of input text into input tokens, is an often overlooked aspect of the large language model (LLM) pipeline and could be the source of useful or harmful inductive biases.

Inductive Bias • Language Modelling +1
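As a concrete illustration of the inductive bias at stake: frontier tokenizers typically chunk long digit strings into groups of three, and the direction of chunking changes which digits share a token. A minimal sketch of the two schemes the paper compares (helper names are illustrative, not from the paper's code):

```python
# Chunk a digit string into 3-digit tokens, left-to-right vs right-to-left.
# Illustrative helpers only; not the paper's implementation.

def chunk_l2r(digits, size=3):
    """Group digits left-to-right: '1234567' -> ['123', '456', '7']."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]

def chunk_r2l(digits, size=3):
    """Group digits right-to-left: '1234567' -> ['1', '234', '567']."""
    rev = digits[::-1]
    return [chunk[::-1] for chunk in chunk_l2r(rev, size)][::-1]

print(chunk_l2r("1234567"))  # ['123', '456', '7']
print(chunk_r2l("1234567"))  # ['1', '234', '567']
```

Right-to-left chunking aligns tokens with place value, matching the comma grouping 1,234,567 — the alignment whose effect on arithmetic the paper studies.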

Confronting Reward Model Overoptimization with Constrained RLHF

1 code implementation • 6 Oct 2023 • Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Mcaleer

Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback.

Melting Pot 2.0

2 code implementations • 24 Nov 2022 • John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo

Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios.

Artificial Life • Navigate

In-context Reinforcement Learning with Algorithm Distillation

1 code implementation • 25 Oct 2022 • Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.

reinforcement-learning
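The data side of that idea can be sketched in a few lines: flatten an RL algorithm's across-episode learning history into one long sequence, then form (context → next action) training pairs for a causal sequence model. This is an illustration of the pipeline, not the authors' code; names and the toy history are made up.

```python
# Sketch of the Algorithm Distillation data pipeline (illustrative only).

def flatten_history(episodes):
    """episodes: list of [(obs, action, reward), ...] ordered by training time.
    Concatenate them into one token stream so the sequence model sees the
    algorithm improving across episodes."""
    tokens = []
    for ep in episodes:
        for obs, action, reward in ep:
            tokens.extend([("obs", obs), ("act", action), ("rew", reward)])
    return tokens

def make_examples(tokens, context_len):
    """(context, target_action) pairs: predict each action from the
    preceding multi-episode context, as a causal model would."""
    return [(tokens[i - context_len:i], tok[1])
            for i, tok in enumerate(tokens)
            if tok[0] == "act" and i >= context_len]

# Toy history: two episodes from an improving policy.
history = [
    [(0, 1, 0.0), (1, 0, 0.0)],  # early, poor episode
    [(0, 1, 0.0), (1, 1, 1.0)],  # later, better episode
]
tokens = flatten_history(history)
examples = make_examples(tokens, context_len=4)
```

Because contexts span episode boundaries, a model fit to these pairs must capture the policy-improvement step itself, not just a single fixed policy.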

Collaborating with Humans without Human Data

1 code implementation • NeurIPS 2021 • DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, Richard Everett

Here, we study the problem of how to train agents that collaborate well with human partners without using human data.

Multi-agent Reinforcement Learning

Learning more skills through optimistic exploration

no code implementations • ICLR 2022 • DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen

However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will not yet have seen enough training data to produce accurate and confident skill classifications. This yields low intrinsic reward for the agent, effectively penalizing the very exploration needed to maximize the objective.
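The paper's remedy is an optimism bonus from an ensemble of discriminators: reward the agent where ensemble members disagree, since disagreement marks under-trained (novel) states. A sketch of that disagreement measure under my reading of the paper — entropy of the mean prediction minus mean entropy of the members:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def disagreement_bonus(ensemble_probs):
    """Ensemble-disagreement bonus (a sketch of the idea, not the authors'
    code): H(mean of member predictions) - mean of member entropies.
    Zero when all discriminators agree; positive when they disagree,
    i.e. in novel states the discriminators have not yet learned."""
    k = len(ensemble_probs)
    n = len(ensemble_probs[0])
    mean_p = [sum(q[i] for q in ensemble_probs) / k for i in range(n)]
    return entropy(mean_p) - sum(entropy(q) for q in ensemble_probs) / k
```

Adding this bonus to the usual discriminator-based intrinsic reward keeps novel states attractive rather than penalized.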

Learning Truthful, Efficient, and Welfare Maximizing Auction Rules

no code implementations • 11 Jul 2019 • Andrea Tacchetti, DJ Strouse, Marta Garnelo, Thore Graepel, Yoram Bachrach

From social networks to supply chains, more and more aspects of how humans, firms, and organizations interact are mediated by artificial learning agents.

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

no code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.

Counterfactual • Counterfactual Reasoning +2

Transfer and Exploration via the Information Bottleneck

no code implementations • ICLR 2019 • Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine

In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

3 code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.

Counterfactual • Counterfactual Reasoning +3
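The causal-influence reward can be sketched as a counterfactual comparison: how different is agent B's policy given A's actual action versus B's marginal policy averaged over the actions A could have taken? A minimal illustration (not the authors' implementation; distributions here are toy inputs):

```python
import math

def kl(p, q):
    """KL divergence (nats) between discrete distributions p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def influence_reward(p_b_given_a, p_a, a_taken):
    """Counterfactual influence of A's action on B (illustrative sketch):
    KL( p(b | a_taken) || sum_a' p(b | a') p(a') ).
    p_b_given_a[a][b]: B's action distribution given A's action a.
    p_a[a]: distribution over the actions A could have taken."""
    n_b = len(p_b_given_a[0])
    marginal = [sum(p_b_given_a[a][b] * p_a[a] for a in range(len(p_a)))
                for b in range(n_b)]
    return kl(p_b_given_a[a_taken], marginal)
```

The reward is zero when B's behavior is independent of A's action, and grows with the causal effect of A's choice — which is what the unified coordination mechanism rewards.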

The information bottleneck and geometric clustering

1 code implementation • 27 Dec 2017 • DJ Strouse, David J. Schwab

The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X, Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999).

Clustering • Model Selection
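In the snippet's notation, the standard IB trade-off this clustering view rests on can be written as (with $\beta$ the usual compression–relevance trade-off parameter):

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;T) \;-\; \beta \, I(T;Y)
```

Minimizing $I(X;T)$ compresses the data into cluster labels $T$, while the $\beta\,I(T;Y)$ term keeps the labels informative about $Y$.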

The deterministic information bottleneck

2 code implementations • 1 Apr 2016 • DJ Strouse, David J. Schwab

Here, we introduce an alternative formulation, the deterministic information bottleneck (DIB), which replaces mutual information with entropy and, we argue, better captures this notion of compression.

Clustering • Computational Efficiency
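Concretely, the DIB swaps the compression term of the IB objective, replacing the mutual information $I(X;T) = H(T) - H(T \mid X)$ with the entropy $H(T)$ alone:

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{DIB}} \;=\; H(T) \;-\; \beta \, I(T;Y)
```

Dropping the $H(T \mid X)$ term removes the incentive for stochastic encodings, so the optimal encoder becomes a deterministic map from $x$ to $t$ — hence the name.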
