Search Results for author: DJ Strouse

Found 14 papers, 9 papers with code

Tokenization counts: the impact of tokenization on arithmetic in frontier LLMs

1 code implementation • 22 Feb 2024 • Aaditya K. Singh, DJ Strouse

Tokenization, the division of input text into input tokens, is an often overlooked aspect of the large language model (LLM) pipeline and could be the source of useful or harmful inductive biases.

Inductive Bias • Language Modelling +1
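As a concrete illustration of the inductive bias at stake: frontier tokenizers typically chunk long digit strings into groups of three, and the direction of chunking changes which digits share a token. A minimal sketch of the two schemes the paper compares (helper names are illustrative, not from the paper's code):

```python
# Chunk a digit string into 3-digit tokens, left-to-right vs right-to-left.
# Illustrative helpers only; not the paper's implementation.

def chunk_l2r(digits, size=3):
    """Group digits left-to-right: '1234567' -> ['123', '456', '7']."""
    return [digits[i:i + size] for i in range(0, len(digits), size)]

def chunk_r2l(digits, size=3):
    """Group digits right-to-left: '1234567' -> ['1', '234', '567']."""
    rev = digits[::-1]
    return [chunk[::-1] for chunk in chunk_l2r(rev, size)][::-1]

print(chunk_l2r("1234567"))  # ['123', '456', '7']
print(chunk_r2l("1234567"))  # ['1', '234', '567']
```

Right-to-left chunking aligns tokens with place value, matching the comma grouping 1,234,567 — the alignment whose effect on arithmetic the paper studies.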

Confronting Reward Model Overoptimization with Constrained RLHF

1 code implementation • 6 Oct 2023 • Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen Mcaleer

Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback.

Melting Pot 2.0

2 code implementations • 24 Nov 2022 • John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael B. Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo

Melting Pot is a research tool developed to facilitate work on multi-agent artificial intelligence, and provides an evaluation protocol that measures generalization to novel social partners in a set of canonical test scenarios.

Artificial Life • Navigate

In-context Reinforcement Learning with Algorithm Distillation

1 code implementation • 25 Oct 2022 • Michael Laskin, Luyu Wang, Junhyuk Oh, Emilio Parisotto, Stephen Spencer, Richie Steigerwald, DJ Strouse, Steven Hansen, Angelos Filos, Ethan Brooks, Maxime Gazeau, Himanshu Sahni, Satinder Singh, Volodymyr Mnih

We propose Algorithm Distillation (AD), a method for distilling reinforcement learning (RL) algorithms into neural networks by modeling their training histories with a causal sequence model.

reinforcement-learning
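The data side of that idea can be sketched in a few lines: flatten an RL algorithm's across-episode learning history into one long sequence, then form (context → next action) training pairs for a causal sequence model. This is an illustration of the pipeline, not the authors' code; names and the toy history are made up.

```python
# Sketch of the Algorithm Distillation data pipeline (illustrative only).

def flatten_history(episodes):
    """episodes: list of [(obs, action, reward), ...] ordered by training time.
    Concatenate them into one token stream so the sequence model sees the
    algorithm improving across episodes."""
    tokens = []
    for ep in episodes:
        for obs, action, reward in ep:
            tokens.extend([("obs", obs), ("act", action), ("rew", reward)])
    return tokens

def make_examples(tokens, context_len):
    """(context, target_action) pairs: predict each action from the
    preceding multi-episode context, as a causal model would."""
    return [(tokens[i - context_len:i], tok[1])
            for i, tok in enumerate(tokens)
            if tok[0] == "act" and i >= context_len]

# Toy history: two episodes from an improving policy.
history = [
    [(0, 1, 0.0), (1, 0, 0.0)],  # early, poor episode
    [(0, 1, 0.0), (1, 1, 1.0)],  # later, better episode
]
tokens = flatten_history(history)
examples = make_examples(tokens, context_len=4)
```

Because contexts span episode boundaries, a model fit to these pairs must capture the policy-improvement step itself, not just a single fixed policy.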

Collaborating with Humans without Human Data

1 code implementation • NeurIPS 2021 • DJ Strouse, Kevin R. McKee, Matt Botvinick, Edward Hughes, Richard Everett

Here, we study the problem of how to train agents that collaborate well with human partners without using human data.

Multi-agent Reinforcement Learning

Learning more skills through optimistic exploration

no code implementations • ICLR 2022 • DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen

However, an inherent exploration problem lingers: when a novel state is actually encountered, the discriminator will not yet have seen enough training data to produce accurate and confident skill classifications. This yields low intrinsic reward for the agent, effectively penalizing the very exploration needed to maximize the objective.
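The paper's remedy is an optimism bonus from an ensemble of discriminators: reward the agent where ensemble members disagree, since disagreement marks under-trained (novel) states. A sketch of that disagreement measure under my reading of the paper — entropy of the mean prediction minus mean entropy of the members:

```python
import math

def entropy(p):
    """Shannon entropy (nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

def disagreement_bonus(ensemble_probs):
    """Ensemble-disagreement bonus (a sketch of the idea, not the authors'
    code): H(mean of member predictions) - mean of member entropies.
    Zero when all discriminators agree; positive when they disagree,
    i.e. in novel states the discriminators have not yet learned."""
    k = len(ensemble_probs)
    n = len(ensemble_probs[0])
    mean_p = [sum(q[i] for q in ensemble_probs) / k for i in range(n)]
    return entropy(mean_p) - sum(entropy(q) for q in ensemble_probs) / k
```

Adding this bonus to the usual discriminator-based intrinsic reward keeps novel states attractive rather than penalized.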

Learning Truthful, Efficient, and Welfare Maximizing Auction Rules

no code implementations • 11 Jul 2019 • Andrea Tacchetti, DJ Strouse, Marta Garnelo, Thore Graepel, Yoram Bachrach

From social networks to supply chains, more and more aspects of how humans, firms, and organizations interact are mediated by artificial learning agents.

Intrinsic Social Motivation via Causal Influence in Multi-Agent RL

no code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Therefore, we also employ influence to train agents to use an explicit communication channel, and find that it leads to more effective communication and higher collective reward.

Counterfactual • Counterfactual Reasoning +2

Transfer and Exploration via the Information Bottleneck

no code implementations • ICLR 2019 • Anirudh Goyal, Riashat Islam, DJ Strouse, Zafarali Ahmed, Hugo Larochelle, Matthew Botvinick, Yoshua Bengio, Sergey Levine

In new environments, this model can then identify novel subgoals for further exploration, guiding the agent through a sequence of potential decision states and through new regions of the state space.

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning

3 code implementations • ICLR 2019 • Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

We propose a unified mechanism for achieving coordination and communication in Multi-Agent Reinforcement Learning (MARL), through rewarding agents for having causal influence over other agents' actions.

Counterfactual • Counterfactual Reasoning +3
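The causal-influence reward can be sketched as a counterfactual comparison: how different is agent B's policy given A's actual action versus B's marginal policy averaged over the actions A could have taken? A minimal illustration (not the authors' implementation; distributions here are toy inputs):

```python
import math

def kl(p, q):
    """KL divergence (nats) between discrete distributions p and q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def influence_reward(p_b_given_a, p_a, a_taken):
    """Counterfactual influence of A's action on B (illustrative sketch):
    KL( p(b | a_taken) || sum_a' p(b | a') p(a') ).
    p_b_given_a[a][b]: B's action distribution given A's action a.
    p_a[a]: distribution over the actions A could have taken."""
    n_b = len(p_b_given_a[0])
    marginal = [sum(p_b_given_a[a][b] * p_a[a] for a in range(len(p_a)))
                for b in range(n_b)]
    return kl(p_b_given_a[a_taken], marginal)
```

The reward is zero when B's behavior is independent of A's action, and grows with the causal effect of A's choice — which is what the unified coordination mechanism rewards.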

The information bottleneck and geometric clustering

1 code implementation • 27 Dec 2017 • DJ Strouse, David J. Schwab

The information bottleneck (IB) approach to clustering takes a joint distribution $P\!\left(X, Y\right)$ and maps the data $X$ to cluster labels $T$ which retain maximal information about $Y$ (Tishby et al., 1999).

Clustering • Model Selection
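In the snippet's notation, the standard IB trade-off this clustering view rests on can be written as (with $\beta$ the usual compression–relevance trade-off parameter):

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X;T) \;-\; \beta \, I(T;Y)
```

Minimizing $I(X;T)$ compresses the data into cluster labels $T$, while the $\beta\,I(T;Y)$ term keeps the labels informative about $Y$.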

The deterministic information bottleneck

2 code implementations • 1 Apr 2016 • DJ Strouse, David J. Schwab

Here, we introduce an alternative formulation, the deterministic information bottleneck (DIB), which replaces mutual information with entropy and, we argue, better captures this notion of compression.

Clustering • Computational Efficiency
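Concretely, the DIB swaps the compression term of the IB objective, replacing the mutual information $I(X;T) = H(T) - H(T \mid X)$ with the entropy $H(T)$ alone:

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{DIB}} \;=\; H(T) \;-\; \beta \, I(T;Y)
```

Dropping the $H(T \mid X)$ term removes the incentive for stochastic encodings, so the optimal encoder becomes a deterministic map from $x$ to $t$ — hence the name.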
