Search Results for author: Micah Carroll

Found 11 papers, 2 papers with code

Beyond Preferences in AI Alignment

no code implementations • 30 Aug 2024 • Tan Zhi-Xuan, Micah Carroll, Matija Franklin, Hal Ashton

We first survey the limits of rational choice theory as a descriptive model, explaining how preferences fail to capture the thick semantic content of human values, and how utility representations neglect the possible incommensurability of those values.

Descriptive

AI Alignment with Changing and Influenceable Reward Functions

no code implementations • 28 May 2024 • Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan

Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves.

Who Needs to Know? Minimal Knowledge for Optimal Coordination

no code implementations • 15 Jun 2023 • Niklas Lauffer, Ameesh Shah, Micah Carroll, Michael Dennis, Stuart Russell

We apply this algorithm to analyze the strategically relevant information for tasks in both a standard and a partially observable version of the Overcooked environment.

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

no code implementations • 30 Nov 2022 • David Zhang, Micah Carroll, Andreea Bobu, Anca Dragan

One of the most successful paradigms for reward learning uses human feedback in the form of comparisons.

Dimensionality Reduction

UniMASK: Unified Inference in Sequential Decision Problems

1 code implementation • 20 Nov 2022 • Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks.

Decision Making

Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration

no code implementations • 3 Nov 2022 • Mesut Yang, Micah Carroll, Anca Dragan

We show that using optimal behavior as a prior for human models makes these models vastly more data-efficient and able to generalize to new environments.

Estimating and Penalizing Induced Preference Shifts in Recommender Systems

no code implementations • 25 Apr 2022 • Micah Carroll, Anca Dragan, Stuart Russell, Dylan Hadfield-Menell

These steps involve two challenging ingredients. Estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed; we do this by using historical user interaction data to train a predictive user model that implicitly contains their preference dynamics. Evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted; we use the notion of "safe shifts", which define a trust region within which behavior is safe: for instance, the natural way in which users would shift without interference from the system could be deemed "safe".

Recommendation Systems

Evaluating the Robustness of Collaborative Agents

no code implementations • 14 Jan 2021 • Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah

We apply this methodology to build a suite of unit tests for the Overcooked-AI environment, and use this test suite to evaluate three proposals for improving robustness.

On the Utility of Learning about Humans for Human-AI Coordination

2 code implementations • NeurIPS 2019 • Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves.
