Search Results for author: Anca Dragan

Found 47 papers, 19 papers with code

A Generalized Acquisition Function for Preference-based Reward Learning

no code implementations9 Mar 2024 Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca Dragan, Erdem Biyik

Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task.
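As a rough illustration of this paradigm, the sketch below fits a linear reward from pairwise segment comparisons under a Bradley-Terry choice model; the segment features, data, and step size are toy assumptions, not details from the paper above.

```python
# A minimal sketch of preference-based reward learning with a linear reward
# and a Bradley-Terry preference model; all data below is synthetic.
import numpy as np

rng = np.random.default_rng(0)

def segment_features(n_segments, dim=4):
    """Toy feature sums for trajectory segments (placeholder data)."""
    return rng.normal(size=(n_segments, dim))

def preference_loglik(w, feats_a, feats_b, prefs):
    """Log-likelihood that the human prefers segment a over b (Bradley-Terry)."""
    logits = feats_a @ w - feats_b @ w          # return difference under w
    return np.sum(prefs * logits - np.logaddexp(0.0, logits))

# Simulate a "true" reward and noisy human comparisons.
w_true = np.array([1.0, -0.5, 0.2, 0.0])
A, B = segment_features(200), segment_features(200)
p_prefer_a = 1.0 / (1.0 + np.exp(-(A @ w_true - B @ w_true)))
prefs = rng.binomial(1, p_prefer_a)

# Fit the reward weights by gradient ascent on the preference likelihood.
w = np.zeros(4)
for _ in range(500):
    logits = A @ w - B @ w
    grad = (prefs - 1.0 / (1.0 + np.exp(-logits))) @ (A - B)
    w += 0.05 * grad / len(prefs)

print("recovered reward direction:", np.round(w / np.linalg.norm(w), 2))
print("avg log-likelihood:", preference_loglik(w, A, B, prefs) / len(prefs))
```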

Preventing Reward Hacking with Occupancy Measure Regularization

1 code implementation5 Mar 2024 Cassidy Laidlaw, Shivam Singhal, Anca Dragan

Thus, we propose regularizing based on the occupancy measure (OM) divergence between policies, rather than the action distribution (AD) divergence, to prevent reward hacking.
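For intuition on the distinction, the toy sketch below computes both penalties on a small tabular MDP: a per-state action-distribution gap and the gap between discounted state occupancy measures. The MDP, policies, and penalty definitions are illustrative assumptions, not the paper's setup.

```python
# A minimal tabular contrast between action-distribution (AD) regularization
# and occupancy-measure (OM) regularization; the MDP and policies are made up.
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(1)

# Random transition kernel P[s, a] -> distribution over next states.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))

def occupancy(policy, start=0):
    """Discounted state occupancy measure of a tabular policy."""
    P_pi = np.einsum("sa,san->sn", policy, P)       # state-to-state kernel
    mu0 = np.zeros(n_states); mu0[start] = 1.0
    d = np.linalg.solve(np.eye(n_states) - gamma * P_pi.T, mu0)
    return d / d.sum()

base = np.full((n_states, n_actions), 1.0 / n_actions)   # trusted base policy
new = base.copy()
new[2] = [0.05, 0.95]                                     # small per-state change

ad_penalty = np.abs(new - base).sum(axis=1).max()            # max per-state AD gap
om_penalty = np.abs(occupancy(new) - occupancy(base)).sum()  # visitation (OM) gap

print(f"AD penalty: {ad_penalty:.3f}, OM penalty: {om_penalty:.3f}")
```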

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

no code implementations27 Feb 2024 Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment.

The Effective Horizon Explains Deep RL Performance in Stochastic Environments

1 code implementation13 Dec 2023 Cassidy Laidlaw, Banghua Zhu, Stuart Russell, Anca Dragan

Our goal is to explain why deep RL algorithms often perform well in practice, despite using random exploration and much more expressive function classes like neural networks.

Reinforcement Learning (RL)

Zero-Shot Goal-Directed Dialogue via RL on Imagined Conversations

no code implementations9 Nov 2023 Joey Hong, Sergey Levine, Anca Dragan

LLMs trained with supervised fine-tuning or "single-step" RL, as with standard RLHF, may struggle with tasks that require such goal-directed behavior, since they are not trained to optimize for overall conversational outcomes after multiple turns of interaction.

Text Generation

Offline RL with Observation Histories: Analyzing and Improving Sample Complexity

no code implementations31 Oct 2023 Joey Hong, Anca Dragan, Sergey Levine

Theoretically, we show that standard offline RL algorithms conditioned on observation histories suffer from poor sample complexity, in accordance with the above intuition.

Autonomous Navigation Offline RL +1

Learning Optimal Advantage from Preferences and Mistaking it for Reward

1 code implementation3 Oct 2023 W. Bradley Knox, Stephane Hatgis-Kessell, Sigurdur Orn Adalgeirsson, Serena Booth, Anca Dragan, Peter Stone, Scott Niekum

Most recent work assumes that human preferences are generated based only upon the reward accrued within those segments, or their partial return.
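The toy sketch below contrasts that partial-return preference model with a model that scores segments by summed optimal advantage (negated regret), the alternative this paper studies; all rewards and values are made-up numbers chosen only to show that the two models can disagree.

```python
# Two candidate models of how a human compares trajectory segments:
# by partial return (sum of rewards) vs. by summed optimal advantage.
import numpy as np

def boltzmann_pref(score_a, score_b):
    """Probability of preferring segment a under a Boltzmann choice model."""
    return 1.0 / (1.0 + np.exp(-(score_a - score_b)))

# Toy per-step rewards and optimal state values along two 3-step segments.
r_a, r_b = np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.8])
v_a = np.array([2.0, 1.0, 1.0, 1.0])   # V*(s_0..s_3) along segment a
v_b = np.array([1.0, 1.0, 1.0, 0.5])   # V*(s_0..s_3) along segment b
gamma = 1.0

def partial_return(r):
    return r.sum()

def opt_advantage(r, v):
    """Summed optimal advantage: sum_t [r_t + gamma * V*(s_{t+1}) - V*(s_t)]."""
    return (r + gamma * v[1:] - v[:-1]).sum()

print("P(a > b) under partial return:   ",
      boltzmann_pref(partial_return(r_a), partial_return(r_b)))
print("P(a > b) under optimal advantage:",
      boltzmann_pref(opt_advantage(r_a, v_a), opt_advantage(r_b, v_b)))
```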

Learning to Model the World with Language

no code implementations31 Jul 2023 Jessy Lin, Yuqing Du, Olivia Watkins, Danijar Hafner, Pieter Abbeel, Dan Klein, Anca Dragan

To interact with humans in the world, agents need to understand the diverse types of language that people use, relate them to the visual world, and act based on them.

Future prediction General Knowledge +1

Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control

no code implementations30 Jun 2023 Vivek Myers, Andre He, Kuan Fang, Homer Walke, Philippe Hansen-Estruch, Ching-An Cheng, Mihai Jalobeanu, Andrey Kolobov, Anca Dragan, Sergey Levine

Our method achieves robust performance in the real world by learning an embedding from the labeled data that aligns language not to the goal image, but rather to the desired change between the start and goal images that the instruction corresponds to.

Instruction Following
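The sketch below illustrates the alignment idea with stand-in random-projection encoders and an InfoNCE-style loss that matches each instruction to the embedded start-to-goal change rather than to the goal image alone; it is not the paper's actual architecture or training setup.

```python
# Contrastive alignment of instructions to the start->goal *change* embedding;
# encoders here are placeholder linear projections over random data.
import numpy as np

rng = np.random.default_rng(2)
batch, img_dim, txt_dim, emb_dim = 8, 32, 16, 8

W_img = rng.normal(size=(img_dim, emb_dim)) / np.sqrt(img_dim)
W_txt = rng.normal(size=(txt_dim, emb_dim)) / np.sqrt(txt_dim)

start_imgs = rng.normal(size=(batch, img_dim))
goal_imgs = rng.normal(size=(batch, img_dim))
instructions = rng.normal(size=(batch, txt_dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

# Key choice: encode the start-to-goal difference, not the goal image alone.
change_emb = normalize((goal_imgs - start_imgs) @ W_img)
text_emb = normalize(instructions @ W_txt)

logits = change_emb @ text_emb.T / 0.1          # temperature-scaled similarities
log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
infonce_loss = -np.mean(np.diag(log_probs))     # matched pairs on the diagonal
print(f"contrastive alignment loss: {infonce_loss:.3f}")
```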

Toward Grounded Commonsense Reasoning

no code implementations14 Jun 2023 Minae Kwon, Hengyuan Hu, Vivek Myers, Siddharth Karamcheti, Anca Dragan, Dorsa Sadigh

We additionally illustrate our approach with a robot on 2 carefully designed surfaces.

Language Modelling

Bridging RL Theory and Practice with the Effective Horizon

1 code implementation NeurIPS 2023 Cassidy Laidlaw, Stuart Russell, Anca Dragan

Using BRIDGE, we find that prior bounds do not correlate well with when deep RL succeeds vs. fails, but discover a surprising property that does.

Reinforcement Learning (RL)

Automatically Auditing Large Language Models via Discrete Optimization

1 code implementation8 Mar 2023 Erik Jones, Anca Dragan, Aditi Raghunathan, Jacob Steinhardt

Auditing large language models for unexpected behaviors is critical to preempt catastrophic deployments, yet remains challenging.

Towards Modeling and Influencing the Dynamics of Human Learning

no code implementations2 Jan 2023 Ran Tian, Masayoshi Tomizuka, Anca Dragan, Andrea Bajcsy

Interestingly, robot actions influence what this experience is, and therefore influence how people's internal models change.

On the Sensitivity of Reward Inference to Misspecified Human Models

no code implementations9 Dec 2022 Joey Hong, Kush Bhatia, Anca Dragan

This begs the question: how accurate do these models need to be in order for the reward inference to be accurate?

Continuous Control

Time-Efficient Reward Learning via Visually Assisted Cluster Ranking

no code implementations30 Nov 2022 David Zhang, Micah Carroll, Andreea Bobu, Anca Dragan

One of the most successful paradigms for reward learning uses human feedback in the form of comparisons.

Dimensionality Reduction

UniMASK: Unified Inference in Sequential Decision Problems

1 code implementation20 Nov 2022 Micah Carroll, Orr Paradise, Jessy Lin, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

Randomly masking and predicting word tokens has been a successful approach in pre-training language models for a variety of downstream tasks.

Decision Making
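The toy sketch below shows the unified-masking idea on a short state/action token sequence: training samples random masks, and a downstream query such as behavior cloning is just a particular mask applied at test time. The shapes and masking scheme are illustrative assumptions, not the paper's implementation.

```python
# Random masking over an interleaved state/action token sequence, with
# behavior cloning expressed as one particular test-time mask.
import numpy as np

rng = np.random.default_rng(3)
T, state_dim, action_dim = 5, 3, 2

states = rng.normal(size=(T, state_dim))
actions = rng.normal(size=(T, action_dim))

# Interleave states and actions into one token sequence: s_0, a_0, s_1, a_1, ...
tokens = [x for t in range(T) for x in (("s", states[t]), ("a", actions[t]))]

def random_mask(n_tokens, p=0.3):
    """Training-time mask: each token is hidden independently with prob p."""
    return rng.random(n_tokens) < p

def bc_mask(n_tokens):
    """Test-time mask for behavior cloning: hide only the final action token."""
    mask = np.zeros(n_tokens, dtype=bool)
    mask[-1] = True
    return mask

for name, mask in [("train", random_mask(len(tokens))), ("bc", bc_mask(len(tokens)))]:
    visible = [kind for (kind, _), m in zip(tokens, mask) if not m]
    print(name, "visible tokens:", visible)
```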

Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration

no code implementations3 Nov 2022 Mesut Yang, Micah Carroll, Anca Dragan

We show that using optimal behavior as a prior for human models makes these models vastly more data-efficient and able to generalize to new environments.

Estimating and Penalizing Induced Preference Shifts in Recommender Systems

no code implementations25 Apr 2022 Micah Carroll, Anca Dragan, Stuart Russell, Dylan Hadfield-Menell

These steps involve two challenging ingredients. Estimation requires anticipating how hypothetical algorithms would influence user preferences if deployed; we do this by using historical user interaction data to train a predictive user model that implicitly contains their preference dynamics. Evaluation and optimization additionally require metrics to assess whether such influences are manipulative or otherwise unwanted; we use the notion of "safe shifts", which define a trust region within which behavior is safe: for instance, the natural way in which users would shift without interference from the system could be deemed "safe".

Recommendation Systems

The Boltzmann Policy Distribution: Accounting for Systematic Suboptimality in Human Models

1 code implementation ICLR 2022 Cassidy Laidlaw, Anca Dragan

However, these models fail when humans exhibit systematic suboptimality, i.e. when their deviations from optimal behavior are not independent, but instead consistent over time.

Bayesian Inference Imitation Learning
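For intuition, the toy sketch below contrasts a per-step Boltzmann-rational action model (independent deviations at every step) with sampling one perturbed policy per person and using it consistently across an episode; the Q-values and noise scale are arbitrary, and this is not the paper's actual model.

```python
# Independent per-step Boltzmann noise vs. one consistently suboptimal policy.
import numpy as np

rng = np.random.default_rng(4)
Q = np.array([1.0, 0.8, 0.2])        # toy Q-values for 3 actions in one state
beta = 2.0

def boltzmann(q, beta):
    p = np.exp(beta * q)
    return p / p.sum()

# (1) Independent per-step noise: actions drawn afresh each timestep.
per_step = [rng.choice(3, p=boltzmann(Q, beta)) for _ in range(10)]

# (2) Systematic suboptimality: sample one perturbed policy, reuse it all episode.
policy = boltzmann(Q + rng.normal(scale=0.5, size=3), beta)
consistent = [rng.choice(3, p=policy) for _ in range(10)]

print("per-step Boltzmann actions:", per_step)
print("consistent-policy actions: ", consistent)
```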

Inferring Rewards from Language in Context

1 code implementation ACL 2022 Jessy Lin, Daniel Fried, Dan Klein, Anca Dragan

In classic instruction following, language like "I'd like the JetBlue flight" maps to actions (e.g., selecting that flight).

Instruction Following Reinforcement Learning (RL)

Human irrationality: both bad and good for reward inference

no code implementations12 Nov 2021 Lawrence Chan, Andrew Critch, Anca Dragan

More importantly, we show that an irrational human, when correctly modelled, can communicate more information about the reward than a perfectly rational human can.

B-Pref: Benchmarking Preference-Based Reinforcement Learning

1 code implementation4 Nov 2021 Kimin Lee, Laura Smith, Anca Dragan, Pieter Abbeel

However, it is difficult to quantify the progress in preference-based RL due to the lack of a commonly adopted benchmark.

Benchmarking reinforcement-learning +1

The MineRL BASALT Competition on Learning from Human Feedback

no code implementations5 Jul 2021 Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan

Rather than training AI systems using a predefined reward function or using a labeled dataset with a predefined set of categories, we instead train the AI system using a learning signal derived from some form of human feedback, which can evolve over time as the understanding of the task changes, or as the capabilities of the AI system improve.

Imitation Learning

Learning What To Do by Simulating the Past

1 code implementation ICLR 2021 David Lindner, Rohin Shah, Pieter Abbeel, Anca Dragan

Since reward functions are hard to specify, recent work has focused on learning policies from human feedback.

Choice Set Misspecification in Reward Inference

no code implementations19 Jan 2021 Rachel Freedman, Rohin Shah, Anca Dragan

A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback, like demonstrations or corrections.

Benefits of Assistance over Reward Learning

no code implementations1 Jan 2021 Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.

The impacts of known and unknown demonstrator irrationality on reward inference

no code implementations1 Jan 2021 Lawrence Chan, Andrew Critch, Anca Dragan

Surprisingly, we find that if we give the learner access to the correct model of the demonstrator's irrationality, these irrationalities can actually help reward inference.

AvE: Assistance via Empowerment

1 code implementation NeurIPS 2020 Yuqing Du, Stas Tiomkin, Emre Kiciman, Daniel Polani, Pieter Abbeel, Anca Dragan

One difficulty in using artificial agents for human-assistive applications lies in the challenge of accurately assisting with a person's goal(s).

On the Utility of Learning about Humans for Human-AI Coordination

2 code implementations NeurIPS 2019 Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

While we would like agents that can coordinate with humans, current algorithms such as self-play and population-based training create agents that can coordinate with themselves.

Few-Shot Intent Inference via Meta-Inverse Reinforcement Learning

no code implementations ICLR 2019 Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn

A significant challenge for the practical application of reinforcement learning to real-world problems is the need to specify an oracle reward function that correctly defines a task.

reinforcement-learning Reinforcement Learning (RL)

Preferences Implicit in the State of the World

1 code implementation ICLR 2019 Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

We find that information from the initial state can be used to infer both side effects that should be avoided as well as preferences for how the environment should be organized.

Reinforcement Learning (RL)

The Assistive Multi-Armed Bandit

1 code implementation24 Jan 2019 Lawrence Chan, Dylan Hadfield-Menell, Siddhartha Srinivasa, Anca Dragan

Learning preferences implicit in the choices humans make is a well studied problem in both economics and computer science.

Multi-Armed Bandits

On the Utility of Model Learning in HRI

no code implementations4 Jan 2019 Gokul Swamy, Jens Schulz, Rohan Choudhury, Dylan Hadfield-Menell, Anca Dragan

Fundamental to robotics is the debate between model-based and model-free learning: should the robot build an explicit model of the world, or learn a policy directly?

Autonomous Driving

Learning a Prior over Intent via Meta-Inverse Reinforcement Learning

no code implementations31 May 2018 Kelvin Xu, Ellis Ratner, Anca Dragan, Sergey Levine, Chelsea Finn

A significant challenge for the practical application of reinforcement learning in the real world is the need to specify an oracle reward function that correctly defines a task.

reinforcement-learning Reinforcement Learning (RL)

Inverse Reward Design

1 code implementation NeurIPS 2017 Dylan Hadfield-Menell, Smitha Milli, Pieter Abbeel, Stuart Russell, Anca Dragan

When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios.
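The sketch below gives a stripped-down, inverse-reward-design-flavored posterior over a tiny grid of candidate true rewards, treating behavior that looks good in the training environment as evidence about what the designer intended; it omits the likelihood's per-candidate normalizer and uses toy numbers throughout.

```python
# A toy posterior over candidate true rewards given a designed proxy: proxies
# whose optimal training behavior scores well under a candidate true reward
# make that candidate more probable. The normalizer over proxies is omitted.
import numpy as np

beta = 5.0
# Feature expectations of the trajectory obtained by optimizing the proxy
# reward in the *training* environment (toy numbers).
phi_train = np.array([1.0, 0.0, 0.2])

# Candidate true reward weight vectors (a tiny hypothesis grid).
candidates = np.array([
    [1.0, 0.0, 0.0],    # matches the proxy's intent
    [1.0, -1.0, 0.0],   # also penalizes a feature unseen in training
    [0.0, 0.0, 1.0],
])

loglik = beta * candidates @ phi_train
posterior = np.exp(loglik - loglik.max())
posterior /= posterior.sum()

for w, p in zip(candidates, posterior):
    print(f"true reward {w} -> posterior {p:.2f}")
```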

Should Robots be Obedient?

1 code implementation28 May 2017 Smitha Milli, Dylan Hadfield-Menell, Anca Dragan, Stuart Russell

We show that when a human is not perfectly rational, a robot that tries to infer and act according to the human's underlying preferences can always perform better than a robot that simply follows the human's literal order.

Translating Neuralese

1 code implementation ACL 2017 Jacob Andreas, Anca Dragan, Dan Klein

Several approaches have recently been proposed for learning decentralized deep multiagent policies that coordinate via a differentiable communication channel.

Machine Translation Translation

DART: Noise Injection for Robust Imitation Learning

2 code implementations27 Mar 2017 Michael Laskey, Jonathan Lee, Roy Fox, Anca Dragan, Ken Goldberg

One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy.

Imitation Learning
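The sketch below imitates the noise-injection recipe on a toy one-dimensional system: noisy actions are executed during demonstration collection while the supervisor's clean actions are used as labels, broadening state coverage before behavior cloning. The dynamics, supervisor, and noise level are assumptions for illustration.

```python
# Noise-injected demonstration collection followed by linear behavior cloning.
import numpy as np

rng = np.random.default_rng(5)

def supervisor(state):
    """Toy linear supervisor policy."""
    return -0.8 * state

def step(state, action):
    """Toy scalar dynamics."""
    return state + action + 0.05 * rng.normal()

def collect(noise_std, episodes=20, horizon=30):
    xs, ys = [], []
    for _ in range(episodes):
        s = rng.normal()
        for _ in range(horizon):
            a_sup = supervisor(s)                      # label with the clean action
            a_exec = a_sup + noise_std * rng.normal()  # execute a noisy action
            xs.append(s); ys.append(a_sup)
            s = step(s, a_exec)
    return np.array(xs), np.array(ys)

# Fit a 1-D linear behavior-cloning policy by least squares; injected noise
# widens the distribution of states the learner gets to see labels for.
for noise in (0.0, 0.3):
    X, Y = collect(noise)
    k = (X @ Y) / (X @ X)
    print(f"injected noise {noise}: learned gain {k:.2f}, "
          f"state coverage std {X.std():.2f}")
```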

The Off-Switch Game

no code implementations24 Nov 2016 Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

We analyze a simple game between a human H and a robot R, where H can press R's off switch but R can disable the off switch.
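As a back-of-the-envelope illustration of the game's main tension, the sketch below compares the robot's expected utility from acting immediately against deferring to a rational human who vetoes harmful actions, under an arbitrary belief over the action's utility; it is not the paper's formal analysis.

```python
# Acting immediately vs. deferring to a rational human who switches the robot
# off whenever the proposed action has negative utility U.
import numpy as np

rng = np.random.default_rng(6)
U_samples = rng.normal(loc=0.2, scale=1.0, size=100_000)  # robot's belief over U

act_now = U_samples.mean()                       # act without asking
defer = np.maximum(U_samples, 0.0).mean()        # rational human vetoes U < 0

print(f"E[utility | act now]        = {act_now:.3f}")
print(f"E[utility | defer to human] = {defer:.3f}")  # never worse if H is rational
```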

Comparing Human-Centric and Robot-Centric Sampling for Robot Deep Learning from Demonstrations

no code implementations4 Oct 2016 Michael Laskey, Caleb Chuck, Jonathan Lee, Jeffrey Mahler, Sanjay Krishnan, Kevin Jamieson, Anca Dragan, Ken Goldberg

Although policies learned with RC sampling can be superior to HC sampling for standard learning models such as linear SVMs, policies learned with HC sampling may be comparable with highly-expressive learning models such as deep learning and hyper-parametric decision trees, which have little model error.

Cooperative Inverse Reinforcement Learning

2 code implementations NeurIPS 2016 Dylan Hadfield-Menell, Anca Dragan, Pieter Abbeel, Stuart Russell

For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans.

Active Learning reinforcement-learning +1
