Search Results for author: Anca D. Dragan

Found 62 papers, 15 papers with code

Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

no code implementations • 16 Oct 2023 • Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan

We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto-frontier of human policies that are the best trade-offs between naturalness and low robot performance.

Confronting Reward Model Overoptimization with Constrained RLHF

1 code implementation • 6 Oct 2023 • Ted Moskovitz, Aaditya K. Singh, DJ Strouse, Tuomas Sandholm, Ruslan Salakhutdinov, Anca D. Dragan, Stephen McAleer

Large language models are typically aligned with human preferences by optimizing $\textit{reward models}$ (RMs) fitted to human feedback.
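The paper addresses overoptimization by imposing constraints tied to component reward models; a constrained RLHF objective of this general shape can be sketched in Lagrangian form (the notation and thresholds $\tau_i$ are illustrative, not the paper's exact formulation):

```latex
\max_{\pi}\ \min_{\lambda \ge 0}\;
\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi(\cdot \mid x)}
\left[ r_0(x, y) + \sum_i \lambda_i \left( r_i(x, y) - \tau_i \right) \right]
\;-\; \beta\, \mathrm{KL}\!\left( \pi \,\middle\|\, \pi_{\mathrm{ref}} \right)
```

where the $r_i$ are component reward models, the $\lambda_i$ their Lagrange multipliers, and the KL term regularizes the policy toward a reference policy, as in standard RLHF.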

Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning

no code implementations • 7 Sep 2023 • Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine

We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well.

Brain Computer Interface, Decision Making, +1

Contextual Reliability: When Different Features Matter in Different Contexts

no code implementations • 19 Jul 2023 • Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan

We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context.

Aligning Robot and Human Representations

no code implementations • 3 Feb 2023 • Andreea Bobu, Andi Peng, Pulkit Agrawal, Julie Shah, Anca D. Dragan

To act in the world, robots rely on a representation of salient task aspects: for example, to carry a coffee mug, a robot may consider movement efficiency or mug orientation in its behavior.

Imitation Learning, Representation Learning

Benchmarks and Algorithms for Offline Preference-Based Reward Learning

no code implementations • 3 Jan 2023 • Daniel Shin, Anca D. Dragan, Daniel S. Brown

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.

Active Learning, Offline RL

SIRL: Similarity-based Implicit Representation Learning

no code implementations • 2 Jan 2023 • Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan

This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not.

Contrastive Learning, Data Augmentation, +1

Learning Representations that Enable Generalization in Assistive Tasks

no code implementations • 5 Dec 2022 • Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan

We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only.

The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types

no code implementations • 23 Aug 2022 • Gaurav R. Ghosal, Matthew Zurek, Daniel S. Brown, Anca D. Dragan

In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning.

Informativeness, Vocal Bursts Type Prediction

First Contact: Unsupervised Human-Machine Co-Adaptation via Mutual Information Maximization

1 code implementation • 24 May 2022 • Siddharth Reddy, Sergey Levine, Anca D. Dragan

How can we train an assistive human-machine interface (e.g., an electromyography-based limb prosthesis) to translate a user's raw command signals into the actions of a robot or computer when there is no prior mapping, we cannot ask the user for supervision in the form of action labels or reward feedback, and we do not have prior knowledge of the tasks the user is trying to accomplish?
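At a high level, such an interface can be trained by maximizing the mutual information between the user's command signals and the interface's effect on the environment; a schematic objective (my notation, not necessarily the paper's) is:

```latex
\max_{\theta}\; I\!\left( \mathbf{x}_t ;\, \mathbf{s}_{t+1} \,\middle|\, \mathbf{s}_t \right),
\qquad \mathbf{a}_t = f_{\theta}\!\left( \mathbf{x}_t, \mathbf{s}_t \right)
```

i.e., the commands $\mathbf{x}_t$ should be predictive of the next state $\mathbf{s}_{t+1}$ once passed through the learned interface $f_\theta$, even though no action labels or rewards are available.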

Causal Confusion and Reward Misidentification in Preference-Based Reward Learning

no code implementations • 13 Apr 2022 • Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown

While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification when learning from preferences.

Imitation Learning

Teaching Robots to Span the Space of Functional Expressive Motion

no code implementations • 4 Mar 2022 • Arjun Sripathy, Andreea Bobu, Zhongyu Li, Koushil Sreenath, Daniel S. Brown, Anca D. Dragan

As a result 1) all user feedback can contribute to learning about every emotion; 2) the robot can generate trajectories for any emotion in the space instead of only a few predefined ones; and 3) the robot can respond emotively to user-generated natural language by mapping it to a target VAD.

ASHA: Assistive Teleoperation via Human-in-the-Loop Reinforcement Learning

no code implementations • 5 Feb 2022 • Sean Chen, Jensen Gao, Siddharth Reddy, Glen Berseth, Anca D. Dragan, Sergey Levine

Building assistive interfaces for controlling robots through arbitrary, high-dimensional, noisy inputs (e.g., webcam images of eye gaze) can be challenging, especially when it involves inferring the user's desired action in the absence of a natural 'default' interface.

Reinforcement Learning (RL)

Inducing Structure in Reward Learning by Learning Features

1 code implementation • 18 Jan 2022 • Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan

To get around this issue, recent deep Inverse Reinforcement Learning (IRL) methods learn rewards directly from the raw state but this is challenging because the robot has to implicitly learn the features that are important and how to combine them, simultaneously.

Assisted Robust Reward Design

no code implementations • 18 Nov 2021 • Jerry Zhi-Yang He, Anca D. Dragan

We contribute an Assisted Reward Design method that speeds up the design process by anticipating and influencing this future evidence: rather than letting the designer eventually encounter failure cases and revise the reward then, the method actively exposes the designer to such environments during the development phase.

Autonomous Driving

Offline Preference-Based Apprenticeship Learning

no code implementations • 20 Jul 2021 • Daniel Shin, Daniel S. Brown, Anca D. Dragan

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.

Active Learning, Offline RL

Pragmatic Image Compression for Human-in-the-Loop Decision-Making

1 code implementation • NeurIPS 2021 • Siddharth Reddy, Anca D. Dragan, Sergey Levine

Standard lossy image compression algorithms aim to preserve an image's appearance, while minimizing the number of bits needed to transmit it.

Car Racing, Decision Making, +1

Physical Interaction as Communication: Learning Robot Objectives Online from Human Corrections

no code implementations • 6 Jul 2021 • Dylan P. Losey, Andrea Bajcsy, Marcia K. O'Malley, Anca D. Dragan

We recognize that physical human-robot interaction (pHRI) is often intentional -- the human intervenes on purpose because the robot is not doing the task correctly.

Policy Gradient Bayesian Robust Optimization for Imitation Learning

no code implementations • 11 Jun 2021 • Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg

Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.

Imitation Learning

Preference learning along multiple criteria: A game-theoretic perspective

no code implementations • NeurIPS 2020 • Kush Bhatia, Ashwin Pananjady, Peter L. Bartlett, Anca D. Dragan, Martin J. Wainwright

Finally, we showcase the practical utility of our framework in a user study on autonomous driving, where we find that the Blackwell winner outperforms the von Neumann winner for the overall preferences.

Autonomous Driving

Optimal Cost Design for Model Predictive Control

1 code implementation • 23 Apr 2021 • Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan

We test our approach in an autonomous driving domain where we find costs different from the ground truth that implicitly compensate for replanning, short horizon, incorrect dynamics models, and local minima issues.

Autonomous Driving, Model Predictive Control

Agnostic learning with unknown utilities

no code implementations • 17 Apr 2021 • Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt

This raises an interesting question whether learning is even possible in our setup, given that obtaining a generalizable estimate of utility $u^*$ might not be possible from finitely many samples.

Situational Confidence Assistance for Lifelong Shared Autonomy

no code implementations • 14 Apr 2021 • Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

Shared autonomy enables robots to infer user intent and assist in accomplishing it.

Dynamically Switching Human Prediction Models for Efficient Planning

no code implementations • 13 Mar 2021 • Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

As environments involving both robots and humans become increasingly common, so does the need to account for people during planning.

Analyzing Human Models that Adapt Online

no code implementations • 9 Mar 2021 • Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan

This enables us to leverage tools from reachability analysis and optimal control to compute the set of hypotheses the robot could learn in finite time, as well as the worst and best-case time it takes to learn them.

Autonomous Driving

On complementing end-to-end human behavior predictors with planning

no code implementations • 9 Mar 2021 • Liting Sun, Xiaogang Jia, Anca D. Dragan

High capacity end-to-end approaches for human motion (behavior) prediction have the ability to represent subtle nuances in human behavior, but struggle with robustness to out of distribution inputs and tail events.

Autonomous Driving, Human Motion Prediction, +2

Value Alignment Verification

1 code implementation • 2 Dec 2020 • Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum

In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values.

Autonomous Driving

Assisted Perception: Optimizing Observations to Communicate State

1 code implementation • 6 Aug 2020 • Siddharth Reddy, Sergey Levine, Anca D. Dragan

We evaluate ASE in a user study with 12 participants who each perform four tasks: two tasks with known user biases -- bandwidth-limited image classification and a driving video game with observation delay -- and two with unknown biases that our method has to learn -- guided 2D navigation and a lunar lander teleoperation video game.

Image Classification

Feature Expansive Reward Learning: Rethinking Human Input

1 code implementation • 23 Jun 2020 • Andreea Bobu, Marius Wiggert, Claire Tomlin, Anca D. Dragan

When the correction cannot be explained by these features, recent work in deep Inverse Reinforcement Learning (IRL) suggests that the robot could ask for task demonstrations and recover a reward defined over the raw state space.

Reward-rational (implicit) choice: A unifying formalism for reward learning

no code implementations • NeurIPS 2020 • Hong Jun Jeon, Smitha Milli, Anca D. Dragan

It is often difficult to hand-specify what the correct reward function is for a task, so researchers have instead aimed to learn reward functions from human behavior or feedback.

Quantifying Hypothesis Space Misspecification in Learning from Human-Robot Demonstrations and Physical Corrections

no code implementations • 3 Feb 2020 • Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Sampada Deglurkar, Anca D. Dragan

Recent work focuses on how robots can use such input - like demonstrations or corrections - to learn intended objectives.

LESS is More: Rethinking Probabilistic Models of Human Behavior

no code implementations • 13 Jan 2020 • Andreea Bobu, Dexter R. R. Scobee, Jaime F. Fisac, S. Shankar Sastry, Anca D. Dragan

A common model is the Boltzmann noisily-rational decision model, which assumes people approximately optimize a reward function and choose trajectories in proportion to their exponentiated reward.

Econometrics
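For reference, the Boltzmann noisily-rational model referred to above is standardly written (up to normalization; notation mine) as:

```latex
P(\xi \mid \theta) \;\propto\; \exp\!\big( \beta\, R_{\theta}(\xi) \big)
```

where $R_\theta(\xi)$ is the reward of trajectory $\xi$ and $\beta$ is the rationality (inverse-temperature) coefficient; $\beta \to \infty$ recovers a perfectly rational chooser, while $\beta = 0$ yields uniformly random choices.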

Learning Human Objectives by Evaluating Hypothetical Behavior

1 code implementation • ICML 2020 • Siddharth Reddy, Anca D. Dragan, Sergey Levine, Shane Legg, Jan Leike

To address this challenge, we propose an algorithm that safely and interactively learns a model of the user's reward function.

Car Racing

Nonverbal Robot Feedback for Human Teachers

no code implementations • 6 Nov 2019 • Sandy H. Huang, Isabella Huang, Ravi Pandya, Anca D. Dragan

Robots can learn preferences from human demonstrations, but their success depends on how informative these demonstrations are.

A Hamilton-Jacobi Reachability-Based Framework for Predicting and Analyzing Human Motion for Safe Planning

no code implementations • 29 Oct 2019 • Somil Bansal, Andrea Bajcsy, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin

We construct a new continuous-time dynamical system, where the inputs are the observations of human behavior, and the dynamics include how the belief over the model parameters change.

Bayesian Inference, Human Motion Prediction, +1

Scaled Autonomy: Enabling Human Operators to Control Robot Fleets

no code implementations • 22 Sep 2019 • Gokul Swamy, Siddharth Reddy, Sergey Levine, Anca D. Dragan

We learn a model of the user's preferences from observations of the user's choices in easy settings with a few robots, and use it in challenging settings with more robots to automatically identify which robot the user would most likely choose to control, if they were able to evaluate the states of all robots at all times.

Robot Navigation

Efficient Iterative Linear-Quadratic Approximations for Nonlinear Multi-Player General-Sum Differential Games

1 code implementation • 10 Sep 2019 • David Fridovich-Keil, Ellis Ratner, Anca D. Dragan, Claire J. Tomlin

We benchmark our method in a three-player general-sum simulated example, in which it takes <0.75 s to identify a solution and <50 ms to solve warm-started subproblems in a receding horizon.

Systems and Control, Robotics

Bayesian Robustness: A Nonasymptotic Viewpoint

no code implementations • 27 Jul 2019 • Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan

We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers.

Binary Classification, Regression

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

no code implementations • 23 Jun 2019 • Rohin Shah, Noah Gundotra, Pieter Abbeel, Anca D. Dragan

But in the era of deep learning, a natural suggestion researchers make is to avoid mathematical models of human behavior that are fraught with specific assumptions, and instead use a purely data-driven approach.

An Extensible Interactive Interface for Agent Design

no code implementations • 6 Jun 2019 • Matthew Rahtz, James Fang, Anca D. Dragan, Dylan Hadfield-Menell

In deep reinforcement learning, for example, directly specifying a reward as a function of a high-dimensional observation is challenging.

Reinforcement Learning (RL)

SQIL: Imitation Learning via Reinforcement Learning with Sparse Rewards

5 code implementations • ICLR 2020 • Siddharth Reddy, Anca D. Dragan, Sergey Levine

Theoretically, we show that SQIL can be interpreted as a regularized variant of BC that uses a sparsity prior to encourage long-horizon imitation.

Imitation Learning, Q-Learning, +2
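SQIL's core idea is off-policy RL with rewards fixed to +1 on demonstration transitions and 0 on the agent's own transitions, sampled in equal proportion. A minimal tabular sketch of that reward relabeling (the toy chain MDP, hyperparameters, and use of ordinary hard-max Q-learning in place of soft Q-learning are my simplifications, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
gamma, alpha = 0.9, 0.1

# Hypothetical demonstration transitions (s, a, s'): SQIL fixes their reward to +1.
demos = [(0, 1, 1), (1, 1, 2), (2, 1, 3), (3, 1, 4)]
# Hypothetical agent-collected transitions (here: self-loops): reward fixed to 0.
agent = [(0, 0, 0), (1, 0, 1), (2, 0, 2), (3, 0, 3)]

for _ in range(2000):
    # Sample demo and agent experience in equal proportion, as SQIL prescribes.
    for (s, a, s2), r in [(demos[rng.integers(len(demos))], 1.0),
                          (agent[rng.integers(len(agent))], 0.0)]:
        target = r + gamma * Q[s2].max()
        Q[s, a] += alpha * (target - Q[s, a])

# The greedy policy in states 0-3 recovers the demonstrated action.
print(np.argmax(Q, axis=1)[:4])  # prints [1 1 1 1]
```

The constant +1/0 rewards act as the sparsity prior mentioned in the snippet: the agent is pushed back toward states and actions that appear in the demonstrations, without any learned reward model.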

Literal or Pedagogic Human? Analyzing Human Model Misspecification in Objective Learning

no code implementations • 9 Mar 2019 • Smitha Milli, Anca D. Dragan

In this work, we focus on misspecification: we argue that robots might not know whether people are being pedagogic or literal and that it is important to ask which assumption is safer to make.

Human-AI Learning Performance in Multi-Armed Bandits

no code implementations • 21 Dec 2018 • Ravi Pandya, Sandy H. Huang, Dylan Hadfield-Menell, Anca D. Dragan

People frequently face challenging decision-making problems in which outcomes are uncertain or unknown.

Decision Making, Multi-Armed Bandits

Establishing Appropriate Trust via Critical States

no code implementations • 18 Oct 2018 • Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan

In order to effectively interact with or supervise a robot, humans need to have an accurate mental model of its capabilities and how it acts.

Robotics

Hierarchical Game-Theoretic Planning for Autonomous Vehicles

no code implementations • 13 Oct 2018 • Jaime F. Fisac, Eli Bronstein, Elis Stefansson, Dorsa Sadigh, S. Shankar Sastry, Anca D. Dragan

This mutual dependence, best captured by dynamic game theory, creates a strong coupling between the vehicle's planning and its predictions of other drivers' behavior, and constitutes an open problem with direct implications on the safety and viability of autonomous driving technology.

Autonomous Driving, Decision Making, +1

Learning under Misspecified Objective Spaces

1 code implementation • 11 Oct 2018 • Andreea Bobu, Andrea Bajcsy, Jaime F. Fisac, Anca D. Dragan

Learning robot objective functions from human input has become increasingly important, but state-of-the-art techniques assume that the human's desired objective lies within the robot's hypothesis space.

What Would pi* Do?: Imitation Learning via Off-Policy Reinforcement Learning

no code implementations • 27 Sep 2018 • Siddharth Reddy, Anca D. Dragan, Sergey Levine

Learning to imitate expert actions given demonstrations containing image observations is a difficult problem in robotic control.

Imitation Learning, Q-Learning, +2

Cost Functions for Robot Motion Style

1 code implementation • 1 Sep 2018 • Allan Zhou, Anca D. Dragan

We focus on autonomously generating robot motion for day to day physical tasks that is expressive of a certain style or emotion.

Robotics

The Social Cost of Strategic Classification

no code implementations • 25 Aug 2018 • Smitha Milli, John Miller, Anca D. Dragan, Moritz Hardt

Consequential decision-making typically incentivizes individuals to behave strategically, tailoring their behavior to the specifics of the decision rule.

Classification, Decision Making, +2

Courteous Autonomous Cars

no code implementations • 8 Aug 2018 • Liting Sun, Wei Zhan, Masayoshi Tomizuka, Anca D. Dragan

Such a courtesy term enables the robot car to be aware of possible irrationality of the human behavior, and plan accordingly.

Model Reconstruction from Model Explanations

no code implementations • 13 Jul 2018 • Smitha Milli, Ludwig Schmidt, Anca D. Dragan, Moritz Hardt

We show through theory and experiment that gradient-based explanations of a model quickly reveal the model itself.

An Efficient, Generalized Bellman Update For Cooperative Inverse Reinforcement Learning

no code implementations • ICML 2018 • Dhruv Malik, Malayandi Palaniappan, Jaime F. Fisac, Dylan Hadfield-Menell, Stuart Russell, Anca D. Dragan

We apply this update to a variety of POMDP solvers and find that it enables us to scale CIRL to non-trivial problems, with larger reward parameter spaces, and larger action spaces for both robot and human.

Reinforcement Learning (RL)

Simplifying Reward Design through Divide-and-Conquer

no code implementations • 7 Jun 2018 • Ellis Ratner, Dylan Hadfield-Menell, Anca D. Dragan

Designing a good reward function is essential to robot planning and reinforcement learning, but it can also be challenging and frustrating.

Motion Planning

Where Do You Think You're Going?: Inferring Beliefs about Dynamics from Behavior

1 code implementation • NeurIPS 2018 • Siddharth Reddy, Anca D. Dragan, Sergey Levine

Inferring intent from observed behavior has been studied extensively within the frameworks of Bayesian inverse planning and inverse reinforcement learning.

Reinforcement Learning (RL)

Generating Plans that Predict Themselves

no code implementations • 14 Feb 2018 • Jaime F. Fisac, Chang Liu, Jessica B. Hamrick, S. Shankar Sastry, J. Karl Hedrick, Thomas L. Griffiths, Anca D. Dragan

We introduce $t$-predictability: a measure that quantifies the accuracy and confidence with which human observers can predict the remaining robot plan from the overall task goal and the observed initial $t$ actions in the plan.

Shared Autonomy via Deep Reinforcement Learning

1 code implementation • 6 Feb 2018 • Siddharth Reddy, Anca D. Dragan, Sergey Levine

In shared autonomy, user input is combined with semi-autonomous control to achieve a common goal.

Reinforcement Learning (RL)

Pragmatic-Pedagogic Value Alignment

no code implementations • 20 Jul 2017 • Jaime F. Fisac, Monica A. Gates, Jessica B. Hamrick, Chang Liu, Dylan Hadfield-Menell, Malayandi Palaniappan, Dhruv Malik, S. Shankar Sastry, Thomas L. Griffiths, Anca D. Dragan

In robotics, value alignment is key to the design of collaborative robots that can integrate into human workflows, successfully inferring and adapting to their users' objectives as they go.

Decision Making

Enabling Robots to Communicate their Objectives

no code implementations • 11 Feb 2017 • Sandy H. Huang, David Held, Pieter Abbeel, Anca D. Dragan

We show that certain approximate-inference models lead to the robot generating example behaviors that better enable users to anticipate what it will do in novel situations.

Autonomous Driving
