Search Results for author: Daniel S. Brown

Found 33 papers, 10 papers with code

Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

no code implementations • 10 Apr 2024 • Zohre Karimi, Shing-Hei Ho, Bao Thach, Alan Kuntz, Daniel S. Brown

This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations.

Imitation Learning Reinforcement Learning (RL)

Exploring Behavior Discovery Methods for Heterogeneous Swarms of Limited-Capability Robots

no code implementations • 25 Oct 2023 • Connor Mattson, Jeremy C. Clark, Daniel S. Brown

We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities.


Quantifying Assistive Robustness Via the Natural-Adversarial Frontier

no code implementations • 16 Oct 2023 • Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan

We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto-frontier of human policies that are the best trade-offs between naturalness and low robot performance.
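
The frontier construction described above can be illustrated with a toy Pareto-front extraction over hypothetical (naturalness, robot-performance) scores; producing those scores for real human policies is the hard part and is not shown here.

```python
def pareto_frontier(points):
    """Pareto frontier for the natural-adversarial trade-off.

    points: (naturalness, robot_performance) pairs. We look for human
    policies that are as natural as possible while driving the robot's
    performance as low as possible, so `a` dominates `b` when `a` is at
    least as natural and yields performance at least as low, and is
    strictly better on at least one axis.
    """
    def dominates(a, b):
        return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical scores for five human policies.
policies = [(0.9, 0.8), (0.7, 0.3), (0.5, 0.1), (0.4, 0.4), (0.8, 0.5)]
frontier = sorted(pareto_frontier(policies))
# (0.4, 0.4) is off the frontier: (0.7, 0.3) is both more natural
# and more adversarial.
```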

Contextual Reliability: When Different Features Matter in Different Contexts

no code implementations • 19 Jul 2023 • Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan

We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context.

Can Differentiable Decision Trees Learn Interpretable Reward Functions?

no code implementations • 22 Jun 2023 • Akansha Kalra, Daniel S. Brown

There is an increasing interest in learning reward functions that model human preferences.

Atari Games

Leveraging Human Feedback to Evolve and Discover Novel Emergent Behaviors in Robot Swarms

no code implementations • 25 Apr 2023 • Connor Mattson, Daniel S. Brown

We combine our learned similarity metric with novelty search and clustering to explore and categorize the space of possible swarm behaviors.

Self-Supervised Learning
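
The novelty-search step mentioned in the abstract can be sketched generically: score each behavior by its distance to the nearest behaviors already archived, and keep it if it is novel enough. The paper pairs this with a similarity metric learned from human feedback; plain Euclidean distance stands in for that learned metric here, and the threshold is hypothetical.

```python
import math

def novelty(candidate, archive, k=3):
    """Novelty = mean distance to the k nearest behaviors in the archive.
    Euclidean distance is a stand-in for the paper's learned metric."""
    dists = sorted(math.dist(candidate, other) for other in archive)[:k]
    return sum(dists) / len(dists) if dists else float("inf")

def novelty_search(behavior_embeddings, threshold=0.5):
    """Greedily archive behaviors sufficiently unlike anything seen so far;
    the archive can then be clustered to categorize emergent behaviors."""
    archive = []
    for emb in behavior_embeddings:
        if novelty(emb, archive) > threshold:
            archive.append(emb)
    return archive

archive = novelty_search([(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (5.0, 5.0)])
# (0.1, 0.0) is filtered out as a near-duplicate of (0.0, 0.0).
```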

Efficient Preference-Based Reinforcement Learning Using Learned Dynamics Models

no code implementations • 11 Jan 2023 • Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown

In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal demonstrations can be performed without any environmental interaction.

Reinforcement Learning (RL)
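
Benefit (2) above — synthesizing preference queries inside the model — can be sketched as: roll out candidate trajectories in the learned dynamics model and query the human about the pair a reward ensemble disagrees on most. All names, the 1-D toy dynamics, and the two-member ensemble below are illustrative assumptions, not the paper's implementation.

```python
import random

def synthesize_preference_query(dynamics, reward_ensemble, start,
                                horizon=5, n_candidates=20):
    """Generate candidate trajectories purely inside a learned dynamics
    model, then return the two whose predicted returns the reward ensemble
    disagrees on most. No real environment interaction is needed."""
    def rollout(state):
        traj = [state]
        for _ in range(horizon):
            state = dynamics(state, random.choice([-1.0, 1.0]))
            traj.append(state)
        return traj

    def disagreement(traj):
        returns = [sum(r(s) for s in traj) for r in reward_ensemble]
        return max(returns) - min(returns)

    candidates = sorted((rollout(start) for _ in range(n_candidates)),
                        key=disagreement, reverse=True)
    return candidates[0], candidates[1]  # most informative pair to compare

# Toy 1-D "learned" dynamics and a two-member reward ensemble.
dynamics = lambda s, a: s + a
ensemble = [lambda s: s, lambda s: -s]
random.seed(0)
pair = synthesize_preference_query(dynamics, ensemble, 0.0)
```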

Benchmarks and Algorithms for Offline Preference-Based Reward Learning

no code implementations • 3 Jan 2023 • Daniel Shin, Anca D. Dragan, Daniel S. Brown

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.

Active Learning Offline RL

SIRL: Similarity-based Implicit Representation Learning

no code implementations • 2 Jan 2023 • Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan

This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not.

Contrastive Learning Data Augmentation +1

Learning Representations that Enable Generalization in Assistive Tasks

no code implementations • 5 Dec 2022 • Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan

We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only.

Autonomous Assessment of Demonstration Sufficiency via Bayesian Inverse Reinforcement Learning

no code implementations • 28 Nov 2022 • Tu Trinh, Haoyu Chen, Daniel S. Brown

We evaluate our approach in simulation for both discrete and continuous state-space domains and illustrate the feasibility of developing a robotic system that can accurately evaluate demonstration sufficiency.

Active Learning reinforcement-learning +1

Monte Carlo Augmented Actor-Critic for Sparse Reward Deep Reinforcement Learning from Suboptimal Demonstrations

no code implementations • 14 Oct 2022 • Albert Wilcox, Ashwin Balakrishna, Jules Dedieu, Wyame Benslimane, Daniel S. Brown, Ken Goldberg

Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions.

Continuous Control

The Effect of Modeling Human Rationality Level on Learning Rewards from Multiple Feedback Types

no code implementations • 23 Aug 2022 • Gaurav R. Ghosal, Matthew Zurek, Daniel S. Brown, Anca D. Dragan

In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning.

Informativeness Vocal Bursts Type Prediction
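
The rationality coefficient here is typically the temperature β in a Boltzmann choice model, P(choice) ∝ exp(β · value). Grounding it in data, as the abstract advocates, amounts to fitting β per feedback type by maximum likelihood. A minimal sketch with a grid search over hypothetical binary choices (the paper's estimation procedure may differ):

```python
import math

def boltzmann_loglik(beta, choices):
    """Log-likelihood of human choices under a Boltzmann (softmax) model.
    choices: (chosen_value, alternative_values) pairs, where the human
    picked the option worth chosen_value."""
    ll = 0.0
    for chosen, alts in choices:
        logits = [beta * v for v in [chosen, *alts]]
        m = max(logits)  # log-sum-exp stabilization
        ll += logits[0] - m - math.log(sum(math.exp(z - m) for z in logits))
    return ll

def fit_rationality(choices, grid=None):
    """Grid-search MLE for beta; noisier feedback yields smaller beta."""
    grid = grid if grid is not None else [0.1 * k for k in range(1, 101)]
    return max(grid, key=lambda b: boltzmann_loglik(b, choices))

# Hypothetical data: comparisons are reliable, "corrections" are noisy.
comparisons = [(1.0, [0.0])] * 20
corrections = [(1.0, [0.0])] * 10 + [(0.0, [1.0])] * 10
```

With these toy datasets, the fitted β for the reliable feedback type is large, while the 50/50-noise feedback type is assigned a β near zero, i.e. treated as nearly uninformative.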

Causal Confusion and Reward Misidentification in Preference-Based Reward Learning

no code implementations • 13 Apr 2022 • Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown

While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification when learning from preferences.

Imitation Learning

Teaching Robots to Span the Space of Functional Expressive Motion

no code implementations • 4 Mar 2022 • Arjun Sripathy, Andreea Bobu, Zhongyu Li, Koushil Sreenath, Daniel S. Brown, Anca D. Dragan

As a result 1) all user feedback can contribute to learning about every emotion; 2) the robot can generate trajectories for any emotion in the space instead of only a few predefined ones; and 3) the robot can respond emotively to user-generated natural language by mapping it to a target VAD.

ThriftyDAgger: Budget-Aware Novelty and Risk Gating for Interactive Imitation Learning

no code implementations • 17 Sep 2021 • Ryan Hoque, Ashwin Balakrishna, Ellen Novoseller, Albert Wilcox, Daniel S. Brown, Ken Goldberg

Effective robot learning often requires online human feedback and interventions that can cost significant human time, giving rise to the central challenge in interactive imitation learning: is it possible to control the timing and length of interventions to both facilitate learning and limit burden on the human supervisor?

Imitation Learning

Offline Preference-Based Apprenticeship Learning

no code implementations • 20 Jul 2021 • Daniel Shin, Daniel S. Brown, Anca D. Dragan

Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.

Active Learning Offline RL

Policy Gradient Bayesian Robust Optimization for Imitation Learning

no code implementations • 11 Jun 2021 • Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg

Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.

Imitation Learning
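
The risk-averse end of the behavior family described above comes from weighting tail outcomes under the reward-function posterior, typically via conditional value at risk (CVaR). A sketch of just that statistic and a blended objective — the parameterization is hypothetical and the policy-gradient machinery is omitted:

```python
def cvar(returns, alpha=0.95):
    """Conditional value at risk: mean of the worst (1 - alpha) fraction.
    In a BROIL-style objective, `returns` are the policy's expected
    returns under samples from the reward-function posterior."""
    ordered = sorted(returns)
    k = max(1, round((1 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k

def broil_objective(returns, lam=0.5, alpha=0.95):
    """Blend of expected performance and tail risk; lam=0 is risk-neutral,
    lam=1 is fully risk-averse (hypothetical parameterization)."""
    mean = sum(returns) / len(returns)
    return (1 - lam) * mean + lam * cvar(returns, alpha)

# Toy posterior: 100 hypothetical expected returns for one policy.
rets = [float(i) for i in range(1, 101)]
```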

Optimal Cost Design for Model Predictive Control

1 code implementation • 23 Apr 2021 • Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan

We test our approach in an autonomous driving domain where we find costs different from the ground truth that implicitly compensate for replanning, short horizon, incorrect dynamics models, and local minima issues.

Autonomous Driving Model Predictive Control

Situational Confidence Assistance for Lifelong Shared Autonomy

no code implementations • 14 Apr 2021 • Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

Shared autonomy enables robots to infer user intent and assist in accomplishing it.

LazyDAgger: Reducing Context Switching in Interactive Imitation Learning

no code implementations • 31 Mar 2021 • Ryan Hoque, Ashwin Balakrishna, Carl Putterman, Michael Luo, Daniel S. Brown, Daniel Seita, Brijen Thananjeyan, Ellen Novoseller, Ken Goldberg

Corrective interventions while a robot is learning to automate a task provide an intuitive method for a human supervisor to assist the robot and convey information about desired behavior.

Continuous Control Imitation Learning

Dynamically Switching Human Prediction Models for Efficient Planning

no code implementations • 13 Mar 2021 • Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan

As environments involving both robots and humans become increasingly common, so does the need to account for people during planning.

Value Alignment Verification

1 code implementation • 2 Dec 2020 • Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum

In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values.

Autonomous Driving

Exploratory Grasping: Asymptotically Optimal Algorithms for Grasping Challenging Polyhedral Objects

no code implementations • 11 Nov 2020 • Michael Danielczuk, Ashwin Balakrishna, Daniel S. Brown, Shivin Devgon, Ken Goldberg

However, these policies can consistently fail to grasp challenging objects which are significantly out of the distribution of objects in the training data or which have very few high quality grasps.

Bayesian Robust Optimization for Imitation Learning

1 code implementation • NeurIPS 2020 • Daniel S. Brown, Scott Niekum, Marek Petrik

Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches either optimize a policy for the mean or MAP reward function.

Imitation Learning reinforcement-learning +1

Safe Imitation Learning via Fast Bayesian Reward Inference from Preferences

1 code implementation • ICML 2020 • Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum

Bayesian REX can learn to play Atari games from demonstrations, without access to the game score and can generate 100,000 samples from the posterior over reward functions in only 5 minutes on a personal laptop.

Atari Games Bayesian Inference +1
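
The sampling speed quoted above comes from precomputing each trajectory's features with a pretrained network, so every MCMC step costs only a few dot products. A stripped-down sketch with a linear reward, Bradley-Terry likelihood, and random-walk Metropolis-Hastings — unlike the actual method, it omits the prior and weight normalization, and all data below is hypothetical:

```python
import math
import random

def bt_loglik(w, prefs):
    """Bradley-Terry log-likelihood over preferences. Each preference is a
    (worse_features, better_features) pair of precomputed trajectory
    feature sums; predicted return is the dot product with w."""
    ll = 0.0
    for worse, better in prefs:
        rw = sum(a * b for a, b in zip(w, worse))
        rb = sum(a * b for a, b in zip(w, better))
        m = max(rw, rb)
        ll += rb - m - math.log(math.exp(rw - m) + math.exp(rb - m))
    return ll

def sample_reward_posterior(prefs, dim, n_samples=2000, step=0.1):
    """Random-walk Metropolis-Hastings over linear reward weights."""
    w, ll = [0.0] * dim, bt_loglik([0.0] * dim, prefs)
    samples = []
    for _ in range(n_samples):
        proposal = [x + random.gauss(0.0, step) for x in w]
        prop_ll = bt_loglik(proposal, prefs)
        # Accept with probability min(1, exp(prop_ll - ll)).
        if prop_ll >= ll or random.random() < math.exp(prop_ll - ll):
            w, ll = proposal, prop_ll
        samples.append(list(w))
    return samples

random.seed(0)
# Hypothetical preferences: the better trajectory has more of feature 0.
prefs = [((0.0, 1.0), (1.0, 1.0))] * 5
samples = sample_reward_posterior(prefs, dim=2)
```

Because the preferences only constrain feature 0, the posterior mass on its weight shifts positive while the weight on feature 1 stays diffuse.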

Deep Bayesian Reward Learning from Preferences

no code implementations • 10 Dec 2019 • Daniel S. Brown, Scott Niekum

Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy.

Atari Games Imitation Learning

Extrapolating Beyond Suboptimal Demonstrations via Inverse Reinforcement Learning from Observations

3 code implementations • 12 Apr 2019 • Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum

A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator.

Imitation Learning reinforcement-learning +1
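
The core idea that enables extrapolation here is training a reward to explain trajectory rankings via a pairwise cross-entropy loss over predicted returns. A linear-reward sketch with an analytic gradient (the paper itself trains a neural network reward):

```python
import math

def ranking_loss(w, traj_low, traj_high):
    """Cross-entropy that the higher-ranked trajectory gets higher return.
    Trajectories are lists of per-state feature vectors; the learned
    reward is linear: r(s) = w . phi(s)."""
    ret = lambda traj: sum(sum(wi * fi for wi, fi in zip(w, phi))
                           for phi in traj)
    r_low, r_high = ret(traj_low), ret(traj_high)
    m = max(r_low, r_high)
    return -(r_high - m - math.log(math.exp(r_low - m) + math.exp(r_high - m)))

def train_reward(dim, ranked_pairs, lr=0.1, epochs=100):
    """Gradient descent on the summed ranking loss, using the analytic
    gradient available in the linear case."""
    w = [0.0] * dim
    for _ in range(epochs):
        for low, high in ranked_pairs:
            phi_low = [sum(col) for col in zip(*low)]    # summed features
            phi_high = [sum(col) for col in zip(*high)]
            r_low = sum(a * b for a, b in zip(w, phi_low))
            r_high = sum(a * b for a, b in zip(w, phi_high))
            p_correct = 1.0 / (1.0 + math.exp(r_low - r_high))
            w = [wi + lr * (1.0 - p_correct) * (ph - pl)
                 for wi, pl, ph in zip(w, phi_low, phi_high)]
    return w

# Two ranked trajectories over a 1-D feature: `high` visits better states.
low, high = [[0.0], [0.0]], [[1.0], [1.0]]
w = train_reward(1, [(low, high)])
```

Training drives the weight on the feature that separates the ranked trajectories positive, so the learned reward can score trajectories better than any demonstration it was trained on.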

Risk-Aware Active Inverse Reinforcement Learning

2 code implementations • 8 Jan 2019 • Daniel S. Brown, Yuchen Cui, Scott Niekum

Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning.

Active Learning reinforcement-learning +1

Machine Teaching for Inverse Reinforcement Learning: Algorithms and Applications

1 code implementation • 20 May 2018 • Daniel S. Brown, Scott Niekum

Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization.

Decision Making reinforcement-learning +1

Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

3 code implementations • 3 Jul 2017 • Daniel S. Brown, Scott Niekum

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance.

Reinforcement Learning (RL)
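
A bound of this kind reduces to an order statistic: evaluate the policy's loss under each sample from the Bayesian IRL reward posterior and take an upper quantile. A minimal sketch of that final step, assuming the per-sample losses have already been computed:

```python
import math

def probabilistic_performance_bound(policy_losses, alpha=0.95):
    """Given the evaluation policy's loss (e.g., V* - V^pi) under each
    posterior reward sample, return the alpha-quantile: with confidence
    roughly alpha, the loss under the true reward lies below this bound."""
    ordered = sorted(policy_losses)
    idx = min(len(ordered) - 1, math.ceil(alpha * len(ordered)) - 1)
    return ordered[idx]
```

For example, with 100 posterior samples and alpha = 0.95 the bound is the 95th-smallest loss; tightening it requires either more demonstrations (a sharper posterior) or a larger alpha budget.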
