no code implementations • 10 Apr 2024 • Zohre Karimi, Shing-Hei Ho, Bao Thach, Alan Kuntz, Daniel S. Brown
This paper introduces a sample-efficient method that learns a robust reward function from a limited number of ranked suboptimal demonstrations consisting of partial-view point cloud observations.
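For intuition, this line of work typically trains the reward network with a pairwise ranking loss over trajectories. Below is a minimal sketch of that Bradley-Terry-style loss in PyTorch; the placeholder MLP stands in for the paper's partial-view point cloud encoder, and all names are illustrative.

```python
# Minimal sketch of a pairwise ranking loss for reward learning from
# ranked demonstrations. The reward network here is a placeholder MLP;
# a point cloud encoder would replace it in the paper's setting.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (T, obs_dim) -> scalar predicted return (sum of per-step rewards)
        return self.net(traj).sum()

def ranking_loss(reward_net, traj_worse, traj_better):
    # Bradley-Terry likelihood that the higher-ranked trajectory wins.
    returns = torch.stack([reward_net(traj_worse), reward_net(traj_better)])
    # Cross-entropy with the "better" trajectory (index 1) as the label.
    return nn.functional.cross_entropy(returns.unsqueeze(0), torch.tensor([1]))

# Usage with random stand-in trajectories:
net = RewardNet(obs_dim=8)
worse, better = torch.randn(50, 8), torch.randn(50, 8)
loss = ranking_loss(net, worse, better)
loss.backward()
```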
no code implementations • 25 Oct 2023 • Connor Mattson, Jeremy C. Clark, Daniel S. Brown
We study the problem of determining the emergent behaviors that are possible given a functionally heterogeneous swarm of robots with limited capabilities.
no code implementations • 16 Oct 2023 • Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan
We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto frontier of human policies that best trade off naturalness against low robot performance.
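As a rough illustration, the sketch below extracts such a frontier from a pool of candidate human policies, assuming each policy already carries a naturalness score and an induced robot-performance score; the scores are random stand-ins, not the paper's learned models.

```python
# Hedged sketch of extracting a natural-adversarial frontier from scored
# candidate human policies. Scores are random stand-ins.
import numpy as np

rng = np.random.default_rng(0)
naturalness = rng.random(100)   # higher is more human-like
robot_perf = rng.random(100)    # lower means more adversarial

def pareto_frontier(nat, perf):
    # Keep policies not dominated by any other: a policy is dominated if
    # another is at least as natural AND induces at-most-equal robot
    # performance, with at least one strict inequality.
    keep = []
    for i in range(len(nat)):
        dominated = np.any(
            (nat >= nat[i]) & (perf <= perf[i])
            & ((nat > nat[i]) | (perf < perf[i]))
        )
        if not dominated:
            keep.append(i)
    return np.array(keep)

frontier = pareto_frontier(naturalness, robot_perf)
print(f"{len(frontier)} of 100 candidate policies lie on the frontier")
```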
no code implementations • 19 Jul 2023 • Gaurav Ghosal, Amrith Setlur, Daniel S. Brown, Anca D. Dragan, Aditi Raghunathan
We formalize a new setting called contextual reliability which accounts for the fact that the "right" features to use may vary depending on the context.
no code implementations • 22 Jun 2023 • Akansha Kalra, Daniel S. Brown
There is an increasing interest in learning reward functions that model human preferences.
no code implementations • 25 Apr 2023 • Connor Mattson, Daniel S. Brown
We combine our learned similarity metric with novelty search and clustering to explore and categorize the space of possible swarm behaviors.
no code implementations • 11 Jan 2023 • Yi Liu, Gaurav Datta, Ellen Novoseller, Daniel S. Brown
In particular, we provide evidence that a learned dynamics model offers the following benefits when performing PbRL: (1) preference elicitation and policy optimization require significantly fewer environment interactions than model-free PbRL, (2) diverse preference queries can be synthesized safely and efficiently as a byproduct of standard model-based RL, and (3) reward pre-training based on suboptimal demonstrations can be performed without any environmental interaction.
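To make benefit (2) concrete, the toy sketch below synthesizes candidate trajectories entirely inside a stand-in learned dynamics model and selects the preference query a reward ensemble disagrees on most; every model and name here is illustrative, not the paper's implementation.

```python
# Hedged sketch: synthesizing preference queries from a learned dynamics
# model rather than the real environment. Models are random stand-ins.
import numpy as np

rng = np.random.default_rng(1)

def dynamics_model(state, action):
    # Stand-in for a learned model f(s, a) -> s'.
    return state + 0.1 * action

def rollout(state, horizon=10):
    traj = [state]
    for _ in range(horizon):
        action = rng.normal(size=state.shape)
        state = dynamics_model(state, action)
        traj.append(state)
    return np.stack(traj)

# Ensemble of linear reward stand-ins; disagreement drives query selection.
reward_weights = rng.normal(size=(5, 4))  # 5 ensemble members, 4-dim states

def ensemble_returns(traj):
    return traj.sum(axis=0) @ reward_weights.T  # one return per member

# Generate candidates entirely inside the model (no real environment
# interaction), then query the pair the ensemble disagrees on most.
candidates = [rollout(rng.normal(size=4)) for _ in range(20)]
returns = np.stack([ensemble_returns(t) for t in candidates])  # (20, 5)
disagreement = returns.std(axis=1)
i, j = np.argsort(disagreement)[-2:]
print(f"query the human about trajectories {i} and {j}")
```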
no code implementations • 3 Jan 2023 • Daniel Shin, Anca D. Dragan, Daniel S. Brown
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.
no code implementations • 2 Jan 2023 • Andreea Bobu, Yi Liu, Rohin Shah, Daniel S. Brown, Anca D. Dragan
This, in turn, is what enables the robot to disambiguate between what needs to go into the representation versus what is spurious, as well as what aspects of behavior can be compressed together versus not.
no code implementations • 5 Dec 2022 • Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan
We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only.
no code implementations • 28 Nov 2022 • Tu Trinh, Haoyu Chen, Daniel S. Brown
We evaluate our approach in simulation for both discrete and continuous state-space domains and illustrate the feasibility of developing a robotic system that can accurately evaluate demonstration sufficiency.
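One way to pose such a sufficiency test, sketched below under loose assumptions: sample reward functions from the current posterior, estimate how much the learned policy could be losing under each sample, and declare the demonstrations sufficient once a high-confidence bound on that loss falls below a tolerance. The loss samples here are stand-ins, and the exact criterion is illustrative rather than the paper's.

```python
# Hedged sketch of a demonstration-sufficiency test: stop asking for
# demonstrations once a 95% value-at-risk bound on policy loss under the
# reward posterior drops below a tolerance. Loss samples are stand-ins.
import numpy as np

rng = np.random.default_rng(8)
# Policy-loss samples under the reward posterior (e.g., expected value
# difference versus the optimal policy for each sampled reward).
loss_samples = np.abs(rng.normal(scale=0.1, size=500))

EPSILON, CONFIDENCE = 0.2, 0.95
var_bound = np.quantile(loss_samples, CONFIDENCE)  # 95th-percentile loss
sufficient = var_bound < EPSILON
print(f"95% VaR on policy loss = {var_bound:.3f}; sufficient: {sufficient}")
```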
no code implementations • 14 Oct 2022 • Albert Wilcox, Ashwin Balakrishna, Jules Dedieu, Wyame Benslimane, Daniel S. Brown, Ken Goldberg
Providing densely shaped reward functions for RL algorithms is often exceedingly challenging, motivating the development of RL algorithms that can learn from easier-to-specify sparse reward functions.
no code implementations • 23 Aug 2022 • Gaurav R. Ghosal, Matthew Zurek, Daniel S. Brown, Anca D. Dragan
In this work, we advocate that grounding the rationality coefficient in real data for each feedback type, rather than assuming a default value, has a significant positive effect on reward learning.
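For intuition about the rationality coefficient, here is a minimal sketch of the Boltzmann-rational feedback model it parameterizes: a larger beta means the human picks the truly better option more reliably. The values below are illustrative; fitting beta per feedback type by maximum likelihood, as advocated here, replaces a hard-coded default.

```python
# Hedged sketch of a Boltzmann-rational choice model with rationality
# coefficient beta. Returns and choices are illustrative stand-ins.
import numpy as np

def choice_likelihood(returns, chosen, beta):
    # P(choice) under a Boltzmann model over candidate returns.
    logits = beta * np.asarray(returns)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[chosen]

returns = [1.0, 2.0]  # candidate trajectory returns
for beta in (0.1, 1.0, 10.0):
    p = choice_likelihood(returns, chosen=1, beta=beta)
    print(f"beta={beta:4}: P(picks better option) = {p:.3f}")
# Fitting beta: maximize the product of these likelihoods over a dataset
# of observed choices for each feedback type (demos, comparisons, etc.).
```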
no code implementations • 13 Apr 2022 • Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown
While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification when learning from preferences.
no code implementations • 4 Mar 2022 • Arjun Sripathy, Andreea Bobu, Zhongyu Li, Koushil Sreenath, Daniel S. Brown, Anca D. Dragan
As a result, 1) all user feedback can contribute to learning about every emotion; 2) the robot can generate trajectories for any emotion in the space instead of only a few predefined ones; and 3) the robot can respond emotively to user-generated natural language by mapping it to a target VAD.
no code implementations • 17 Sep 2021 • Ryan Hoque, Ashwin Balakrishna, Ellen Novoseller, Albert Wilcox, Daniel S. Brown, Ken Goldberg
Effective robot learning often requires online human feedback and interventions that can cost significant human time, giving rise to the central challenge in interactive imitation learning: is it possible to control the timing and length of interventions to both facilitate learning and limit the burden on the human supervisor?
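One common gating criterion in this vein, sketched below with illustrative thresholds and a stand-in ensemble (not necessarily the paper's exact rule): request human control only when an ensemble of policies disagrees strongly, and release it once disagreement falls again, with hysteresis to avoid thrashing.

```python
# Hedged sketch of intervention timing via ensemble disagreement with
# hysteresis thresholds. Ensemble, thresholds, and states are stand-ins.
import numpy as np

rng = np.random.default_rng(2)
ensemble = rng.normal(size=(5, 2, 4))  # 5 linear policies: state(4) -> action(2)

def disagreement(state):
    actions = ensemble @ state           # (5, 2) proposed actions
    return actions.std(axis=0).mean()    # spread across ensemble members

ASK_THRESHOLD, RELEASE_THRESHOLD = 0.8, 0.4  # hysteresis limits burden
human_in_control = False
for step in range(100):
    state = rng.normal(size=4)
    d = disagreement(state)
    if not human_in_control and d > ASK_THRESHOLD:
        human_in_control = True          # start an intervention
    elif human_in_control and d < RELEASE_THRESHOLD:
        human_in_control = False         # end it, limiting human time
```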
no code implementations • 20 Jul 2021 • Daniel Shin, Daniel S. Brown, Anca D. Dragan
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.
1 code implementation • 13 Jul 2021 • Shivin Devgon, Jeffrey Ichnowski, Michael Danielczuk, Daniel S. Brown, Ashwin Balakrishna, Shirin Joshi, Eduardo M. C. Rocha, Eugen Solowjow, Ken Goldberg
In industrial part kitting, 3D objects are inserted into cavities for transportation or subsequent assembly.
no code implementations • 11 Jun 2021 • Zaynah Javed, Daniel S. Brown, Satvik Sharma, Jerry Zhu, Ashwin Balakrishna, Marek Petrik, Anca D. Dragan, Ken Goldberg
Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty, rather than seeking to uniquely identify the demonstrator's reward function.
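The risk-neutral-to-risk-averse family comes from a soft-robust objective over the reward posterior. Below is a hedged numerical sketch of that blending, assuming posterior return samples (random stand-ins here): lambda = 1 recovers the risk-neutral mean, lambda = 0 the fully risk-averse CVaR.

```python
# Hedged sketch of a soft-robust objective: blend the expected return
# under the reward posterior with CVaR (mean of the worst alpha-fraction
# of posterior returns). Posterior samples are random stand-ins.
import numpy as np

rng = np.random.default_rng(3)
posterior_returns = rng.normal(loc=1.0, scale=2.0, size=1000)

def soft_robust_objective(returns, lam=0.5, alpha=0.05):
    worst = np.sort(returns)[: max(1, int(alpha * len(returns)))]
    cvar = worst.mean()  # expected return in the worst tail
    return lam * returns.mean() + (1 - lam) * cvar

for lam in (1.0, 0.5, 0.0):
    obj = soft_robust_objective(posterior_returns, lam)
    print(f"lambda={lam}: objective = {obj:.2f}")
```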
1 code implementation • 23 Apr 2021 • Avik Jain, Lawrence Chan, Daniel S. Brown, Anca D. Dragan
We test our approach in an autonomous driving domain where we find costs different from the ground truth that implicitly compensate for replanning, short horizon, incorrect dynamics models, and local minima issues.
no code implementations • 14 Apr 2021 • Matthew Zurek, Andreea Bobu, Daniel S. Brown, Anca D. Dragan
Shared autonomy enables robots to infer user intent and assist in accomplishing it.
no code implementations • 31 Mar 2021 • Ryan Hoque, Ashwin Balakrishna, Carl Putterman, Michael Luo, Daniel S. Brown, Daniel Seita, Brijen Thananjeyan, Ellen Novoseller, Ken Goldberg
Corrective interventions while a robot is learning to automate a task provide an intuitive method for a human supervisor to assist the robot and convey information about desired behavior.
no code implementations • 13 Mar 2021 • Arjun Sripathy, Andreea Bobu, Daniel S. Brown, Anca D. Dragan
As environments involving both robots and humans become increasingly common, so does the need to account for people during planning.
1 code implementation • 2 Dec 2020 • Daniel S. Brown, Jordan Schneider, Anca D. Dragan, Scott Niekum
In this paper we formalize and theoretically analyze the problem of efficient value alignment verification: how to efficiently test whether the behavior of another agent is aligned with a human's values.
no code implementations • 11 Nov 2020 • Michael Danielczuk, Ashwin Balakrishna, Daniel S. Brown, Shivin Devgon, Ken Goldberg
However, these policies can consistently fail to grasp challenging objects that are significantly out of the distribution of the training data or that have very few high-quality grasps.
1 code implementation • NeurIPS 2020 • Daniel S. Brown, Scott Niekum, Marek Petrik
Existing safe imitation learning approaches based on IRL deal with this uncertainty using a maxmin framework that optimizes a policy under the assumption of an adversarial reward function, whereas risk-neutral IRL approaches optimize a policy for either the mean or the MAP reward function.
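A quick numerical contrast of these objectives, evaluated on stand-in samples from a reward posterior: maxmin scores a policy by its single worst-case reward function, risk-neutral approaches by the mean (or the one MAP sample), and CVaR sits in between by averaging only the worst tail.

```python
# Hedged comparison of maxmin, CVaR, and risk-neutral objectives on a
# policy's return distribution under posterior samples (stand-ins).
import numpy as np

rng = np.random.default_rng(4)
returns = rng.normal(loc=1.0, scale=2.0, size=1000)  # return per posterior sample

sorted_r = np.sort(returns)
alpha = 0.05
print(f"maxmin (worst case):  {sorted_r[0]:.2f}")
print(f"CVaR   (worst 5%):    {sorted_r[: int(alpha * len(returns))].mean():.2f}")
print(f"mean   (risk-neutral): {returns.mean():.2f}")
```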
1 code implementation • ICML 2020 • Daniel S. Brown, Russell Coleman, Ravi Srinivasan, Scott Niekum
Bayesian REX can learn to play Atari games from demonstrations, without access to the game score, and can generate 100,000 samples from the posterior over reward functions in only 5 minutes on a personal laptop.
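A hedged sketch of why this sampling is fast: with a pre-trained embedding, each trajectory reduces to a fixed feature vector, so MCMC over linear reward weights needs no environment access at all. The features and preference labels below are random stand-ins.

```python
# Hedged sketch of MCMC over linear reward weights on precomputed
# trajectory features, with a Bradley-Terry preference likelihood.
import numpy as np

rng = np.random.default_rng(5)
feats = rng.normal(size=(50, 16))  # precomputed trajectory features
prefs = [(i, j) for i, j in rng.integers(0, 50, size=(200, 2)) if i != j]

def log_likelihood(w):
    # P(j preferred over i) = exp(r_j) / (exp(r_i) + exp(r_j)),
    # with linear returns r = feats @ w; second index is preferred.
    r = feats @ w
    return sum(r[j] - np.logaddexp(r[i], r[j]) for i, j in prefs)

# Random-walk Metropolis-Hastings over unit-norm weight vectors.
w = rng.normal(size=16)
w /= np.linalg.norm(w)
ll, samples = log_likelihood(w), []
for _ in range(2000):
    w_new = w + 0.05 * rng.normal(size=16)
    w_new /= np.linalg.norm(w_new)
    ll_new = log_likelihood(w_new)
    if np.log(rng.random()) < ll_new - ll:  # accept/reject step
        w, ll = w_new, ll_new
    samples.append(w)
print(f"collected {len(samples)} posterior samples over reward weights")
```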
no code implementations • 10 Dec 2019 • Daniel S. Brown, Scott Niekum
Bayesian inverse reinforcement learning (IRL) methods are ideal for safe imitation learning, as they allow a learning agent to reason about reward uncertainty and the safety of a learned policy.
2 code implementations • 9 Jul 2019 • Daniel S. Brown, Wonjoon Goo, Scott Niekum
The performance of imitation learning is typically upper-bounded by the performance of the demonstrator.
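A sketch of one way past that bound, in the spirit of this paper: generate rankings automatically by injecting increasing action noise into a cloned policy and ranking lower-noise rollouts above noisier ones, so the ranking loss from earlier needs no extra human labels. Everything below is a stand-in for real policies and dynamics.

```python
# Hedged sketch: automatically-ranked demonstrations via noise injection.
import numpy as np

rng = np.random.default_rng(6)

def bc_policy(state):
    return -0.5 * state  # stand-in for a behavior-cloned policy

def rollout(noise_level, horizon=20):
    state, traj = rng.normal(size=4), []
    for _ in range(horizon):
        action = bc_policy(state) + noise_level * rng.normal(size=4)
        state = state + 0.1 * action
        traj.append(state.copy())
    return np.stack(traj)

# Lower-noise rollouts are assumed better: this yields ranked trajectory
# pairs for a reward-ranking loss, with no additional human labels.
noise_schedule = [0.0, 0.25, 0.5, 1.0]
ranked = [rollout(eps) for eps in noise_schedule]  # best to worst
pairs = [(ranked[j], ranked[i]) for i in range(4) for j in range(i + 1, 4)]
print(f"{len(pairs)} automatically-ranked (worse, better) training pairs")
```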
3 code implementations • 12 Apr 2019 • Daniel S. Brown, Wonjoon Goo, Prabhat Nagarajan, Scott Niekum
A critical flaw of existing inverse reinforcement learning (IRL) methods is their inability to significantly outperform the demonstrator.
2 code implementations • 8 Jan 2019 • Daniel S. Brown, Yuchen Cui, Scott Niekum
Active learning from demonstration allows a robot to query a human for specific types of input to achieve efficient learning.
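One query-selection heuristic in this spirit, sketched below with random stand-ins (not necessarily the paper's criterion): ask the human about the state where samples from the current reward posterior disagree most about which action is best.

```python
# Hedged sketch of active query selection by posterior action disagreement.
import numpy as np

rng = np.random.default_rng(7)
# Q-values under 30 posterior reward samples, for 10 states x 4 actions.
q_values = rng.normal(size=(30, 10, 4))

best_actions = q_values.argmax(axis=2)  # (30, 10) preferred action per sample

def action_disagreement(choices):
    # How far the posterior samples are from unanimous agreement.
    counts = np.bincount(choices, minlength=4)
    return 1.0 - counts.max() / counts.sum()

scores = np.array([action_disagreement(best_actions[:, s]) for s in range(10)])
print(f"query the human at state {scores.argmax()}")
```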
1 code implementation • 20 May 2018 • Daniel S. Brown, Scott Niekum
Inverse reinforcement learning (IRL) infers a reward function from demonstrations, allowing for policy improvement and generalization.
3 code implementations • 3 Jul 2017 • Daniel S. Brown, Scott Niekum
In the field of reinforcement learning, there has been recent progress towards safety and high-confidence bounds on policy performance.