no code implementations • 16 Mar 2024 • Anthony Liang, Jesse Thomason, Erdem Biyik
Using ViSaRL to learn visual representations significantly improves the success rate, sample efficiency, and generalization of an RL agent across diverse tasks, including the DeepMind Control benchmark and robot manipulation both in simulation and on a real robot.
no code implementations • 9 Mar 2024 • Evan Ellis, Gaurav R. Ghosal, Stuart J. Russell, Anca Dragan, Erdem Biyik
Preference-based reward learning is a popular technique for teaching robots and autonomous systems how a human user wants them to perform a task.
no code implementations • 25 Feb 2024 • Anthony Liang, Guy Tennenholtz, Chih-Wei Hsu, Yinlam Chow, Erdem Biyik, Craig Boutilier
We introduce DynaMITE-RL, a meta-reinforcement learning (meta-RL) approach to approximate inference in environments where the latent state evolves at varying rates.
no code implementations • 24 Feb 2024 • Erdem Biyik, Nima Anari, Dorsa Sadigh
Our results suggest that our batch active learning algorithm requires only a few queries that are computed in a short amount of time.
no code implementations • 6 Feb 2024 • Yufei Wang, Zhanyi Sun, Jesse Zhang, Zhou Xian, Erdem Biyik, David Held, Zackory Erickson
Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative processes of trial-and-error to design effective reward functions.
no code implementations • 22 Oct 2023 • Erdem Biyik, Fan Yao, Yinlam Chow, Alex Haig, Chih-Wei Hsu, Mohammad Ghavamzadeh, Craig Boutilier
Leveraging concept activation vectors for soft attribute semantics, we develop novel preference elicitation methods that can accommodate soft attributes and bring together both item and attribute-based preference elicitation.
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
no code implementations • 27 Feb 2023 • Vivek Myers, Erdem Biyik, Dorsa Sadigh
Robot policies need to adapt to human preferences and/or new environments.
1 code implementation • 25 Nov 2022 • Megha Srivastava, Erdem Biyik, Suvir Mirchandani, Noah Goodman, Dorsa Sadigh
In this paper, we focus on the problem of assistive teaching of motor control tasks such as parking a car or landing an aircraft.
1 code implementation • 19 Oct 2022 • Erdem Biyik
To this end, we first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons; and describe how a robot can use these forms of human feedback to infer a reward function, which may be parametric or non-parametric.
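As a concrete illustration of the pairwise-comparison case, the sketch below fits a linear reward from preference queries with a Bradley-Terry (logistic) model. This is a minimal sketch under assumed linear reward features, not the thesis's implementation; all function names and the toy data are illustrative.

```python
# Minimal sketch: learn a linear reward r(xi) = w . phi(xi) from pairwise
# comparisons under a Bradley-Terry model, P(A > B) = sigma(w . (phi_A - phi_B)).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_reward_from_comparisons(comparisons, dim, lr=0.5, steps=2000):
    """comparisons: list of (phi_preferred, phi_rejected) feature pairs.
    Returns weights w maximizing the comparison log-likelihood."""
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        for phi_a, phi_b in comparisons:
            diff = phi_a - phi_b
            # gradient of log sigma(w . diff) with respect to w
            grad += (1.0 - sigmoid(w @ diff)) * diff
        w += lr * grad / len(comparisons)
    return w

# Toy data: comparisons labeled by a hidden "true" preference vector.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -0.5])
pairs = []
for _ in range(200):
    a, b = rng.normal(size=2), rng.normal(size=2)
    pairs.append((a, b) if w_true @ a >= w_true @ b else (b, a))

w_hat = fit_reward_from_comparisons(pairs, dim=2)
# The recovered direction should align with w_true (reward is scale-invariant).
cos = (w_hat @ w_true) / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
print(round(float(cos), 2))
```

Only the direction of `w` is identifiable from comparisons, which is why the check above uses cosine similarity rather than the raw weights.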
no code implementations • 8 Mar 2022 • Zhangjie Cao, Erdem Biyik, Guy Rosman, Dorsa Sadigh
At any given time, to forecast a reasonable future trajectory, each agent needs to attend to its interactions with only a small group of the most relevant agents, rather than unnecessarily attending to all the other agents.
no code implementations • 2 Oct 2021 • Erdem Biyik, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh
Our results show that the proposed partner-aware strategy outperforms other known methods, and our human subject studies suggest humans prefer to collaborate with AI agents implementing our partner-aware strategy.
1 code implementation • 1 Oct 2021 • Nils Wilde, Erdem Biyik, Dorsa Sadigh, Stephen L. Smith
Today's robots are increasingly interacting with people and need to efficiently learn inexperienced users' preferences.
no code implementations • 27 Sep 2021 • Vivek Myers, Erdem Biyik, Nima Anari, Dorsa Sadigh
However, expert feedback is often assumed to be drawn from an underlying unimodal reward function.
1 code implementation • 16 Aug 2021 • Erdem Biyik, Aditi Talati, Dorsa Sadigh
Reward learning is a fundamental problem in human-robot interaction: it enables robots to operate in alignment with what their human users want.
no code implementations • 13 May 2021 • Woodrow Z. Wang, Mark Beliaev, Erdem Biyik, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game.
no code implementations • 6 May 2021 • Erdem Biyik, Daniel A. Lazar, Ramtin Pedarsani, Dorsa Sadigh
Traffic congestion has large economic and social costs.
no code implementations • 28 Dec 2020 • Mark Beliaev, Erdem Biyik, Daniel A. Lazar, Woodrow Z. Wang, Dorsa Sadigh, Ramtin Pedarsani
In turn, significant increases in traffic congestion are expected, since people are likely to prefer using their own vehicles or taxis as opposed to riskier and more crowded options such as the railway.
1 code implementation • 9 Nov 2020 • Kejun Li, Maegan Tucker, Erdem Biyik, Ellen Novoseller, Joel W. Burdick, Yanan Sui, Dorsa Sadigh, Yisong Yue, Aaron D. Ames
ROIAL learns Bayesian posteriors that predict each exoskeleton user's utility landscape across four exoskeleton gait parameters.
no code implementations • 10 Aug 2020 • Zheqing Zhu, Erdem Biyik, Dorsa Sadigh
Multi-agent safe systems have become an increasingly important area of study as multiple AI-powered systems now routinely operate together.
1 code implementation • 1 Jul 2020 • Zhangjie Cao, Erdem Biyik, Woodrow Z. Wang, Allan Raventos, Adrien Gaidon, Guy Rosman, Dorsa Sadigh
To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes.
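The two-level structure described above can be sketched as follows. This is an illustrative skeleton, not the H-ReIL implementation: the mode names, the stub low-level policies, and the trivial tabular high-level selector are all assumptions standing in for the IL-learned and RL-learned components.

```python
# Sketch of a hierarchy: fixed low-level policies for discrete driving
# modes, plus a high-level policy that picks the active mode each step.
import random

# Low-level policies (stubs standing in for policies learned with IL).
def timid_policy(obs):
    return {"accel": min(0.2, obs["gap"] * 0.1), "brake": obs["gap"] < 1.0}

def aggressive_policy(obs):
    return {"accel": 1.0, "brake": obs["gap"] < 0.3}

LOW_LEVEL = {"timid": timid_policy, "aggressive": aggressive_policy}

class HighLevelPolicy:
    """Tabular Q-values over a coarse state, standing in for the
    RL-trained mode selector."""
    def __init__(self, epsilon=0.0):
        self.q = {}  # (state_key, mode) -> value
        self.epsilon = epsilon

    def state_key(self, obs):
        return ("near" if obs["gap"] < 1.0 else "far",)

    def select_mode(self, obs):
        if random.random() < self.epsilon:
            return random.choice(list(LOW_LEVEL))
        key = self.state_key(obs)
        return max(LOW_LEVEL, key=lambda m: self.q.get((key, m), 0.0))

def act(high_level, obs):
    mode = high_level.select_mode(obs)
    return mode, LOW_LEVEL[mode](obs)

hl = HighLevelPolicy()
hl.q[(("near",), "timid")] = 1.0       # prefer caution near other agents
hl.q[(("far",), "aggressive")] = 1.0   # drive efficiently otherwise
mode_near, _ = act(hl, {"gap": 0.5})
mode_far, _ = act(hl, {"gap": 5.0})
print(mode_near, mode_far)
```

The design point is the division of labor: the low-level policies only need demonstrations within their own mode, while the high-level policy only needs to learn the coarse switching decision.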
no code implementations • 24 Jun 2020 • Erdem Biyik, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh
As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers.
1 code implementation • 6 May 2020 • Erdem Biyik, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh
Our results in simulations and a user study suggest that our approach can efficiently learn expressive reward functions for robotics tasks.
no code implementations • 13 Jan 2020 • Minae Kwon, Erdem Biyik, Aditi Talati, Karan Bhasin, Dylan P. Losey, Dorsa Sadigh
Overall, we extend existing rational human models so that collaborative robots can anticipate and plan around suboptimal human behavior during HRI.
2 code implementations • 10 Oct 2019 • Erdem Biyik, Malayandi Palan, Nicholas C. Landolfi, Dylan P. Losey, Dorsa Sadigh
Robots can learn the right reward function by querying a human expert.
1 code implementation • 19 Jun 2019 • Erdem Biyik, Kenneth Wang, Nima Anari, Dorsa Sadigh
While active learning methods attempt to tackle this issue by labeling only the most informative data samples, they generally suffer from large computational costs and are impractical in settings where data can be collected in parallel.
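One simple way to realize batch selection is a greedy trade-off between informativeness and diversity. The sketch below scores pairwise queries by the entropy of a logistic preference model and penalizes similarity to queries already in the batch; this is an illustrative heuristic, not the DPP-based algorithm of the paper.

```python
# Sketch: greedy batch active learning for pairwise reward queries.
# Score = predictive entropy (informativeness) - similarity to the batch.
import numpy as np

def uncertainty(w, diff):
    """Entropy of the predicted preference for a query whose feature
    difference is `diff`, under a logistic preference model."""
    p = 1.0 / (1.0 + np.exp(-(w @ diff)))
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_batch(w, candidates, k, diversity_weight=1.0):
    """Greedily pick k candidate indices, trading off informativeness
    against cosine similarity to already-chosen queries."""
    chosen = []
    for _ in range(k):
        best, best_score = None, -np.inf
        for i, d in enumerate(candidates):
            if i in chosen:
                continue
            score = uncertainty(w, d)
            if chosen:
                sim = max(
                    abs(d @ candidates[j])
                    / (np.linalg.norm(d) * np.linalg.norm(candidates[j]) + 1e-12)
                    for j in chosen
                )
                score -= diversity_weight * sim
            if score > best_score:
                best, best_score = i, score
        chosen.append(best)
    return chosen

rng = np.random.default_rng(1)
w = np.array([1.0, 0.0])            # current reward estimate
cands = rng.normal(size=(50, 2))    # candidate query feature differences
batch = select_batch(w, cands, k=5)
print(batch)
```

Because the whole batch is chosen before any human answers arrive, the queries can be posed in parallel, which is exactly the setting the paper targets.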
no code implementations • 1 Apr 2019 • Erdem Biyik, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh
We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models.
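The core safety idea can be sketched in a toy gridworld: the agent only enters states from which it can certifiably get back to the safe region. This is a simplified illustration under assumed known, reversible dynamics, not the paper's algorithm for unknown transition models.

```python
# Sketch: compute the set of states an agent may safely visit in a
# deterministic gridworld, where "#" cells are unsafe. Because moves are
# reversible here, reachability through safe cells also certifies the
# ability to return to the start along the same path.
from collections import deque

GRID = [
    "....",
    ".##.",
    "....",
]
START = (0, 0)

def neighbors(cell):
    r, c = cell
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < len(GRID) and 0 <= nc < len(GRID[0]):
            yield (nr, nc)

def safe_reachable(start):
    """BFS over non-obstacle cells: states the agent may explore while
    always retaining a certified path back to `start`."""
    seen, frontier = {start}, deque([start])
    while frontier:
        cell = frontier.popleft()
        for nxt in neighbors(cell):
            if nxt not in seen and GRID[nxt[0]][nxt[1]] != "#":
                seen.add(nxt)
                frontier.append(nxt)
    return seen

safe = safe_reachable(START)
print(len(safe), (1, 1) in safe)
```

With unknown transitions, as in the paper, the safe set would instead be grown incrementally, admitting a new state only after the observed dynamics certify a return path.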
1 code implementation • 10 Oct 2018 • Erdem Biyik, Dorsa Sadigh
Data generation and labeling are usually expensive parts of learning for robotics.