no code implementations • 28 May 2024 • Micah Carroll, Davis Foote, Anand Siththaranjan, Stuart Russell, Anca Dragan
Existing AI alignment approaches assume that preferences are static, which is unrealistic: our preferences change, and may even be influenced by our interactions with AI systems themselves.
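As a toy illustration of the dynamic at issue (a sketch invented for this listing, not taken from the paper; the recommender setup, drift rate, and greedy policy are all assumptions), consider a system whose own actions shift the preferences it is being evaluated against:

```python
import numpy as np

# Toy illustration (not from the paper): a user's preference over two items
# drifts as a function of what the system shows them, so a policy evaluated
# against the user's *initial* preferences is being graded against a user
# who no longer exists by the end of the interaction.
pref = np.array([0.6, 0.4])        # initial preference over items 0 and 1
initial_pref = pref.copy()
DRIFT = 0.05                       # assumed per-exposure influence

for t in range(50):
    item = int(np.argmax(pref))            # greedy policy shows the preferred item
    pref[item] += DRIFT * (1 - pref[item])  # exposure reinforces that preference
    pref = pref / pref.sum()

print("initial:", initial_pref, "after interaction:", pref.round(2))
# A static evaluation scores the system against initial_pref and never
# registers that the system itself moved the target.
```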
1 code implementation • 15 Feb 2024 • Jingqi Li, Anand Siththaranjan, Somayeh Sojoudi, Claire Tomlin, Andrea Bajcsy
Autonomous agents should be able to coordinate with other agents without knowing their intents ahead of time.
1 code implementation • 13 Dec 2023 • Anand Siththaranjan, Cassidy Laidlaw, Dylan Hadfield-Menell
We prove that standard applications of preference learning, including reinforcement learning from human feedback (RLHF), implicitly aggregate over hidden contexts according to a well-known voting rule called Borda count.
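A minimal sketch of the claim (the alternatives, utilities, and context distribution below are invented for illustration): pooling pairwise comparisons from annotators whose judgments depend on a hidden context yields the Borda ordering, which can diverge from the ordering by expected utility.

```python
# Three responses judged under two hidden contexts (e.g., two annotator
# subpopulations) with different utility functions -- illustrative values.
alternatives = ["A", "B", "C"]
utilities = {
    "ctx1": {"A": 10.0, "B": 1.0, "C": 0.0},  # context 1: A >> B > C
    "ctx2": {"B": 2.0, "C": 1.0, "A": 0.0},   # context 2: B > C > A
}
ctx_probs = {"ctx1": 0.5, "ctx2": 0.5}

# Probability that x beats y in a comparison, marginalizing over the hidden
# context (each context reports its preference deterministically).
def win_prob(x, y):
    return sum(p * float(utilities[c][x] > utilities[c][y])
               for c, p in ctx_probs.items())

# Borda score: expected number of pairwise wins against the other options.
borda = {x: sum(win_prob(x, y) for y in alternatives if y != x)
         for x in alternatives}
mean_util = {x: sum(p * utilities[c][x] for c, p in ctx_probs.items())
             for x in alternatives}

print("Borda ordering:   ", sorted(borda, key=borda.get, reverse=True))
print("Utility ordering: ", sorted(mean_util, key=mean_util.get, reverse=True))
# Borda ranks B first even though A has the highest expected utility: a
# preference model fit to the pooled comparisons recovers the Borda order.
```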
no code implementations • 27 Jul 2023 • Stephen Casper, Xander Davies, Claudia Shi, Thomas Krendl Gilbert, Jérémy Scheurer, Javier Rando, Rachel Freedman, Tomasz Korbak, David Lindner, Pedro Freire, Tony Wang, Samuel Marks, Charbel-Raphaël Segerie, Micah Carroll, Andi Peng, Phillip Christoffersen, Mehul Damani, Stewart Slocum, Usman Anwar, Anand Siththaranjan, Max Nadeau, Eric J. Michaud, Jacob Pfau, Dmitrii Krasheninnikov, Xin Chen, Lauro Langosco, Peter Hase, Erdem Biyik, Anca Dragan, David Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals.
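For concreteness, here is a minimal sketch of the preference-modeling step of RLHF: fitting a reward model to pairwise comparisons with the standard Bradley-Terry logistic loss. The linear features and synthetic labels are stand-ins for learned language-model representations and human annotations.

```python
import numpy as np

# Hypothetical features for (chosen, rejected) response pairs; in real RLHF
# these would come from a language model rather than a random generator.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
X_chosen = rng.normal(size=(256, 3))
X_rejected = rng.normal(size=(256, 3))

# Sample annotator labels from the Bradley-Terry model under the "true"
# reward, then swap pairs where the "rejected" item was preferred.
margins = (X_chosen - X_rejected) @ true_w
keep = rng.random(256) < 1.0 / (1.0 + np.exp(-margins))
X_c = np.where(keep[:, None], X_chosen, X_rejected)
X_r = np.where(keep[:, None], X_rejected, X_chosen)

# Fit a linear reward r(x) = w @ x by gradient descent on the logistic loss
# -log sigmoid(r(chosen) - r(rejected)), as in standard reward modeling.
w = np.zeros(3)
for _ in range(500):
    d = (X_c - X_r) @ w
    grad = -((1.0 / (1.0 + np.exp(d)))[:, None] * (X_c - X_r)).mean(axis=0)
    w -= 0.5 * grad

print(w)  # roughly aligned with true_w, up to scale
```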
no code implementations • 5 Apr 2022 • Tyler Westenbroek, Anand Siththaranjan, Mohsin Sarwari, Claire J. Tomlin, Shankar S. Sastry
Despite the extensive impact of methods such as receding horizon control, dynamic programming, and reinforcement learning, the design of cost functions for a particular system often remains a heuristic-driven process of trial and error.
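A small example of that trial-and-error loop (the dynamics, weights, and horizon here are illustrative, not drawn from the paper): for a discrete-time double integrator, closed-loop behavior swings with the hand-picked quadratic cost weights, which is exactly what the designer ends up tuning.

```python
import numpy as np

# Discrete-time double integrator with a quadratic cost x'Qx + u'Ru;
# all numbers below are illustrative assumptions.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])

def lqr_gain(Q, R, horizon=50):
    # Finite-horizon Riccati recursion (dynamic programming).
    P = Q
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Sweeping the control penalty changes the closed-loop behavior the
# designer actually gets -- the knob that is typically tuned by hand.
for r in (0.01, 1.0, 100.0):
    K = lqr_gain(Q=np.eye(2), R=np.array([[r]]))
    x = np.array([1.0, 0.0])
    for _ in range(100):
        x = (A - B @ K) @ x
    print(f"R={r:>6}: gain={K.ravel().round(2)}, x_final={x.round(3)}")
```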
no code implementations • 9 Mar 2021 • Andrea Bajcsy, Anand Siththaranjan, Claire J. Tomlin, Anca D. Dragan
This enables us to leverage tools from reachability analysis and optimal control to compute the set of hypotheses the robot could learn in finite time, as well as the worst- and best-case time it takes to learn them.
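A brute-force analogue of that computation, under assumed toy models (two hypothetical human hypotheses, binary observations, and a 0.95 "learned" threshold, none of which come from the paper): enumerate all observation sequences up to a horizon and record which hypotheses the learner's posterior can reach.

```python
import itertools

# Two hypotheses about the human, each assigning likelihoods to a binary
# observation; values are illustrative.
hypotheses = {"attentive": [0.9, 0.1], "distracted": [0.4, 0.6]}  # P(obs|h)
prior = {"attentive": 0.5, "distracted": 0.5}
THRESHOLD = 0.95  # belief level at which a hypothesis counts as "learned"

def posterior(obs_seq):
    # Bayesian update of the belief over hypotheses along one sequence.
    b = dict(prior)
    for o in obs_seq:
        b = {h: b[h] * hypotheses[h][o] for h in b}
        z = sum(b.values())
        b = {h: v / z for h, v in b.items()}
    return b

# Best case: a hypothesis is reachable at horizon T if *some* length-T
# observation sequence drives its posterior past the threshold. (The
# worst-case time would instead require the threshold to hold on every
# sequence consistent with the true hypothesis.)
for T in range(1, 8):
    learnable = set()
    for seq in itertools.product([0, 1], repeat=T):
        b = posterior(seq)
        learnable |= {h for h, v in b.items() if v >= THRESHOLD}
    print(T, sorted(learnable))
```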