Search Results for author: Rachel Freedman

Found 9 papers, 1 paper with code

Adapting a Kidney Exchange Algorithm to Align with Human Values

1 code implementation • 19 May 2020 • Rachel Freedman, Jana Schaich Borg, Walter Sinnott-Armstrong, John P. Dickerson, Vincent Conitzer

In kidney exchanges, a central market maker allocates living kidney donors to patients in need of an organ.
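
The matching step in a kidney exchange is essentially a weighted matching problem on a compatibility graph. As a rough illustration only (not the algorithm adapted in the paper), the Python sketch below greedily selects disjoint two-way swaps between incompatible patient-donor pairs, with a per-pair weight standing in for a value-aligned priority score; all names and numbers are hypothetical.

```python
# Minimal sketch (not the paper's algorithm): weighted pairwise kidney exchange.
# Each incompatible patient-donor pair may swap donors with another pair if the
# donors are cross-compatible. Pair weights stand in for learned moral priorities.
from itertools import combinations

def pairwise_exchange(pairs, compatible, weight):
    """Greedily select disjoint two-way swaps with the highest total weight.

    pairs      -- list of pair identifiers
    compatible -- compatible(donor_pair, patient_pair) -> bool
    weight     -- weight(pair) -> float, e.g. a value-aligned priority score
    """
    # Enumerate feasible 2-cycles: i's donor helps j's patient and vice versa.
    swaps = [
        (i, j) for i, j in combinations(pairs, 2)
        if compatible(i, j) and compatible(j, i)
    ]
    # Sort candidate swaps by the combined priority of the two patients served.
    swaps.sort(key=lambda s: weight(s[0]) + weight(s[1]), reverse=True)

    matched, result = set(), []
    for i, j in swaps:
        if i not in matched and j not in matched:
            result.append((i, j))
            matched.update((i, j))
    return result

# Toy example: three pairs, priorities from some elicited value profile.
if __name__ == "__main__":
    compat = {("A", "B"), ("B", "A"), ("B", "C")}
    prio = {"A": 0.9, "B": 0.5, "C": 0.7}
    print(pairwise_exchange(
        ["A", "B", "C"],
        compatible=lambda d, p: (d, p) in compat,
        weight=prio.get,
    ))  # -> [('A', 'B')]
```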

Aligning with Heterogeneous Preferences for Kidney Exchange

no code implementations • 16 Jun 2020 • Rachel Freedman

In this paper, we propose, implement and evaluate a methodology for prioritizing patients based on such heterogeneous moral preferences.

Decision Making
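
As a loose illustration of the kind of prioritization described above (the paper's actual methodology is not reproduced here), the sketch below averages hypothetical per-respondent attribute weights into one profile and ranks patients by the resulting linear score; the attribute names and values are invented for the example.

```python
# Rough sketch (my assumptions, not the paper's method): aggregate heterogeneous
# per-respondent attribute weights into a single patient priority score.
from statistics import mean

def aggregate_weights(respondent_weights):
    """Average each attribute's weight across respondents (one simple rule)."""
    attrs = respondent_weights[0].keys()
    return {a: mean(w[a] for w in respondent_weights) for a in attrs}

def priority(patient, weights):
    """Linear priority score: sum of weighted patient attributes."""
    return sum(weights[a] * patient[a] for a in weights)

if __name__ == "__main__":
    # Hypothetical elicited weights from three survey respondents.
    elicited = [
        {"age_factor": 0.7, "health_factor": 0.3},
        {"age_factor": 0.4, "health_factor": 0.6},
        {"age_factor": 0.5, "health_factor": 0.5},
    ]
    w = aggregate_weights(elicited)
    patients = {"p1": {"age_factor": 1.0, "health_factor": 0.2},
                "p2": {"age_factor": 0.3, "health_factor": 0.9}}
    ranked = sorted(patients, key=lambda p: priority(patients[p], w), reverse=True)
    print(w, ranked)
```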

Benefits of Assistance over Reward Learning

no code implementations • 1 Jan 2021 • Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

By merging reward learning and control, assistive agents can reason about the impact of control actions on reward learning, leading to several advantages over agents based on reward learning.
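
To see why reasoning about the effect of actions on future reward learning can help, here is a toy calculation under my own assumptions (not the paper's formal assistance-game model): an agent that asks the human before acting can outperform one that acts on its current reward estimate, even when asking has a cost.

```python
# A toy illustration (my assumptions, not the paper's model): an assistive agent
# that can either act immediately or ask the human first. Because it reasons
# about how asking will change its reward estimate, waiting can beat acting on
# the current point estimate.

# Two equally likely reward hypotheses over two actions.
hypotheses = {
    "human_wants_a": {"a": 1.0, "b": 0.0},
    "human_wants_b": {"a": 0.0, "b": 1.0},
}
prior = {"human_wants_a": 0.5, "human_wants_b": 0.5}
ASK_COST = 0.1   # hypothetical small cost of querying the human

def expected_reward(action):
    return sum(prior[h] * hypotheses[h][action] for h in prior)

# Option 1: act now on the prior -> expected reward 0.5 at best.
act_now = max(expected_reward(a) for a in ("a", "b"))

# Option 2: ask first. After a (noiseless, for simplicity) answer the agent
# knows the true reward and picks the optimal action, worth 1.0 minus the cost.
ask_then_act = sum(
    prior[h] * max(hypotheses[h].values()) for h in prior
) - ASK_COST

print(act_now, ask_then_act)   # 0.5 vs 0.9 -> asking is better
```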

Choice Set Misspecification in Reward Inference

no code implementations • 19 Jan 2021 • Rachel Freedman, Rohin Shah, Anca Dragan

A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback, like demonstrations or corrections.
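
One common way to formalize such inference is to model the human as approximately (Boltzmann-)rational over a choice set and update a belief over candidate reward functions. The sketch below uses that generic model, not the paper's exact formalism, and shows how assuming the wrong choice set shifts the posterior; the rewards and options are made up.

```python
# Illustrative sketch (my assumptions, not the paper's formalism): Bayesian
# reward inference from a human choice, where the human is modeled as
# Boltzmann-rational over a *choice set*. If the robot assumes the wrong
# choice set, the posterior over rewards shifts.
import math

def posterior(chosen, choice_set, candidate_rewards, beta=2.0):
    """P(reward | human chose `chosen` from `choice_set`), uniform prior."""
    post = {}
    for name, r in candidate_rewards.items():
        z = sum(math.exp(beta * r[c]) for c in choice_set)
        post[name] = math.exp(beta * r[chosen]) / z
    total = sum(post.values())
    return {k: v / total for k, v in post.items()}

if __name__ == "__main__":
    # Two hypotheses about what the human values, over three options.
    rewards = {"likes_a": {"a": 1.0, "b": 0.0, "c": 0.5},
               "likes_c": {"a": 0.2, "b": 0.0, "c": 1.0}}
    # Human actually chose "a" from {"a", "b", "c"}.
    print(posterior("a", {"a", "b", "c"}, rewards))   # true choice set
    print(posterior("a", {"a", "b"}, rewards))        # misspecified choice set
```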

The Expertise Problem: Learning from Specialized Feedback

no code implementations • 12 Nov 2022 • Oliver Daniels-Koch, Rachel Freedman

RLHF algorithms that learn from multiple teachers therefore face an expertise problem: the reliability of a given piece of feedback depends both on the teacher that it comes from and how specialized that teacher is on relevant components of the task.
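
A standard way to model teacher-dependent reliability, used here purely as an illustration rather than as the paper's model, is a Bradley-Terry preference likelihood with a per-teacher rationality coefficient, so the same comparison counts as stronger or weaker evidence depending on who provided it; the teacher names and coefficients below are hypothetical.

```python
# Sketch under my own assumptions (not the paper's model): a Bradley-Terry-style
# preference likelihood in which each teacher has its own rationality
# coefficient, so identical feedback carries different evidential weight
# depending on who gave it.
import math

def pref_likelihood(reward_a, reward_b, beta):
    """Probability that a teacher with rationality `beta` prefers A over B."""
    return 1.0 / (1.0 + math.exp(-beta * (reward_a - reward_b)))

def log_likelihood(labels, reward, teacher_beta):
    """Total log-likelihood of preference labels from several teachers.

    labels is a list of (teacher, option_a, option_b, a_preferred) tuples.
    """
    ll = 0.0
    for teacher, a, b, a_pref in labels:
        p = pref_likelihood(reward[a], reward[b], teacher_beta[teacher])
        ll += math.log(p if a_pref else 1.0 - p)
    return ll

if __name__ == "__main__":
    reward = {"x": 1.0, "y": 0.0}
    betas = {"expert": 5.0, "novice": 0.5}   # hypothetical reliabilities
    data = [("expert", "x", "y", True), ("novice", "x", "y", False)]
    print(log_likelihood(data, reward, betas))
```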

Active Reward Learning from Multiple Teachers

no code implementations • 2 Mar 2023 • Peter Barnett, Rachel Freedman, Justin Svegliato, Stuart Russell

Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system.
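
A generic version of that pipeline, sketched below with invented data and not taken from this paper, fits a tabular reward to pairwise comparisons by gradient ascent on a Bradley-Terry log-likelihood and then acts greedily on the fitted reward.

```python
# Minimal illustration (a generic recipe, not this paper's algorithm): fit a
# tabular reward from pairwise human comparisons, then use it to pick an action.
import math

def fit_reward(options, comparisons, lr=0.5, steps=500, l2=0.01):
    """comparisons: list of (winner, loser) option pairs from human feedback."""
    r = {o: 0.0 for o in options}
    for _ in range(steps):
        grad = {o: -l2 * r[o] for o in options}   # small L2 keeps the MLE finite
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(r[winner] - r[loser])))
            grad[winner] += 1.0 - p_win   # d log p / d r_winner
            grad[loser] -= 1.0 - p_win    # d log p / d r_loser
        for o in options:
            r[o] += lr * grad[o]
    return r

if __name__ == "__main__":
    feedback = [("a", "b"), ("a", "c"), ("b", "c")]   # hypothetical labels
    reward = fit_reward(["a", "b", "c"], feedback)
    policy_choice = max(reward, key=reward.get)       # "train" = act greedily
    print(reward, policy_choice)
```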

Active teacher selection for reinforcement learning from human feedback

no code implementations • 23 Oct 2023 • Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell

The HUB framework and ATS algorithm demonstrate the importance of leveraging differences between teachers to learn accurate reward models, facilitating future research on active teacher selection for robust reward modeling.

Recommendation Systems • reinforcement-learning
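
The HUB/ATS formulation is a POMDP and is not reproduced here; as a stand-in, the sketch below scores teachers by the expected information gain of querying each one about a single comparison, given per-teacher noise levels, which captures the basic intuition that differences between teachers matter. All quantities are hypothetical.

```python
# Hedged sketch of the general idea (not the paper's HUB/ATS algorithm): choose
# which teacher to query next by expected information gain about the reward,
# given each teacher's rationality level.
import math

def entropy(p):
    return -sum(q * math.log(q) for q in p.values() if q > 0)

def pref_prob(r, a, b, beta):
    return 1.0 / (1.0 + math.exp(-beta * (r[a] - r[b])))

def expected_info_gain(prior, hypotheses, a, b, beta):
    """Expected entropy reduction from asking one teacher to compare a vs b."""
    gain = 0.0
    for a_preferred in (True, False):
        marg, post = 0.0, {}
        for h, p_h in prior.items():
            p_ans = pref_prob(hypotheses[h], a, b, beta)
            p_ans = p_ans if a_preferred else 1.0 - p_ans
            post[h] = p_h * p_ans
            marg += p_h * p_ans
        post = {h: v / marg for h, v in post.items()}
        gain += marg * (entropy(prior) - entropy(post))
    return gain

if __name__ == "__main__":
    hyps = {"likes_x": {"x": 1.0, "y": 0.0}, "likes_y": {"x": 0.0, "y": 1.0}}
    prior = {"likes_x": 0.5, "likes_y": 0.5}
    teachers = {"expert": 5.0, "novice": 0.3}          # hypothetical betas
    best = max(teachers,
               key=lambda t: expected_info_gain(prior, hyps, "x", "y", teachers[t]))
    print(best)   # the expert is the more informative teacher to query
```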

Social Choice for AI Alignment: Dealing with Diverse Human Feedback

no code implementations • 16 Apr 2024 • Vincent Conitzer, Rachel Freedman, Jobst Heitzig, Wesley H. Holliday, Bob M. Jacobs, Nathan Lambert, Milan Mossé, Eric Pacuit, Stuart Russell, Hailey Schoelkopf, Emanuel Tewolde, William S. Zwicker

Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, so that, for example, they refuse to comply with requests for help with committing crimes or with producing racist text.

Ethics
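
As a toy example of one social-choice aggregation rule (the paper surveys many such rules and does not prescribe this one), the sketch below applies a Borda count to hypothetical annotator rankings of candidate model responses.

```python
# Toy sketch of a single social-choice rule (Borda count) for aggregating
# diverse human feedback into one ranking; names and rankings are invented.
from collections import defaultdict

def borda(rankings):
    """Aggregate full rankings (best-first lists) into a single Borda order."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, alternative in enumerate(ranking):
            scores[alternative] += n - 1 - position
    return sorted(scores, key=scores.get, reverse=True)

if __name__ == "__main__":
    # Hypothetical rankings of three model responses from three annotators.
    annotator_rankings = [
        ["refuse", "hedge", "comply"],
        ["hedge", "refuse", "comply"],
        ["refuse", "comply", "hedge"],
    ]
    print(borda(annotator_rankings))   # -> ['refuse', 'hedge', 'comply']
```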
