3 Jan 2023 • Daniel Shin, Anca D. Dragan, Daniel S. Brown
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.
20 Jul 2021 • Daniel Shin, Daniel S. Brown, Anca D. Dragan
Learning a reward function from human preferences is challenging as it typically requires having a high-fidelity simulator or using expensive and potentially unsafe actual physical rollouts in the environment.