2 code implementations • ICLR 2022 • Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel
Our intuition is that disagreement in the learned reward model reflects uncertainty in tailored human feedback and could be useful for exploration.
Reinforcement Learning (RL)
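
The intuition in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the reward model is an ensemble, that disagreement is measured as the standard deviation of the members' predictions, and that this value is added to the mean predicted reward as an exploration bonus. The function names and the weight `beta` are hypothetical.

```python
from statistics import fmean, pstdev


def disagreement_bonus(ensemble_predictions):
    """Per-sample disagreement across ensemble members.

    ensemble_predictions: list of lists, one inner list per reward model,
    each holding that model's predicted reward for the same samples.
    Assumption: disagreement = population standard deviation across models.
    """
    # zip(*...) regroups the per-model lists into per-sample tuples.
    return [pstdev(sample) for sample in zip(*ensemble_predictions)]


def exploration_reward(ensemble_predictions, beta=0.1):
    """Mean predicted reward plus a weighted disagreement bonus.

    beta is a hypothetical trade-off weight, not a value from the paper.
    """
    means = [fmean(sample) for sample in zip(*ensemble_predictions)]
    bonuses = disagreement_bonus(ensemble_predictions)
    return [m + beta * b for m, b in zip(means, bonuses)]


# Three hypothetical ensemble members scoring the same two samples.
predictions = [
    [1.0, 0.0],
    [1.0, 1.0],
    [1.0, 2.0],
]
bonus = disagreement_bonus(predictions)
total = exploration_reward(predictions, beta=0.5)
```

In this toy input the members agree on the first sample, so its bonus is zero, while the second sample receives a positive bonus and would be favored for exploration.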