no code implementations • 16 Jun 2023 • Xinran Liang, Anthony Han, Wilson Yan, Aditi Raghunathan, Pieter Abbeel
In addition, we show that by training on actively collected data more relevant to the environment and task, our method generalizes more robustly to downstream tasks compared to models pre-trained on fixed datasets such as ImageNet.
2 code implementations • ICLR 2022 • Xinran Liang, Katherine Shu, Kimin Lee, Pieter Abbeel
Our intuition is that disagreement among learned reward models reflects uncertainty in the tailored human feedback and can be useful for exploration.
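The idea above can be sketched minimally: train an ensemble of reward models and use their per-state disagreement (e.g. standard deviation of predictions) as an intrinsic exploration bonus. This is a hedged illustration, not the paper's implementation; the function name `disagreement_bonus` and the toy numbers are assumptions for demonstration.

```python
import numpy as np

def disagreement_bonus(reward_preds):
    """Exploration bonus from reward-ensemble disagreement.

    reward_preds: array of shape (n_models, n_states), where each row is
    one learned reward model's predicted reward for a batch of states.
    Returns the per-state standard deviation across the ensemble, which
    is large exactly where the reward models disagree (high uncertainty).
    """
    return np.std(reward_preds, axis=0)

# Toy example (hypothetical numbers): three reward models scoring four states.
preds = np.array([
    [1.0, 0.2, 0.5, 0.9],
    [1.0, 0.8, 0.5, 0.1],
    [1.0, 0.5, 0.5, 0.5],
])
bonus = disagreement_bonus(preds)
# State 0 (full agreement) receives zero bonus; state 3 (strong
# disagreement) receives the largest bonus.
```

In practice such a bonus would be added to the learned reward when training the policy, steering exploration toward states where human feedback is most informative.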