no code implementations • 2 May 2024 • Jerry Zhi-Yang He, Sashrika Pandey, Mariah L. Schrum, Anca Dragan
Proper usage of the context enables the LLM to generate personalized responses, whereas inappropriate contextual influence can lead to stereotypical and potentially harmful generations (e.g., associating "female" with "housekeeper").
no code implementations • 16 Oct 2023 • Jerry Zhi-Yang He, Zackory Erickson, Daniel S. Brown, Anca D. Dragan
We propose that capturing robustness in these interactive settings requires constructing and analyzing the entire natural-adversarial frontier: the Pareto frontier of human policies that best trade off naturalness against low robot performance.
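To make the trade-off concrete, a minimal sketch of constructing such a frontier, assuming each candidate human policy has already been scored on two hypothetical axes (a naturalness score, higher is more natural, and a robot-performance score, lower is more adversarial); the scoring itself is not shown:

```python
# Hypothetical sketch: keep the Pareto-optimal trade-offs among candidate
# human policies, each represented as (naturalness, robot_performance).
# A point is dominated if another policy is at least as natural AND yields
# at-most-equal robot performance, with at least one strict improvement.
def pareto_frontier(policies):
    """policies: list of (naturalness, robot_performance) tuples."""
    frontier = []
    for i, (n_i, p_i) in enumerate(policies):
        dominated = any(
            n_j >= n_i and p_j <= p_i and (n_j > n_i or p_j < p_i)
            for j, (n_j, p_j) in enumerate(policies) if j != i
        )
        if not dominated:
            frontier.append((n_i, p_i))
    return sorted(frontier)

# (0.9, 0.2) dominates (0.8, 0.3): more natural and worse for the robot.
print(pareto_frontier([(0.9, 0.2), (0.8, 0.3), (0.5, 0.1)]))
# → [(0.5, 0.1), (0.9, 0.2)]
```

The surviving policies are exactly the interesting failure modes: each is either more natural or more damaging than every alternative.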
no code implementations • 5 Dec 2022 • Jerry Zhi-Yang He, Aditi Raghunathan, Daniel S. Brown, Zackory Erickson, Anca D. Dragan
We advocate that generalization to such OOD policies benefits from (1) learning a good latent representation for human policies that test-time humans can accurately be mapped to, and (2) making that representation adaptable with test-time interaction data, instead of relying on it to perfectly capture the space of human policies based on the simulated population only.
no code implementations • 13 Apr 2022 • Jeremy Tien, Jerry Zhi-Yang He, Zackory Erickson, Anca D. Dragan, Daniel S. Brown
While much prior work focuses on causal confusion in reinforcement learning and behavioral cloning, we focus on a systematic study of causal confusion and reward misidentification when learning from preferences.
no code implementations • 18 Nov 2021 • Jerry Zhi-Yang He, Anca D. Dragan
We contribute an Assisted Reward Design method that speeds up the design process by anticipating and influencing this future evidence: rather than letting the designer eventually encounter failure cases and revise the reward then, the method actively exposes the designer to such environments during the development phase.