no code implementations • 27 Feb 2024 • Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons
Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment.
no code implementations • 19 Feb 2024 • Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons
Do language models implicitly learn a concept of human wellbeing?
1 code implementation • 15 Feb 2024 • Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer
We show that our new grading scheme better accords with human judgment of response quality and overall jailbreak effectiveness, especially on the sort of low-quality responses that contribute the most to over-estimation of jailbreak performance on existing benchmarks.
1 code implementation • 20 Dec 2023 • Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons
The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations.
1 code implementation • 1 Sep 2023 • Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons
In this work, we focus on the image input to a vision-language model (VLM).
1 code implementation • 6 Apr 2023 • Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks
And how do we measure these behaviors in general-purpose models such as GPT-4?
2 code implementations • 22 Nov 2022 • Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell
imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch.
1 code implementation • 7 Jul 2022 • Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell
In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium.
2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah
We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.
1 code implementation • 20 Dec 2021 • Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
no code implementations • ICLR 2022 • Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine
These methods, which we collectively refer to as reinforcement learning via supervised learning (RvS), involve a number of design decisions, such as policy architectures and how the conditioning variable is constructed.
1 code implementation • NeurIPS 2020 • Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak
To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.
no code implementations • 24 Oct 2018 • Scott Emmons, Peter J. Mucha
In this work, we introduce a tuning parameter to the content map equation that allows users of the Infomap community detection algorithm to control the metadata's relative importance for identifying network structure.
1 code implementation • 12 Jun 2017 • William H. Weir, Scott Emmons, Ryan Gibson, Dane Taylor, Peter J. Mucha
We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP) algorithm to prune and prioritize different network community structures identified across multiple runs of possibly various computational heuristics.
Social and Information Networks • Physics and Society