Search Results for author: Scott Emmons

Found 14 papers, 10 papers with code

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

no code implementations • 27 Feb 2024 • Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment.


Uncovering Latent Human Wellbeing in Language Model Embeddings

no code implementations • 19 Feb 2024 • Pedro Freire, ChengCheng Tan, Adam Gleave, Dan Hendrycks, Scott Emmons

Do language models implicitly learn a concept of human wellbeing?

Ethics, Language Modelling, +1


A StrongREJECT for Empty Jailbreaks

1 code implementation • 15 Feb 2024 • Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

We show that our new grading scheme better accords with human judgment of response quality and overall jailbreak effectiveness, especially on the sort of low-quality responses that contribute the most to over-estimation of jailbreak performance on existing benchmarks.


ALMANACS: A Simulatability Benchmark for Language Model Explainability

1 code implementation • 20 Dec 2023 • Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations.

Language Modelling


Image Hijacks: Adversarial Images can Control Generative Models at Runtime

1 code implementation • 1 Sep 2023 • Luke Bailey, Euan Ong, Stuart Russell, Scott Emmons

In this work, we focus on the image input to a vision-language model (VLM).

Language Modelling


Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark

1 code implementation • 6 Apr 2023 • Alexander Pan, Jun Shern Chan, Andy Zou, Nathaniel Li, Steven Basart, Thomas Woodside, Jonathan Ng, Hanlin Zhang, Scott Emmons, Dan Hendrycks

And how do we measure these behaviors in general-purpose models such as GPT-4?

Decision Making, Ethics



imitation: Clean Imitation Learning Implementations

2 code implementations • 22 Nov 2022 • Adam Gleave, Mohammad Taufeeque, Juan Rocamonde, Erik Jenner, Steven H. Wang, Sam Toyer, Maximilian Ernestus, Nora Belrose, Scott Emmons, Stuart Russell

imitation provides open-source implementations of imitation and reward learning algorithms in PyTorch.

Imitation Learning, reinforcement-learning, +1



For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

1 code implementation • 7 Jul 2022 • Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium.


An Empirical Investigation of Representation Learning for Imitation

2 code implementations • 16 May 2022 • Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H. Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

Image Classification, Imitation Learning, +1


RvS: What is Essential for Offline RL via Supervised Learning?

1 code implementation • 20 Dec 2021 • Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine

Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.

Offline RL


The Essential Elements of Offline RL via Supervised Learning

no code implementations • ICLR 2022 • Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine

These methods, which we collectively refer to as reinforcement learning via supervised learning (RvS), involve a number of design decisions, such as policy architectures and how the conditioning variable is constructed.

Offline RL, reinforcement-learning, +1


Sparse Graphical Memory for Robust Planning

1 code implementation • NeurIPS 2020 • Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.

Imitation Learning, Visual Navigation


A Map Equation with Metadata: Varying the Role of Attributes in Community Detection

no code implementations • 24 Oct 2018 • Scott Emmons, Peter J. Mucha

In this work, we introduce a tuning parameter to the content map equation that allows users of the Infomap community detection algorithm to control the metadata's relative importance for identifying network structure.

Community Detection


Post-processing partitions to identify domains of modularity optimization

1 code implementation • 12 Jun 2017 • William H. Weir, Scott Emmons, Ryan Gibson, Dane Taylor, Peter J. Mucha

We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP) algorithm to prune and prioritize different network community structures identified across multiple runs of possibly various computational heuristics.

Social and Information Networks, Physics and Society

