Search Results for author: Scott Emmons

Found 19 papers, 10 papers with code

Observation Interference in Partially Observable Assistance Games

no code implementations 23 Dec 2024 Scott Emmons, Caspar Oesterheld, Vincent Conitzer, Stuart Russell

We show that this incentive for interference goes away if the human is playing optimally, or if we introduce a communication channel for the human to communicate their preferences to the assistant.

The Partially Observable Off-Switch Game

no code implementations 25 Nov 2024 Andrew Garber, Rohan Subramani, Linus Luu, Mark Bedaywi, Stuart Russell, Scott Emmons

Unlike when the human has full observability, we find that in optimal play, even AI agents assisting perfectly rational humans sometimes avoid shutdown.

Failures to Find Transferable Image Jailbreaks Between Vision-Language Models

no code implementations 21 Jul 2024 Rylan Schaeffer, Dan Valentine, Luke Bailey, James Chua, Cristóbal Eyzaguirre, Zane Durante, Joe Benton, Brando Miranda, Henry Sleight, John Hughes, Rajashree Agrawal, Mrinank Sharma, Scott Emmons, Sanmi Koyejo, Ethan Perez

These results stand in stark contrast to existing evidence of universal and transferable text jailbreaks against language models and transferable adversarial attacks against image classifiers, suggesting that VLMs may be more robust to gradient-based transfer attacks.

Instruction Following, Language Modelling +1

When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback

no code implementations 27 Feb 2024 Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human evaluators fully observe the environment.

A StrongREJECT for Empty Jailbreaks

2 code implementations 15 Feb 2024 Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

To create a benchmark, researchers must choose a dataset of forbidden prompts to which a victim model will respond, along with an evaluation method that scores the harmfulness of the victim model's responses.

MMLU

ALMANACS: A Simulatability Benchmark for Language Model Explainability

1 code implementation 20 Dec 2023 Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

ALMANACS scores explainability methods on simulatability, i.e., how well the explanations improve behavior prediction on new inputs.

counterfactual, Language Modeling +2

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

1 code implementation 7 Jul 2022 Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium.

An Empirical Investigation of Representation Learning for Imitation

2 code implementations 16 May 2022 Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

Image Classification, Imitation Learning +1

RvS: What is Essential for Offline RL via Supervised Learning?

1 code implementation 20 Dec 2021 Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine

Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.

Offline RL

The Essential Elements of Offline RL via Supervised Learning

no code implementations ICLR 2022 Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine

These methods, which we collectively refer to as reinforcement learning via supervised learning (RvS), involve a number of design decisions, such as policy architectures and how the conditioning variable is constructed.

Offline RL, reinforcement-learning +1

Sparse Graphical Memory for Robust Planning

1 code implementation NeurIPS 2020 Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.

Imitation Learning, Visual Navigation

A Map Equation with Metadata: Varying the Role of Attributes in Community Detection

no code implementations 24 Oct 2018 Scott Emmons, Peter J. Mucha

In this work, we introduce a tuning parameter to the content map equation that allows users of the Infomap community detection algorithm to control the metadata's relative importance for identifying network structure.

Community Detection

Post-processing partitions to identify domains of modularity optimization

1 code implementation 12 Jun 2017 William H. Weir, Scott Emmons, Ryan Gibson, Dane Taylor, Peter J. Mucha

We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP) algorithm to prune and prioritize different network community structures identified across multiple runs of possibly various computational heuristics.

Social and Information Networks, Physics and Society
