Search Results for author: Scott Emmons

Found 14 papers, 10 papers with code

When Your AIs Deceive You: Challenges with Partial Observability of Human Evaluators in Reward Learning

no code implementations27 Feb 2024 Leon Lang, Davis Foote, Stuart Russell, Anca Dragan, Erik Jenner, Scott Emmons

Past analyses of reinforcement learning from human feedback (RLHF) assume that the human fully observes the environment.

A StrongREJECT for Empty Jailbreaks

1 code implementation15 Feb 2024 Alexandra Souly, Qingyuan Lu, Dillon Bowen, Tu Trinh, Elvis Hsieh, Sana Pandey, Pieter Abbeel, Justin Svegliato, Scott Emmons, Olivia Watkins, Sam Toyer

We show that our new grading scheme better accords with human judgment of response quality and overall jailbreak effectiveness, especially on the sort of low-quality responses that contribute the most to over-estimation of jailbreak performance on existing benchmarks.

ALMANACS: A Simulatability Benchmark for Language Model Explainability

1 code implementation20 Dec 2023 Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations.

Language Modelling

For Learning in Symmetric Teams, Local Optima are Global Nash Equilibria

1 code implementation7 Jul 2022 Scott Emmons, Caspar Oesterheld, Andrew Critch, Vincent Conitzer, Stuart Russell

In this work, we show that any locally optimal symmetric strategy profile is also a (global) Nash equilibrium.

An Empirical Investigation of Representation Learning for Imitation

2 code implementations16 May 2022 Xin Chen, Sam Toyer, Cody Wild, Scott Emmons, Ian Fischer, Kuang-Huei Lee, Neel Alex, Steven H Wang, Ping Luo, Stuart Russell, Pieter Abbeel, Rohin Shah

We propose a modular framework for constructing representation learning algorithms, then use our framework to evaluate the utility of representation learning for imitation across several environment suites.

Image Classification Imitation Learning +1

RvS: What is Essential for Offline RL via Supervised Learning?

1 code implementation20 Dec 2021 Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine

Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.

Offline RL

The Essential Elements of Offline RL via Supervised Learning

no code implementations ICLR 2022 Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine

These methods, which we collectively refer to as reinforcement learning via supervised learning (RvS), involve a number of design decisions, such as policy architectures and how the conditioning variable is constructed.

Offline RL reinforcement-learning +1

Sparse Graphical Memory for Robust Planning

1 code implementation NeurIPS 2020 Scott Emmons, Ajay Jain, Michael Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

To operate effectively in the real world, agents should be able to act from high-dimensional raw sensory input such as images and achieve diverse goals across long time-horizons.

Imitation Learning Visual Navigation

A Map Equation with Metadata: Varying the Role of Attributes in Community Detection

no code implementations24 Oct 2018 Scott Emmons, Peter J. Mucha

In this work, we introduce a tuning parameter to the content map equation that allows users of the Infomap community detection algorithm to control the metadata's relative importance for identifying network structure.

Community Detection

Post-processing partitions to identify domains of modularity optimization

1 code implementation12 Jun 2017 William H. Weir, Scott Emmons, Ryan Gibson, Dane Taylor, Peter J. Mucha

We introduce the Convex Hull of Admissible Modularity Partitions (CHAMP) algorithm to prune and prioritize different network community structures identified across multiple runs of possibly various computational heuristics.

Social and Information Networks Physics and Society

Cannot find the paper you are looking for? You can Submit a new open access paper.