Search Results for author: Edmund Mills

Found 2 papers, 1 paper with code

ALMANACS: A Simulatability Benchmark for Language Model Explainability

1 code implementation • 20 Dec 2023 • Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations.

Language Modelling

Retrospective on the 2021 BASALT Competition on Learning from Human Feedback

no code implementations • 14 Apr 2022 • Rohin Shah, Steven H. Wang, Cody Wild, Stephanie Milani, Anssi Kanervisto, Vinicius G. Goecks, Nicholas Waytowich, David Watkins-Valls, Bharat Prakash, Edmund Mills, Divyansh Garg, Alexander Fries, Alexandra Souly, Chan Jun Shern, Daniel del Castillo, Tom Lieberum

The goal of the competition was to promote research towards agents that use learning from human feedback (LfHF) techniques to solve open-world tasks.