Search Results for author: Edmund Mills

Found 2 papers, 1 paper with code

ALMANACS: A Simulatability Benchmark for Language Model Explainability

1 code implementation | 20 Dec 2023 | Edmund Mills, Shiye Su, Stuart Russell, Scott Emmons

The ALMANACS scenarios span twelve safety-relevant topics such as ethical reasoning and advanced AI behaviors; they have idiosyncratic premises to invoke model-specific behavior; and they have a train-test distributional shift to encourage faithful explanations.

Language Modelling
