Search Results for author: Madeline Brumley

Found 1 paper, 1 paper with code

Eliciting Latent Knowledge from Quirky Language Models

1 code implementation • 2 Dec 2023 • Alex Mallen, Madeline Brumley, Julia Kharchenko, Nora Belrose

Eliciting Latent Knowledge (ELK) aims to find patterns in a capable neural network's activations that robustly track the true state of the world, especially in hard-to-verify cases where the model's output is untrusted.

Anomaly Detection • Math
