Search Results for author: Xander Davies

Found 5 papers, 3 papers with code

Circuit Breaking: Removing Model Behaviors with Targeted Ablation

1 code implementation · 12 Sep 2023 · Maximilian Li, Xander Davies, Max Nadeau

Language models often exhibit behaviors that improve performance on a pre-training objective but harm performance on downstream tasks.

Text Generation

Discovering Variable Binding Circuitry with Desiderata

no code implementations · 7 Jul 2023 · Xander Davies, Max Nadeau, Nikhil Prakash, Tamar Rott Shaham, David Bau

Recent work has shown that computation in language models may be human-understandable, with successful efforts to localize and intervene on both single-unit features and input-output circuits.

Sparse Distributed Memory is a Continual Learner

1 code implementation · 20 Mar 2023 · Trenton Bricken, Xander Davies, Deepak Singh, Dmitry Krotov, Gabriel Kreiman

Continual learning is a problem for artificial neural networks that their biological counterparts are adept at solving.

Continual Learning

Unifying Grokking and Double Descent

1 code implementation · 10 Mar 2023 · Xander Davies, Lauro Langosco, David Krueger

A principled understanding of generalization in deep learning may require unifying disparate observations under a single conceptual framework.
