Search Results for author: Nicholas Goldowsky-Dill

Found 1 paper, 1 paper with code

Localizing Model Behavior with Path Patching

1 code implementation • 12 Apr 2023 • Nicholas Goldowsky-Dill, Chris MacLeod, Lucas Sato, Aryaman Arora

Localizing behaviors of neural networks to a subset of the network's components or a subset of interactions between components is a natural first step towards analyzing network mechanisms and possible failure modes.
