1 code implementation • 3 Oct 2023 • Albert Garde, Esben Kran, Fazl Barez
By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe.
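Since the abstract frames DeepDecipher as a tool for accessing neuron-level interpretability data, a speculative sketch of querying such a service over HTTP may help. The base URL, route shape, and JSON fields below are illustrative assumptions, not the tool's documented interface; consult the project itself for the real API.

```python
# Speculative sketch: fetch per-neuron records from a DeepDecipher-style
# HTTP API. Every endpoint and field name here is an assumption made for
# illustration, not taken from the paper or the tool's docs.
import requests

BASE_URL = "https://deepdecipher.org/api"    # assumption: hosted instance
model, layer, neuron = "gpt2-small", 5, 123  # hypothetical neuron of interest

resp = requests.get(f"{BASE_URL}/{model}/neuron2graph/{layer}/{neuron}",
                    timeout=10)
resp.raise_for_status()
record = resp.json()
# Assumed response fields; the real payload may be structured differently.
print(record.get("neuron"), record.get("explanation"))
```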
1 code implementation • 31 May 2023 • Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, Fazl Barez
Conventional methods require examining examples that strongly activate a neuron and manually identifying patterns to decipher the concepts the neuron responds to.
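To make this conventional workflow concrete, here is a minimal sketch of collecting a single neuron's activations over a small corpus and eyeballing the top-activating tokens. The model, layer, neuron index, and corpus are arbitrary choices for illustration, not details from the paper.

```python
# Minimal sketch of the "max-activating examples" workflow: record one
# MLP neuron's post-activation values over a corpus, then inspect the
# strongest activations by hand to guess what concept the neuron tracks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"      # assumption: any small causal LM works here
layer, neuron = 5, 123   # hypothetical neuron under study

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

captured = {}

def hook(_module, _inputs, output):
    # Post-GELU MLP activations, shape (batch, seq_len, d_mlp).
    captured["acts"] = output[0, :, neuron].detach()

# Hook the GPT-2 MLP activation module so every forward pass records
# this neuron's value at each token position.
handle = model.transformer.h[layer].mlp.act.register_forward_hook(hook)

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "Stock prices rose sharply after the announcement.",
]
records = []  # (activation, token) pairs across the corpus
for text in corpus:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        model(ids)
    for token, act in zip(tok.convert_ids_to_tokens(ids[0].tolist()),
                          captured["acts"].tolist()):
        records.append((act, token))
handle.remove()

# The "manual" step the abstract describes: read the top activations
# and try to identify a common pattern.
for act, token in sorted(records, reverse=True)[:10]:
    print(f"{act:+.3f}  {token!r}")
```

In practice this is run over far larger corpora, which is exactly why the manual pattern-identification step becomes a bottleneck.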
1 code implementation • 27 May 2023 • Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez
We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity.
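A hedged sketch of what a specificity check looks like in this setting: after editing one fact, prompts about nearby but unrelated facts should keep their original completions. `apply_edit` below stands in for any model-editing method (e.g. ROME or MEMIT) and is a hypothetical placeholder, as are the prompts; neither is taken from the paper's benchmark.

```python
# Sketch of a specificity evaluation for model editing: an edit to one
# fact should leave neighborhood prompts (related subjects, same relation)
# unchanged. Low specificity means the edit bleeds into these prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # assumption: small LM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def top_completion(lm, prompt: str) -> str:
    # Greedy next-token prediction as a cheap proxy for the model's answer.
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    return tok.decode(logits[0, -1].argmax().item())

neighborhood = [                    # prompts the edit should NOT affect
    "The Louvre is located in",
    "Notre-Dame Cathedral is located in",
]

before = {p: top_completion(model, p) for p in neighborhood}

# edited_model = apply_edit(model, subject="Eiffel Tower", new_object="Rome")
edited_model = model  # placeholder so the sketch runs end to end

after = {p: top_completion(edited_model, p) for p in neighborhood}
specificity = sum(before[p] == after[p] for p in neighborhood) / len(neighborhood)
print(f"Specificity (unchanged neighborhood prompts): {specificity:.0%}")
```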
no code implementations • 22 Apr 2023 • Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
Understanding the function of individual neurons within language models is essential for mechanistic interpretability research.