1 code implementation • 3 Oct 2023 • Albert Garde, Esben Kran, Fazl Barez
By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe.
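Since the abstract frames DeepDecipher as a tool for accessing neuron-level interpretability data, a speculative sketch of querying such a service over HTTP may help. The base URL, route shape, and JSON fields below are illustrative assumptions, not the tool's documented interface; consult the project itself for the real API.

```python
# Speculative sketch: fetch per-neuron records from a DeepDecipher-style
# HTTP API. Every endpoint and field name here is an assumption made for
# illustration, not taken from the paper or the tool's docs.
import requests

BASE_URL = "https://deepdecipher.org/api"    # assumption: hosted instance
model, layer, neuron = "gpt2-small", 5, 123  # hypothetical neuron of interest

resp = requests.get(f"{BASE_URL}/{model}/neuron2graph/{layer}/{neuron}",
                    timeout=10)
resp.raise_for_status()
record = resp.json()
# Assumed response fields; the real payload may be structured differently.
print(record.get("neuron"), record.get("explanation"))
```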
1 code implementation • 31 May 2023 • Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, Fazl Barez
Conventional methods require examining examples that strongly activate a neuron and manually identifying patterns to decipher the concepts the neuron responds to.
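To make this conventional workflow concrete, here is a minimal sketch of collecting a single neuron's activations over a small corpus and eyeballing the top-activating tokens. The model, layer, neuron index, and corpus are arbitrary choices for illustration, not details from the paper.

```python
# Minimal sketch of the "max-activating examples" workflow: record one
# MLP neuron's post-activation values over a corpus, then inspect the
# strongest activations by hand to guess what concept the neuron tracks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"      # assumption: any small causal LM works here
layer, neuron = 5, 123   # hypothetical neuron under study

tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

captured = {}

def hook(_module, _inputs, output):
    # Post-GELU MLP activations, shape (batch, seq_len, d_mlp).
    captured["acts"] = output[0, :, neuron].detach()

# Hook the GPT-2 MLP activation module so every forward pass records
# this neuron's value at each token position.
handle = model.transformer.h[layer].mlp.act.register_forward_hook(hook)

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "Stock prices rose sharply after the announcement.",
]
records = []  # (activation, token) pairs across the corpus
for text in corpus:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        model(ids)
    for token, act in zip(tok.convert_ids_to_tokens(ids[0].tolist()),
                          captured["acts"].tolist()):
        records.append((act, token))
handle.remove()

# The "manual" step the abstract describes: read the top activations
# and try to identify a common pattern.
for act, token in sorted(records, reverse=True)[:10]:
    print(f"{act:+.3f}  {token!r}")
```

In practice this is run over far larger corpora, which is exactly why the manual pattern-identification step becomes a bottleneck.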
1 code implementation • 27 May 2023 • Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez
We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity.
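A hedged sketch of what a specificity check looks like in this setting: after editing one fact, prompts about nearby but unrelated facts should keep their original completions. `apply_edit` below stands in for any model-editing method (e.g. ROME or MEMIT) and is a hypothetical placeholder, as are the prompts; neither is taken from the paper's benchmark.

```python
# Sketch of a specificity evaluation for model editing: an edit to one
# fact should leave neighborhood prompts (related subjects, same relation)
# unchanged. Low specificity means the edit bleeds into these prompts.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # assumption: small LM
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def top_completion(lm, prompt: str) -> str:
    # Greedy next-token prediction as a cheap proxy for the model's answer.
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = lm(ids).logits
    return tok.decode(logits[0, -1].argmax().item())

neighborhood = [                    # prompts the edit should NOT affect
    "The Louvre is located in",
    "Notre-Dame Cathedral is located in",
]

before = {p: top_completion(model, p) for p in neighborhood}

# edited_model = apply_edit(model, subject="Eiffel Tower", new_object="Rome")
edited_model = model  # placeholder so the sketch runs end to end

after = {p: top_completion(edited_model, p) for p in neighborhood}
specificity = sum(before[p] == after[p] for p in neighborhood) / len(neighborhood)
print(f"Specificity (unchanged neighborhood prompts): {specificity:.0%}")
```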
no code implementations • 22 Apr 2023 • Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Fazl Barez
Understanding the function of individual neurons within language models is essential for mechanistic interpretability research.