Search Results for author: Esben Kran

Found 4 papers, 3 papers with code

DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models

1 code implementation3 Oct 2023 Albert Garde, Esben Kran, Fazl Barez

By granting access to state-of-the-art interpretability methods, DeepDecipher makes LLMs more transparent, trustworthy, and safe.

Neuron to Graph: Interpreting Language Model Neurons at Scale

1 code implementation31 May 2023 Alex Foote, Neel Nanda, Esben Kran, Ioannis Konstas, Shay Cohen, Fazl Barez

Conventional methods require examination of examples with strong neuron activation and manual identification of patterns to decipher the concepts a neuron responds to.

Language Modelling

Detecting Edit Failures In Large Language Models: An Improved Specificity Benchmark

1 code implementation27 May 2023 Jason Hoelscher-Obermaier, Julia Persson, Esben Kran, Ioannis Konstas, Fazl Barez

We use this improved benchmark to evaluate recent model editing techniques and find that they suffer from low specificity.

Model Editing Specificity

N2G: A Scalable Approach for Quantifying Interpretable Neuron Representations in Large Language Models

no code implementations22 Apr 2023 Alex Foote, Neel Nanda, Esben Kran, Ionnis Konstas, Fazl Barez

Understanding the function of individual neurons within language models is essential for mechanistic interpretability research.

Cannot find the paper you are looking for? You can Submit a new open access paper.