no code implementations • 11 Feb 2024 • Bilal Chughtai, Alan Cooney, Neel Nanda
How do transformer-based large language models (LLMs) store and retrieve knowledge?
1 code implementation • 6 Feb 2023 • Bilal Chughtai, Lawrence Chan, Neel Nanda
Universality is a key hypothesis in mechanistic interpretability: that different models learn similar features and circuits when trained on similar tasks.