Search Results for author: Curt Tigges

Found 6 papers, 1 paper with code

SAEBench: A Comprehensive Benchmark for Sparse Autoencoders in Language Model Interpretability

no code implementations • 12 Mar 2025 • Adam Karvonen, Can Rager, Johnny Lin, Curt Tigges, Joseph Bloom, David Chanin, Yeu-Tong Lau, Eoin Farrell, Callum McDougall, Kola Ayonrinde, Matthew Wearden, Arthur Conmy, Samuel Marks, Neel Nanda

We introduce SAEBench, a comprehensive evaluation suite that measures SAE performance across seven diverse metrics, spanning interpretability, feature disentanglement and practical applications like unlearning.

Tasks: Disentanglement, Language Modeling, +1
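
SAEBench's seven metrics have their own interfaces; as a rough, generic illustration of the kind of quantity such a suite measures (this is not SAEBench's actual API), the sketch below computes one standard SAE quality metric, the fraction of activation variance explained by the reconstruction, for a toy tied-weight SAE with hypothetical shapes.

```python
# Generic illustration of an SAE quality metric (fraction of variance
# explained); NOT SAEBench's actual API or metric suite.
import torch

def fraction_variance_explained(acts: torch.Tensor, recon: torch.Tensor) -> float:
    """1.0 means the SAE reconstruction perfectly recovers the activations."""
    resid_var = (acts - recon).pow(2).sum()
    total_var = (acts - acts.mean(dim=0)).pow(2).sum()
    return (1.0 - resid_var / total_var).item()

# Toy tied-weight SAE on random stand-in activations (hypothetical shapes).
d_model, d_sae, n_tokens = 64, 512, 256
acts = torch.randn(n_tokens, d_model)
W_enc = torch.randn(d_model, d_sae) / d_model ** 0.5
latents = torch.relu(acts @ W_enc)   # sparse latent activations
recon = latents @ W_enc.T            # decode with tied weights
print(fraction_variance_explained(acts, recon))
```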

Sparse Autoencoders Do Not Find Canonical Units of Analysis

no code implementations • 7 Feb 2025 • Patrick Leask, Bart Bussmann, Michael Pearce, Joseph Bloom, Curt Tigges, Noura Al Moubayed, Lee Sharkey, Neel Nanda

Using meta-SAEs -- SAEs trained on the decoder matrix of another SAE -- we find that latents in SAEs often decompose into combinations of latents from a smaller SAE, showing that larger SAE latents are not atomic.
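
A minimal sketch of that meta-SAE idea: treat each decoder row of a large SAE as a data point and fit a small sparse autoencoder to those directions. The shapes, hyperparameters, and training loop here are illustrative assumptions, not the paper's released code.

```python
# Illustrative meta-SAE sketch: fit a small sparse autoencoder to the
# decoder directions of a larger SAE. Shapes, hyperparameters, and the
# loop are assumptions for illustration, not the paper's released code.
import torch

d_model, d_big, d_meta = 64, 1024, 32
big_decoder = torch.randn(d_big, d_model)        # rows = latent directions
big_decoder = big_decoder / big_decoder.norm(dim=-1, keepdim=True)

W_enc = torch.nn.Parameter(torch.randn(d_model, d_meta) * 0.1)
W_dec = torch.nn.Parameter(torch.randn(d_meta, d_model) * 0.1)
opt = torch.optim.Adam([W_enc, W_dec], lr=1e-2)

for step in range(300):
    codes = torch.relu(big_decoder @ W_enc)      # meta-latent activations
    recon = codes @ W_dec
    # Reconstruction loss plus an L1 sparsity penalty on the codes.
    loss = (recon - big_decoder).pow(2).mean() + 1e-3 * codes.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Each large-SAE latent is now approximated by a sparse combination of
# meta-latents, probing whether the large latents are atomic.
```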

LLM Circuit Analyses Are Consistent Across Training and Scale

no code implementations • 15 Jul 2024 • Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman

These results suggest that circuit analyses conducted on small models at the end of pre-training can provide insights that still apply after additional pre-training and over model scale.

Tasks: Decoder

Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion

no code implementations • 23 Jan 2024 • Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky, Talia Ringer

The framework includes a representation that captures the general syntax of structural recursion, coupled with two different frameworks for understanding its semantics -- one that is more natural from a programming-languages perspective and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture.
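
For readers unfamiliar with the term, here is a hypothetical toy example of structural recursion (not from the paper): the recursive calls mirror the constructors of the input datatype, with one case per constructor.

```python
# Hypothetical toy example (not from the paper) of structural recursion:
# one case per constructor, recursing only on immediate substructures.
from dataclasses import dataclass

@dataclass
class Leaf:
    value: int

@dataclass
class Node:
    left: "Leaf | Node"
    right: "Leaf | Node"

def tree_sum(t) -> int:
    if isinstance(t, Leaf):          # base constructor
        return t.value
    return tree_sum(t.left) + tree_sum(t.right)  # recursive constructor

print(tree_sum(Node(Leaf(1), Node(Leaf(2), Leaf(3)))))  # -> 6
```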

Linear Representations of Sentiment in Large Language Models

1 code implementation • 23 Oct 2023 • Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda

Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs).

Tasks: Zero-Shot Learning
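
A minimal sketch of one standard way a linear representation like this is tested, a difference-of-means direction over model activations; the code below is illustrative only, using random stand-in activations rather than the paper's exact method or data.

```python
# Illustrative difference-of-means probe for a linear sentiment direction,
# using random stand-in "activations"; not the paper's exact method or data.
import torch

d_model = 64
pos_acts = torch.randn(200, d_model) + 0.5   # activations on positive-sentiment text
neg_acts = torch.randn(200, d_model) - 0.5   # activations on negative-sentiment text

direction = pos_acts.mean(0) - neg_acts.mean(0)
direction = direction / direction.norm()

# If sentiment is linearly represented, projections onto the direction
# should separate the two classes.
print((pos_acts @ direction).mean().item(), (neg_acts @ direction).mean().item())
```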

Can Transformers Learn to Solve Problems Recursively?

no code implementations • 24 May 2023 • Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, Talia Ringer

Neural networks have in recent years shown promise for helping software engineers write programs and even formally verify them.
