Search Results for author: Atticus Geiger

Found 23 papers, 15 papers with code

ReFT: Representation Finetuning for Language Models

2 code implementations • 4 Apr 2024 • Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning
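
As a rough illustration of the kind of intervention LoReFT learns, here is a minimal sketch of the published formulation h + Rᵀ(Wh + b − Rh); the class and argument names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    """Sketch of a LoReFT-style low-rank representation intervention.

    Edits a hidden state h only inside an r-dimensional learned subspace:
    phi(h) = h + R^T (W h + b - R h), with R having orthonormal rows.
    Names and hyperparameters here are illustrative, not the released code.
    """

    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: low-rank projection, parametrized so its rows stay orthonormal.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, rank, bias=False)
        )
        # W, b: learned linear map giving the target values in the subspace.
        self.W = nn.Linear(hidden_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Add back only the difference inside the subspace, so h is left
        # untouched in all directions orthogonal to the rows of R.
        return h + (self.W(h) - self.R(h)) @ self.R.weight
```

In use, one such module would be attached to selected layers and token positions of a frozen base model, and only these small intervention parameters are trained, which is where the parameter efficiency comes from.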

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

3 code implementations • 12 Mar 2024 • Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.

Model Editing
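
To make the notion of an intervention on model-internal states concrete without reproducing pyvene's actual API, here is a minimal hand-rolled version using a plain PyTorch forward hook; the model choice, layer index, and the zero-out edit are arbitrary assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# A hand-rolled intervention on a model-internal state via a forward hook.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def zero_last_token(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0].clone()
    hidden[:, -1, :] = 0.0  # intervene on the final token's representation
    return (hidden,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(zero_last_token)
with torch.no_grad():
    inputs = tokenizer("The movie was great", return_tensors="pt")
    intervened_logits = model(**inputs).logits
handle.remove()
```

pyvene's contribution, as the paper describes it, is to make such interventions configurable, composable, and trainable rather than hand-coded per experiment.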

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

1 code implementation • 23 Jan 2024 • Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".

Linear Representations of Sentiment in Large Language Models

1 code implementation • 23 Oct 2023 • Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda

Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs).

Zero-Shot Learning

Rigorously Assessing Natural Language Explanations of Neurons

no code implementations • 19 Sep 2023 • Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging.

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

1 code implementation • 30 May 2023 • Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus Geiger

For in-context learning, we test InstructGPT models and find that most prompt strategies are not successful, including those using step-by-step reasoning.

Benchmarking • In-Context Learning +3

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

1 code implementation • NeurIPS 2023 • Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman

With Boundless DAS, we discover that Alpaca does this by implementing a causal model with two interpretable boolean variables.
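
For concreteness, a high-level causal model with two boolean variables of the kind described here might look as follows; the bounds-checking task and variable names are assumptions drawn from the paper's price-tagging setting, not from the snippet above.

```python
# Hypothetical rendering of a two-boolean-variable high-level causal model.
def high_level_model(amount: float, lower: float, upper: float) -> str:
    above_lower = amount >= lower   # boolean causal variable 1
    below_upper = amount <= upper   # boolean causal variable 2
    return "Yes" if (above_lower and below_upper) else "No"

assert high_level_model(5.50, 2.00, 8.00) == "Yes"
assert high_level_model(9.00, 2.00, 8.00) == "No"
```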

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

no code implementations • 5 Mar 2023 • Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman

In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases (distributed representations).

Explainable artificial intelligence
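
A minimal sketch of the distributed interchange intervention at the core of DAS, assuming the standard formulation: learn an orthogonal change of basis, swap a low-dimensional rotated subspace between a base and a source representation, then rotate back. Names and dimensions are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class DistributedInterchangeIntervention(nn.Module):
    """Swap the first `subspace_dim` coordinates of two representations
    in a learned orthonormal (non-standard) basis."""

    def __init__(self, hidden_dim: int, subspace_dim: int):
        super().__init__()
        self.subspace_dim = subspace_dim
        # Q defines the learned basis; parametrized to stay orthogonal.
        self.Q = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, hidden_dim, bias=False)
        )

    def forward(self, h_base: torch.Tensor, h_source: torch.Tensor) -> torch.Tensor:
        Q = self.Q.weight                        # orthogonal, shape (d, d)
        rb, rs = h_base @ Q.T, h_source @ Q.T    # rotate into the learned basis
        mixed = torch.cat([rs[..., : self.subspace_dim],
                           rb[..., self.subspace_dim:]], dim=-1)
        return mixed @ Q                         # rotate back to the model's basis
```

In the full method, the intervened representation is written back into the network's forward pass, and the rotation is trained by gradient descent so that swapping the subspace changes the model's output exactly as the hypothesized high-level causal variable predicts.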

Causal Abstraction for Faithful Model Interpretation

no code implementations • 11 Jan 2023 • Atticus Geiger, Chris Potts, Thomas Icard

A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible but also consistent with the known, but often opaque, low-level causal details of the model.

Explainable Artificial Intelligence (XAI)

Causal Abstraction with Soft Interventions

no code implementations • 22 Nov 2022 • Riccardo Massidda, Atticus Geiger, Thomas Icard, Davide Bacciu

Causal abstraction provides a theory describing how several causal models can represent the same system at different levels of detail.

Inducing Causal Structure for Interpretable Neural Networks

2 code implementations • 1 Dec 2021 • Atticus Geiger, Zhengxuan Wu, Hanson Lu, Josh Rozner, Elisa Kreiss, Thomas Icard, Noah D. Goodman, Christopher Potts

In IIT, we (1) align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in both models are set to the values they would have for a source input.

counterfactual Data Augmentation +1
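
A schematic of one IIT training step as described above; `neural_model`, `causal_model`, and `alignment` are hypothetical objects assumed to expose the hooks used below, so this is an illustration of the objective rather than runnable library code.

```python
import torch.nn.functional as F

def iit_training_step(neural_model, causal_model, alignment,
                      base_input, source_input, optimizer):
    """One interchange intervention training step (illustrative sketch)."""
    # 1. Counterfactual label: run the causal model on the base input with the
    #    aligned high-level variable set to its value under the source input.
    source_value = causal_model.get_variable(source_input, alignment.variable)
    target = causal_model.run_with_intervention(          # class-index tensor
        base_input, alignment.variable, source_value)

    # 2. Counterfactual prediction: run the neural model on the base input with
    #    the aligned representation patched to its value under the source input.
    source_repr = neural_model.get_representation(source_input, alignment.location)
    logits = neural_model.run_with_intervention(
        base_input, alignment.location, source_repr)

    # 3. Train the neural model to match the causal model's counterfactual behavior.
    loss = F.cross_entropy(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```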

Causal Abstractions of Neural Networks

1 code implementation • NeurIPS 2021 • Atticus Geiger, Hanson Lu, Thomas Icard, Christopher Potts

Structural analysis methods (e.g., probing and feature attribution) are increasingly important tools for neural network analysis.

Natural Language Inference

DynaSent: A Dynamic Benchmark for Sentiment Analysis

1 code implementation • ACL 2021 • Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela

We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.

Sentiment Analysis

Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

1 code implementation • EMNLP (BlackboxNLP) 2020 • Atticus Geiger, Kyle Richardson, Christopher Potts

We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions.

Lexical Entailment • Natural Language Inference +2

Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences

no code implementations • 30 Oct 2018 • Atticus Geiger, Ignacio Cases, Lauri Karttunen, Christopher Potts

Standard evaluations of deep learning models for semantics using naturalistic corpora are limited in what they can tell us about the fidelity of the learned representations, because the corpora rarely come with good measures of semantic complexity.

Natural Language Inference
