Search Results for author: Atticus Geiger

Found 23 papers, 15 papers with code

ReFT: Representation Finetuning for Language Models

2 code implementations • 4 Apr 2024 • Zhengxuan Wu, Aryaman Arora, Zheng Wang, Atticus Geiger, Dan Jurafsky, Christopher D. Manning, Christopher Potts

LoReFT is a drop-in replacement for existing PEFTs and learns interventions that are 10x-50x more parameter-efficient than prior state-of-the-art PEFTs.

Arithmetic Reasoning
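
As a rough illustration of the kind of intervention LoReFT learns, here is a minimal sketch of the published formulation h + Rᵀ(Wh + b − Rh); the class and argument names are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class LoReFTIntervention(nn.Module):
    """Sketch of a LoReFT-style low-rank representation intervention.

    Edits a hidden state h only inside an r-dimensional learned subspace:
    phi(h) = h + R^T (W h + b - R h), with R having orthonormal rows.
    Names and hyperparameters here are illustrative, not the released code.
    """

    def __init__(self, hidden_dim: int, rank: int):
        super().__init__()
        # R: low-rank projection, parametrized so its rows stay orthonormal.
        self.R = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, rank, bias=False)
        )
        # W, b: learned linear map giving the target values in the subspace.
        self.W = nn.Linear(hidden_dim, rank)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Add back only the difference inside the subspace, so h is left
        # untouched in all directions orthogonal to the rows of R.
        return h + (self.W(h) - self.R(h)) @ self.R.weight
```

In use, one such module would be attached to selected layers and token positions of a frozen base model, and only these small intervention parameters are trained, which is where the parameter efficiency comes from.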

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

3 code implementations • 12 Mar 2024 • Zhengxuan Wu, Atticus Geiger, Aryaman Arora, Jing Huang, Zheng Wang, Noah D. Goodman, Christopher D. Manning, Christopher Potts

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability.

Model Editing
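
To make the notion of an intervention on model-internal states concrete without reproducing pyvene's actual API, here is a minimal hand-rolled version using a plain PyTorch forward hook; the model choice, layer index, and the zero-out edit are arbitrary assumptions.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# A hand-rolled intervention on a model-internal state via a forward hook.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def zero_last_token(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0].clone()
    hidden[:, -1, :] = 0.0  # intervene on the final token's representation
    return (hidden,) + output[1:]

handle = model.transformer.h[6].register_forward_hook(zero_last_token)
with torch.no_grad():
    inputs = tokenizer("The movie was great", return_tensors="pt")
    intervened_logits = model(**inputs).logits
handle.remove()
```

pyvene's contribution, as the paper describes it, is to make such interventions configurable, composable, and trainable rather than hand-coded per experiment.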

A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

1 code implementation • 23 Jan 2024 • Zhengxuan Wu, Atticus Geiger, Jing Huang, Aryaman Arora, Thomas Icard, Christopher Potts, Noah D. Goodman

We respond to the recent paper by Makelov et al. (2023), which reviews subspace interchange intervention methods like distributed alignment search (DAS; Geiger et al. 2023) and claims that these methods potentially cause "interpretability illusions".

Linear Representations of Sentiment in Large Language Models

1 code implementation • 23 Oct 2023 • Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda

Sentiment is a pervasive feature in natural language text, yet it is an open question how sentiment is represented within Large Language Models (LLMs).

Zero-Shot Learning

Rigorously Assessing Natural Language Explanations of Neurons

no code implementations • 19 Sep 2023 • Jing Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts

Natural language is an appealing medium for explaining how large language models process and store information, but evaluating the faithfulness of such explanations is challenging.

ScoNe: Benchmarking Negation Reasoning in Language Models With Fine-Tuning and In-Context Learning

1 code implementation • 30 May 2023 • Jingyuan Selena She, Christopher Potts, Samuel R. Bowman, Atticus Geiger

For in-context learning, we test InstructGPT models and find that most prompt strategies are not successful, including those using step-by-step reasoning.

Benchmarking • In-Context Learning +3

Interpretability at Scale: Identifying Causal Mechanisms in Alpaca

1 code implementation • NeurIPS 2023 • Zhengxuan Wu, Atticus Geiger, Thomas Icard, Christopher Potts, Noah D. Goodman

With Boundless DAS, we discover that Alpaca does this by implementing a causal model with two interpretable boolean variables.
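
For concreteness, a high-level causal model with two boolean variables of the kind described here might look as follows; the bounds-checking task and variable names are assumptions drawn from the paper's price-tagging setting, not from the snippet above.

```python
# Hypothetical rendering of a two-boolean-variable high-level causal model.
def high_level_model(amount: float, lower: float, upper: float) -> str:
    above_lower = amount >= lower   # boolean causal variable 1
    below_upper = amount <= upper   # boolean causal variable 2
    return "Yes" if (above_lower and below_upper) else "No"

assert high_level_model(5.50, 2.00, 8.00) == "Yes"
assert high_level_model(9.00, 2.00, 8.00) == "No"
```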

Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations

no code implementations • 5 Mar 2023 • Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman

In DAS, we find the alignment between high-level and low-level models using gradient descent rather than conducting a brute-force search, and we allow individual neurons to play multiple distinct roles by analyzing representations in non-standard bases (distributed representations).

Explainable artificial intelligence
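
A minimal sketch of the distributed interchange intervention at the core of DAS, assuming the standard formulation: learn an orthogonal change of basis, swap a low-dimensional rotated subspace between a base and a source representation, then rotate back. Names and dimensions are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class DistributedInterchangeIntervention(nn.Module):
    """Swap the first `subspace_dim` coordinates of two representations
    in a learned orthonormal (non-standard) basis."""

    def __init__(self, hidden_dim: int, subspace_dim: int):
        super().__init__()
        self.subspace_dim = subspace_dim
        # Q defines the learned basis; parametrized to stay orthogonal.
        self.Q = nn.utils.parametrizations.orthogonal(
            nn.Linear(hidden_dim, hidden_dim, bias=False)
        )

    def forward(self, h_base: torch.Tensor, h_source: torch.Tensor) -> torch.Tensor:
        Q = self.Q.weight                        # orthogonal, shape (d, d)
        rb, rs = h_base @ Q.T, h_source @ Q.T    # rotate into the learned basis
        mixed = torch.cat([rs[..., : self.subspace_dim],
                           rb[..., self.subspace_dim:]], dim=-1)
        return mixed @ Q                         # rotate back to the model's basis
```

In the full method, the intervened representation is written back into the network's forward pass, and the rotation is trained by gradient descent so that swapping the subspace changes the model's output exactly as the hypothesized high-level causal variable predicts.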

Causal Abstraction for Faithful Model Interpretation

no code implementations • 11 Jan 2023 • Atticus Geiger, Chris Potts, Thomas Icard

A faithful and interpretable explanation of an AI model's behavior and internal structure is a high-level explanation that is human-intelligible but also consistent with the known, but often opaque, low-level causal details of the model.

Explainable Artificial Intelligence (XAI)

Causal Abstraction with Soft Interventions

no code implementations • 22 Nov 2022 • Riccardo Massidda, Atticus Geiger, Thomas Icard, Davide Bacciu

Causal abstraction provides a theory describing how several causal models can represent the same system at different levels of detail.

Inducing Causal Structure for Interpretable Neural Networks

2 code implementations • 1 Dec 2021 • Atticus Geiger, Zhengxuan Wu, Hanson Lu, Josh Rozner, Elisa Kreiss, Thomas Icard, Noah D. Goodman, Christopher Potts

In IIT, we (1) align variables in a causal model (e.g., a deterministic program or Bayesian network) with representations in a neural model and (2) train the neural model to match the counterfactual behavior of the causal model on a base input when aligned representations in both models are set to the values they would have for a source input.

counterfactual Data Augmentation +1
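
A schematic of one IIT training step as described above; `neural_model`, `causal_model`, and `alignment` are hypothetical objects assumed to expose the hooks used below, so this is an illustration of the objective rather than runnable library code.

```python
import torch.nn.functional as F

def iit_training_step(neural_model, causal_model, alignment,
                      base_input, source_input, optimizer):
    """One interchange intervention training step (illustrative sketch)."""
    # 1. Counterfactual label: run the causal model on the base input with the
    #    aligned high-level variable set to its value under the source input.
    source_value = causal_model.get_variable(source_input, alignment.variable)
    target = causal_model.run_with_intervention(          # class-index tensor
        base_input, alignment.variable, source_value)

    # 2. Counterfactual prediction: run the neural model on the base input with
    #    the aligned representation patched to its value under the source input.
    source_repr = neural_model.get_representation(source_input, alignment.location)
    logits = neural_model.run_with_intervention(
        base_input, alignment.location, source_repr)

    # 3. Train the neural model to match the causal model's counterfactual behavior.
    loss = F.cross_entropy(logits, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```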

Causal Abstractions of Neural Networks

1 code implementation • NeurIPS 2021 • Atticus Geiger, Hanson Lu, Thomas Icard, Christopher Potts

Structural analysis methods (e.g., probing and feature attribution) are increasingly important tools for neural network analysis.

Natural Language Inference

DynaSent: A Dynamic Benchmark for Sentiment Analysis

1 code implementation • ACL 2021 • Christopher Potts, Zhengxuan Wu, Atticus Geiger, Douwe Kiela

We introduce DynaSent ('Dynamic Sentiment'), a new English-language benchmark task for ternary (positive/negative/neutral) sentiment analysis.

Sentiment Analysis

Neural Natural Language Inference Models Partially Embed Theories of Lexical Entailment and Negation

1 code implementation • EMNLP (BlackboxNLP) 2020 • Atticus Geiger, Kyle Richardson, Christopher Potts

We address whether neural models for Natural Language Inference (NLI) can learn the compositional interactions between lexical entailment and negation, using four methods: the behavioral evaluation methods of (1) challenge test sets and (2) systematic generalization tasks, and the structural evaluation methods of (3) probes and (4) interventions.

Lexical Entailment • Natural Language Inference +2

Stress-Testing Neural Models of Natural Language Inference with Multiply-Quantified Sentences

no code implementations • 30 Oct 2018 • Atticus Geiger, Ignacio Cases, Lauri Karttunen, Christopher Potts

Standard evaluations of deep learning models for semantics using naturalistic corpora are limited in what they can tell us about the fidelity of the learned representations, because the corpora rarely come with good measures of semantic complexity.

Natural Language Inference
