Search Results for author: Michael Hahn

Found 23 papers, 10 papers with code

An Information-Theoretic Characterization of Morphological Fusion

1 code implementation EMNLP 2021 Neil Rathi, Michael Hahn, Richard Futrell

Linguistic typology generally divides synthetic languages into groups based on their morphological fusion.

Born a Transformer -- Always a Transformer?

1 code implementation 27 May 2025 Yana Veitsman, Mayank Jobanputra, Yash Sarrof, Aleksandra Bakalova, Vera Demberg, Ellie Pavlick, Michael Hahn

Mechanistic analysis reveals that this asymmetry is connected to the differences in the strength of induction versus anti-induction circuits within pretrained transformers.
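For readers unfamiliar with the terminology, the contrast can be sketched in a few lines (an illustrative toy rule, not the paper's mechanistic analysis): an induction circuit completes a repeated token by copying what followed its earlier occurrence, while an anti-induction circuit favours what preceded it.

```python
# Toy illustration (not from the paper): an induction circuit copies the
# token that followed the most recent earlier occurrence of the current
# token, while an anti-induction circuit copies the token that preceded it.

def induction_prediction(tokens):
    """Predict the next token by forward copying (induction)."""
    query = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == query:
            return tokens[i + 1]
    return None

def anti_induction_prediction(tokens):
    """Predict the next token by backward copying (anti-induction)."""
    query = tokens[-1]
    for i in range(len(tokens) - 2, 0, -1):
        if tokens[i] == query:
            return tokens[i - 1]
    return None

seq = ["X", "A", "B", "C", "A"]
print(induction_prediction(seq))       # 'B' (token after the earlier 'A')
print(anti_induction_prediction(seq))  # 'X' (token before the earlier 'A')
```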

Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers

no code implementations 4 Feb 2025 Alireza Amiri, Xinting Huang, Mark Rofin, Michael Hahn

Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers.

Hard Attention
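As a rough illustration of the scratchpad idea (a toy example, not the paper's lower-bound construction): a scratchpad lets a model emit intermediate results so that each step depends only on local information, for instance a running parity.

```python
# Illustrative sketch only: a "scratchpad" that reduces PARITY to a chain
# of local steps, each depending on the previous scratchpad symbol and
# one new input bit.

def parity_with_scratchpad(bits):
    """Return (scratchpad, answer), where scratchpad[i] is the parity of bits[:i+1]."""
    scratchpad = []
    running = 0
    for b in bits:
        running ^= b            # one local update per step
        scratchpad.append(running)
    return scratchpad, running

pad, answer = parity_with_scratchpad([1, 0, 1, 1])
print(pad)     # [1, 1, 0, 1]  -- intermediate "chain-of-thought" tokens
print(answer)  # 1
```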

A Formal Framework for Understanding Length Generalization in Transformers

1 code implementation 3 Oct 2024 Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, Michael Hahn

A major challenge for transformers is generalizing to sequences longer than those observed during training.
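A hedged sketch of the standard length-generalization protocol (the toy copy task and length cutoffs below are arbitrary choices, not taken from the paper): train on short instances, then evaluate on strictly longer ones.

```python
# Sketch of a length-generalization split (task and lengths are arbitrary
# illustrative choices): train on short sequences, test on longer ones.
import random

def make_copy_example(length):
    """A toy copy task: the target is the input sequence itself."""
    seq = [random.randint(0, 9) for _ in range(length)]
    return seq, seq  # (input, target)

train_set = [make_copy_example(random.randint(1, 16)) for _ in range(1000)]
test_set  = [make_copy_example(random.randint(32, 64)) for _ in range(200)]

print(max(len(x) for x, _ in train_set))  # <= 16
print(min(len(x) for x, _ in test_set))   # >= 32: lengths never seen in training
```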

Separations in the Representational Capabilities of Transformers and Recurrent Architectures

no code implementations 13 Jun 2024 Satwik Bhattamishra, Michael Hahn, Phil Blunsom, Varun Kanade

Furthermore, we show that two-layer Transformers of logarithmic size can perform decision tasks such as string equality or disjointness, whereas both one-layer Transformers and recurrent models require linear size for these tasks.
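For reference, the two decision tasks named above have short plain specifications (these are the task definitions only, not the paper's transformer or recurrent-model constructions):

```python
# Plain specifications of the two decision tasks mentioned above
# (not the size-bounded constructions analyzed in the paper).

def string_equality(x, y):
    """EQ: do the two binary strings match exactly?"""
    return len(x) == len(y) and all(a == b for a, b in zip(x, y))

def disjointness(x, y):
    """DISJ: is there no position where both strings have a 1?"""
    return not any(a == 1 and b == 1 for a, b in zip(x, y))

print(string_equality([0, 1, 1], [0, 1, 1]))  # True
print(disjointness([0, 1, 0], [1, 0, 1]))     # True (no shared 1)
```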

InversionView: A General-Purpose Method for Reading Information from Neural Activations

1 code implementation 27 May 2024 Xinting Huang, Madhur Panwar, Navin Goyal, Michael Hahn

The inner workings of neural networks can be better understood if we can fully decipher the information encoded in neural activations.

Decoder

The Expressive Capacity of State Space Models: A Formal Language Perspective

no code implementations 27 May 2024 Yash Sarrof, Yana Veitsman, Michael Hahn

Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers.

Language Modeling, Language Modelling +2
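For context, the core of a linear SSM layer is a linear recurrence over a hidden state; the sketch below uses generic placeholder matrices rather than any particular architecture such as S4 or Mamba.

```python
# Generic linear SSM recurrence (placeholder matrices; not a specific
# architecture): h_t = A h_{t-1} + B x_t, y_t = C h_t.
import numpy as np

def linear_ssm(x, A, B, C):
    """Run a discrete linear state space model over inputs x of shape (T, d_in)."""
    h = np.zeros(A.shape[0])
    ys = []
    for t in range(x.shape[0]):
        h = A @ h + B @ x[t]   # state update
        ys.append(C @ h)       # readout
    return np.stack(ys)

rng = np.random.default_rng(0)
d_state, d_in, d_out, T = 4, 3, 2, 5
A = 0.9 * np.eye(d_state)                  # stable state transition
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
y = linear_ssm(rng.normal(size=(T, d_in)), A, B, C)
print(y.shape)  # (5, 2)
```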

Linguistic Structure from a Bottleneck on Sequential Information Processing

no code implementations 20 May 2024 Richard Futrell, Michael Hahn

It establishes a link between the statistical and algebraic structure of human language, and reinforces the idea that the structure of human language is shaped by communication under cognitive constraints.

Form

Why are Sensitive Functions Hard for Transformers?

1 code implementation 15 Feb 2024 Michael Hahn, Mark Rofin

We show theoretically and empirically that this theory unifies a broad array of empirical observations about the learning abilities and biases of transformers, such as their generalization bias towards low sensitivity and low degree, and difficulty in length generalization for PARITY.
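As a concrete anchor for the sensitivity notion (the standard Boolean-function definition; the paper's transformer-specific analysis goes further): average sensitivity counts how many single-bit flips change a function's output, and PARITY maximizes it.

```python
# Average sensitivity of Boolean functions (standard definition). PARITY is
# maximally sensitive: flipping any single bit always flips the output.
from itertools import product

def average_sensitivity(f, n):
    """Mean number of coordinates whose flip changes f, over all n-bit inputs."""
    total = 0
    for x in product([0, 1], repeat=n):
        for i in range(n):
            flipped = list(x)
            flipped[i] ^= 1
            total += f(x) != f(tuple(flipped))
    return total / 2 ** n

parity = lambda x: sum(x) % 2
majority = lambda x: int(sum(x) > len(x) / 2)

n = 5
print(average_sensitivity(parity, n))    # 5.0: every flip changes PARITY
print(average_sensitivity(majority, n))  # 1.875: MAJORITY is far less sensitive
```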

A Theory of Emergent In-Context Learning as Implicit Structure Induction

no code implementations 14 Mar 2023 Michael Hahn, Navin Goyal

Scaling large language models (LLMs) leads to an emergent capacity to learn in-context from example demonstrations.

In-Context Learning

Crosslinguistic word order variation reflects evolutionary pressures of dependency and information locality

1 code implementation 9 Jun 2022 Michael Hahn, Yang Xu

Using data from 80 languages in 17 language families and phylogenetic modeling, we demonstrate that languages evolve to balance these pressures, such that word order change is accompanied by change in the frequency distribution of the syntactic structures which speakers communicate to maintain overall efficiency.

Sensitivity as a Complexity Measure for Sequence Classification Tasks

1 code implementation 21 Apr 2021 Michael Hahn, Dan Jurafsky, Richard Futrell

We introduce a theoretical framework for understanding and predicting the complexity of sequence classification tasks, using a novel extension of the theory of Boolean function sensitivity.

General Classification, Text Classification +1

Theoretical Limitations of Self-Attention in Neural Sequence Models

no code implementations TACL 2020 Michael Hahn

These limitations seem surprising given the practical success of self-attention and the prominent role assigned to hierarchical structure in linguistics, suggesting that natural language can be approximated well with models that are too weak for the formal languages typically assumed in theoretical linguistics.

Hard Attention

Character-based Surprisal as a Model of Reading Difficulty in the Presence of Error

no code implementations 2 Feb 2019 Michael Hahn, Frank Keller, Yonatan Bisk, Yonatan Belinkov

Also, transpositions are more difficult than misspellings, and a high error rate increases difficulty for all words, including correct ones.

Modeling Task Effects in Human Reading with Neural Network-based Attention

no code implementations 31 Jul 2018 Michael Hahn, Frank Keller

Research on human reading has long documented that reading behavior shows task-specific effects, but it has been challenging to build general models predicting what reading behavior humans will show in a given task.

Question Answering, Reading Comprehension
