Search Results for author: Kyle Mahowald

Found 44 papers, 21 papers with code

When classifying grammatical role, BERT doesn’t care about word order... except when it matters

no code implementations • ACL 2022 • Isabel Papadimitriou, Richard Futrell, Kyle Mahowald

Because meaning can often be inferred from lexical semantics alone, word order is often a redundant cue in natural language.

Investigating Information-Theoretic Properties of the Typology of Spatial Demonstratives

no code implementations • NAACL (SIGTYP) 2022 • Sihan Chen, Richard Futrell, Kyle Mahowald

Using data from Nintemann et al. (2020), we explore the variability in complexity and informativity across spatial demonstrative systems using spatial deictic lexicons from 223 languages.

Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions

no code implementations • 21 May 2025 • Sasha Boguraev, Christopher Potts, Kyle Mahowald

Large Language Models (LLMs) have emerged as powerful sources of evidence for linguists seeking to develop theories of syntax.

A suite of LMs comprehend puzzle statements as well as humans

no code implementations • 13 May 2025 • Adele E Goldberg, Supantho Rakshit, Jennifer Hu, Kyle Mahowald

Using the same stimuli, we report a preregistered study comparing human responses in two conditions: one allowed rereading (replicating the original study), and one that restricted rereading (a more naturalistic comprehension test).

Experimental Design

Both Direct and Indirect Evidence Contribute to Dative Alternation Preferences in Language Models

1 code implementation • 26 Mar 2025 • Qing Yao, Kanishka Misra, Leonie Weissweiler, Kyle Mahowald

Language models (LMs) tend to show human-like preferences on a number of syntactic phenomena, but the extent to which these are attributable to direct exposure to the phenomena or more general properties of language is unclear.

Language Models Fail to Introspect About Their Knowledge of Language

1 code implementation • 10 Mar 2025 • Siyuan Song, Jennifer Hu, Kyle Mahowald

Our findings complicate recent results suggesting that models can introspect, and add new evidence to the argument that prompted responses should not be conflated with models' linguistic generalizations.

Sentence

Constructions are Revealed in Word Distributions

1 code implementation • 8 Mar 2025 • Joshua Rozner, Leonie Weissweiler, Kyle Mahowald, Cory Shain

Construction grammar posits that constructions (form-meaning pairings) are acquired through experience with language (the distributional learning hypothesis).

counterfactual

Linguistic Generalizations are not Rules: Impacts on Evaluation of LMs

no code implementations • 18 Feb 2025 • Leonie Weissweiler, Kyle Mahowald, Adele Goldberg

Linguistic evaluations of how well LMs generalize to produce or understand novel text often implicitly take for granted that natural languages are generated by symbolic rules.

Formal Logic, Semantic Parsing

How Linguistics Learned to Stop Worrying and Love the Language Models

no code implementations • 28 Jan 2025 • Richard Futrell, Kyle Mahowald

On the other side, there have been claims that the success of LMs obviates the need for studying linguistic theory and structure.

Models Can and Should Embrace the Communicative Nature of Human-Generated Math

no code implementations • 25 Sep 2024 • Sasha Boguraev, Ben Lipkin, Leonie Weissweiler, Kyle Mahowald

Math is constructed by people for people: just as natural language corpora reflect not just propositions but the communicative goals of language users, the math data that models are trained on reflects not just idealized mathematical entities but rich communicative intentions.

Math

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

1 code implementation • 18 Sep 2024 • Zayne Sprague, Fangcong Yin, Juan Diego Rodriguez, Dongwei Jiang, Manya Wadhwa, Prasann Singhal, Xinyu Zhao, Xi Ye, Kyle Mahowald, Greg Durrett

Chain-of-thought (CoT) via prompting is the de facto method for eliciting reasoning capabilities from large language models (LLMs).

Math, MMLU
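
For readers unfamiliar with the prompting setup the paper evaluates, the sketch below contrasts a direct prompt with a zero-shot chain-of-thought prompt for the same question. The wording and the toy problem are assumptions for illustration, not taken from the paper.

```python
# Illustrative only: a direct prompt vs. a zero-shot chain-of-thought (CoT)
# prompt for the same math word problem. The phrasing and the example
# problem are assumptions, not drawn from the paper's evaluation suite.

question = (
    "A train travels 60 miles per hour for 2.5 hours. "
    "How many miles does it travel?"
)

# Direct prompting: ask for the answer immediately.
direct_prompt = f"Q: {question}\nA: The answer is"

# Zero-shot CoT prompting: elicit intermediate reasoning steps first.
cot_prompt = f"Q: {question}\nA: Let's think step by step."

print(direct_prompt)
print()
print(cot_prompt)
```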

Do they mean 'us'? Interpreting Referring Expressions in Intergroup Bias

1 code implementation • 25 Jun 2024 • Venkata S Govindarajan, Matianyu Zang, Kyle Mahowald, David Beaver, Junyi Jessy Li

We curate a unique dataset of over 6 million game-time comments from opposing perspectives (the teams in the game), each comment grounded in a non-linguistic description of the events that precipitated these comments (live win probabilities for each team).

Participle-Prepended Nominals Have Lower Entropy Than Nominals Appended After the Participle

no code implementations • 16 May 2024 • Kristie Denlinger, Stephen Wechsler, Kyle Mahowald

That is, we compare the entropy of $\alpha$ in compound construction slots like $\alpha$-[V]ed to the entropy of $\alpha$ in phrasal constructions like [V]ed by $\alpha$ for a given verb V. As predicted, there is significantly lower entropy in the compound construction than in the phrasal construction.
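
A minimal sketch of the kind of slot-entropy comparison described above, assuming toy filler counts for a single verb; the counts are invented purely for illustration and are not drawn from the paper's corpora.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given raw counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy filler counts for one verb (V = "powered"); invented for illustration.
compound_fillers = Counter({"battery": 40, "solar": 30, "gas": 20, "wind": 10})      # "battery-powered"
phrasal_fillers = Counter({"battery": 15, "solar": 12, "gas": 10, "wind": 9,
                           "ambition": 8, "coffee": 8, "curiosity": 7, "rage": 6})    # "powered by ambition"

print(f"compound slot entropy: {entropy(compound_fillers):.2f} bits")
print(f"phrasal slot entropy:  {entropy(phrasal_fillers):.2f} bits")
```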

Language Models Learn Rare Phenomena from Less Rare Phenomena: The Case of the Missing AANNs

no code implementations • 28 Mar 2024 • Kanishka Misra, Kyle Mahowald

To that end, we iteratively trained transformer language models on systematically manipulated corpora which were human-scale in size, and then evaluated their learning of a rare grammatical phenomenon: the English Article+Adjective+Numeral+Noun (AANN) construction ("a beautiful five days").

counterfactual, Memorization

Mission: Impossible Language Models

1 code implementation • 12 Jan 2024 • Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts

Chomsky and others have very directly claimed that large language models (LLMs) are equally capable of learning languages that are possible and impossible for humans to learn.

Experimental Contexts Can Facilitate Robust Semantic Property Inference in Language Models, but Inconsistently

no code implementations • 12 Jan 2024 • Kanishka Misra, Allyson Ettinger, Kyle Mahowald

Recent zero-shot evaluations have highlighted important limitations in the abilities of language models (LMs) to perform meaning extraction.

Novel Concepts

Revisiting the Optimality of Word Lengths

1 code implementation • 6 Dec 2023 • Tiago Pimentel, Clara Meister, Ethan Gotlieb Wilcox, Kyle Mahowald, Ryan Cotterell

Under this method, we find that a language's word lengths should instead be proportional to the surprisal's expectation plus its variance-to-mean ratio.
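
A minimal sketch of the quantity this finding refers to, E[s] + Var[s]/E[s], computed from per-occurrence surprisal values for a word; the surprisal numbers below are invented for illustration.

```python
import numpy as np

def predicted_length_score(surprisals):
    """
    Quantity the paper says a word's length should be proportional to:
    its expected surprisal plus its variance-to-mean ratio,
    E[s] + Var[s] / E[s].
    """
    s = np.asarray(surprisals, dtype=float)
    mean = s.mean()
    return mean + s.var() / mean

# Invented per-occurrence surprisal values (in bits) for two words.
frequent_word = [2.1, 2.3, 1.9, 2.0, 2.2]       # low, stable surprisal
rare_word = [9.5, 12.0, 7.8, 14.2, 10.1]        # high, variable surprisal

print(predicted_length_score(frequent_word))    # small -> short word predicted
print(predicted_length_score(rare_word))        # large -> long word predicted
```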

Counterfactually Probing Language Identity in Multilingual Models

1 code implementation • 29 Oct 2023 • Anirudh Srinivasan, Venkata S Govindarajan, Kyle Mahowald

We use one such technique, AlterRep, a method of counterfactual probing, to explore the internal structure of multilingual models (mBERT and XLM-R).

counterfactual, Language Modeling +2

Lil-Bevo: Explorations of Strategies for Training Language Models in More Humanlike Ways

1 code implementation • 26 Oct 2023 • Venkata S Govindarajan, Juan Diego Rodriguez, Kaj Bostrom, Kyle Mahowald

We pretrained our masked language models with three ingredients: an initial pretraining with music data, training on shorter sequences before training on longer ones, and masking specific tokens to target some of the BLiMP subtasks.

Language Modeling, Language Modelling +1

Counterfactual Probing for the Influence of Affect and Specificity on Intergroup Bias

1 code implementation • 25 May 2023 • Venkata S Govindarajan, Kyle Mahowald, David I. Beaver, Junyi Jessy Li

While existing work on studying bias in NLP focuses on negative or pejorative language use, Govindarajan et al. (2023) offer a revised framing of bias in terms of intergroup social context, and its effects on language behavior.

counterfactual, Specificity

Elaborative Simplification as Implicit Questions Under Discussion

no code implementations • 17 May 2023 • Yating Wu, William Sheffield, Kyle Mahowald, Junyi Jessy Li

Automated text simplification, a technique useful for making text more accessible to people such as children and emergent bilinguals, is often thought of as a monolingual translation task from complex sentences to simplified sentences using encoder-decoder models.

Decoder, Question Generation +2

For Generated Text, Is NLI-Neutral Text the Best Text?

1 code implementation • 16 Feb 2023 • Michail Mersinias, Kyle Mahowald

We explore incorporating natural language inference (NLI) into the text generative pipeline by using a pre-trained NLI model to assess whether a generated sentence entails, contradicts, or is neutral to the prompt and preceding text.

Natural Language Inference, Sentence +1
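
A minimal sketch of the general idea of scoring a generated sentence against its preceding text with a pre-trained NLI model. The specific checkpoint (roberta-large-mnli) and the example texts are assumptions for illustration, not necessarily the paper's setup.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# A pre-trained NLI model; the checkpoint choice is an assumption for
# illustration, not necessarily the one used in the paper.
MODEL = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def nli_label(context, generated_sentence):
    """Classify a candidate continuation as entailment / neutral /
    contradiction with respect to the preceding text."""
    inputs = tokenizer(context, generated_sentence,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)[0]
    label_id = int(probs.argmax())
    return model.config.id2label[label_id], probs[label_id].item()

context = "The meeting was moved to Friday afternoon."
candidate = "The meeting will not take place on Friday."
print(nli_label(context, candidate))  # expected label: CONTRADICTION
```

In a generation pipeline along these lines, candidates labeled neutral (or whichever class is preferred) could be kept and the others resampled.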

A Discerning Several Thousand Judgments: GPT-3 Rates the Article + Adjective + Numeral + Noun Construction

no code implementations • 29 Jan 2023 • Kyle Mahowald

I validate the prompt using the CoLA corpus of acceptability judgments and then zero in on the AANN construction.

CoLA

Dissociating language and thought in large language models

no code implementations • 16 Jan 2023 • Kyle Mahowald, Anna A. Ivanova, Idan A. Blank, Nancy Kanwisher, Joshua B. Tenenbaum, Evelina Fedorenko

Large Language Models (LLMs) have come closest among all models to date to mastering human language, yet opinions about their linguistic and cognitive capabilities remain split.

Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training

1 code implementation • 19 Dec 2022 • Jing Huang, Zhengxuan Wu, Kyle Mahowald, Christopher Potts

Language tasks involving character-level manipulations (e.g., spelling corrections, arithmetic operations, word games) are challenging for models operating on subword units.

Spelling Correction

Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality

1 code implementation • 1 Nov 2022 • Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald

Recent visuolinguistic pre-trained models show promising progress on various end tasks such as image retrieval and video captioning.

Data Augmentation, Image Retrieval +2

What do tokens know about their characters and how do they know it?

1 code implementation • NAACL 2022 • Ayush Kaushal, Kyle Mahowald

Pre-trained language models (PLMs) that use subword tokenization schemes can succeed at a variety of language tasks that require character-level information, despite lacking explicit access to the character composition of tokens.

When classifying grammatical role, BERT doesn't care about word order... except when it matters

1 code implementation • 11 Mar 2022 • Isabel Papadimitriou, Richard Futrell, Kyle Mahowald

Because meaning can often be inferred from lexical semantics alone, word order is often a redundant cue in natural language.

Grammatical cues to subjecthood are redundant in a majority of simple clauses across languages

no code implementations • 30 Jan 2022 • Kyle Mahowald, Evgeniia Diachek, Edward Gibson, Evelina Fedorenko, Richard Futrell

The conclusion is that grammatical cues such as word order are necessary to convey subjecthood and objecthood in a minority of naturally occurring transitive clauses; nevertheless, they can (a) provide an important source of redundancy and (b) be crucial for conveying intended meaning that cannot be inferred from the words alone, including descriptions of human interactions, where roles are often reversible (e.g., Ray helped Lu/Lu helped Ray), and expressions of non-prototypical meanings (e.g., "The bone chewed the dog.").

Sentence, World Knowledge

A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space

1 code implementation • EMNLP 2021 • Alex Jones, William Yang Wang, Kyle Mahowald

We verify some of our linguistic findings by looking at the effect of morphological segmentation on English-Inuktitut alignment, in addition to examining the effect of word order agreement on isomorphism for 66 zero-shot language pairs from a different corpus.

Retrieval, Sentence

How (Non-)Optimal is the Lexicon?

no code implementations • NAACL 2021 • Tiago Pimentel, Irene Nikkarinen, Kyle Mahowald, Ryan Cotterell, Damián Blasi

Examining corpora from 7 typologically diverse languages, we use those upper bounds to quantify the lexicon's optimality and to explore the relative costs of major constraints on natural codes.

Decrypting Cryptic Crosswords: Semantically Complex Wordplay Puzzles as a Target for NLP

2 code implementations • NeurIPS 2021 • Joshua Rozner, Christopher Potts, Kyle Mahowald

Cryptic crosswords, the dominant crossword variety in the UK, are a promising target for advancing NLP systems that seek to process semantically complex, highly compositional language.

Language Modelling

Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT

1 code implementation • EACL 2021 • Isabel Papadimitriou, Ethan A. Chi, Richard Futrell, Kyle Mahowald

Further examining the characteristics that our classifiers rely on, we find that features such as passive voice, animacy and case strongly correlate with classification decisions, suggesting that mBERT does not encode subjecthood purely syntactically, but that subjecthood embedding is continuous and dependent on semantic and discourse factors, as is proposed in much of the functional linguistics literature.

Sentence
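
A minimal sketch of the general probing-classifier setup the abstract describes: a linear classifier trained on mBERT contextual embeddings of noun tokens to predict grammatical role. The tiny training set and the first-wordpiece lookup heuristic are simplifying assumptions for illustration, not the paper's pipeline.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mbert = AutoModel.from_pretrained("bert-base-multilingual-cased")

def noun_embedding(sentence, noun):
    """Contextual embedding of the first wordpiece of `noun` in `sentence`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = mbert(**enc).last_hidden_state[0]  # (seq_len, hidden_dim)
    first_piece = tokenizer(noun, add_special_tokens=False)["input_ids"][0]
    position = enc["input_ids"][0].tolist().index(first_piece)
    return hidden[position].numpy()

# Invented toy examples; label 1 = subject, 0 = object.
examples = [
    ("The dog chased the cat.", "dog", 1),
    ("The dog chased the cat.", "cat", 0),
    ("The teacher praised the student.", "teacher", 1),
    ("The teacher praised the student.", "student", 0),
]
X = [noun_embedding(s, n) for s, n, _ in examples]
y = [label for _, _, label in examples]

probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.predict([noun_embedding("The cat watched the bird.", "cat")]))
```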

With Little Power Comes Great Responsibility

2 code implementations • EMNLP 2020 • Dallas Card, Peter Henderson, Urvashi Khandelwal, Robin Jia, Kyle Mahowald, Dan Jurafsky

Despite its importance to experimental design, statistical power (the probability that, given a real effect, an experiment will reject the null hypothesis) has largely been ignored by the NLP community.

Experimental Design, Machine Translation +1
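
A minimal sketch of what a power estimate means in this setting: a Monte Carlo simulation of how often a paired test on n test items would detect a small true accuracy difference between two systems. The generative assumptions (base accuracy, effect size, choice of test) are illustrative, not the paper's.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def estimated_power(n_items, true_delta, n_sims=2000, alpha=0.05):
    """
    Rough Monte Carlo estimate of the power of a paired t-test to detect a
    `true_delta` accuracy improvement of system B over system A when both
    are scored on the same n_items test items.
    """
    hits = 0
    for _ in range(n_sims):
        baseline = rng.binomial(1, 0.80, size=n_items)               # system A per-item correctness
        improved = rng.binomial(1, 0.80 + true_delta, size=n_items)  # system B per-item correctness
        _, p = stats.ttest_rel(improved, baseline)
        if p < alpha and improved.mean() > baseline.mean():
            hits += 1
    return hits / n_sims

# Power grows with test-set size for a fixed 2-point accuracy gain.
for n in (200, 1000, 5000):
    print(n, estimated_power(n, true_delta=0.02))
```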

Response to Liu, Xu, and Liang (2015) and Ferrer-i-Cancho and Gómez-Rodríguez (2015) on Dependency Length Minimization

no code implementations • 1 Oct 2015 • Richard Futrell, Kyle Mahowald, Edward Gibson

We address recent criticisms (Liu et al., 2015; Ferrer-i-Cancho and Gómez-Rodríguez, 2015) of our work on empirical evidence of dependency length minimization across languages (Futrell et al., 2015).
