Search Results for author: R. Thomas McCoy

Found 36 papers, 12 papers with code

Learning to Reason via Mixture-of-Thought for Logical Reasoning

1 code implementation • 21 May 2025 • Tong Zheng, Lichang Chen, Simeng Han, R. Thomas McCoy, Heng Huang

To fill in this gap, we propose Mixture-of-Thought (MoT), a framework that enables LLMs to reason across three complementary modalities: natural language, code, and a newly introduced symbolic modality, truth-table, which systematically enumerates logical cases and partially mitigates key failure modes in natural language reasoning.

Logical Reasoning · Natural Language Inference
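The truth-table modality described above works by systematically enumerating logical cases. The paper's actual format is not reproduced here, but the core idea can be sketched as a brute-force propositional entailment check (the `entails` helper and its lambda encoding of formulas are illustrative assumptions, not the paper's code):

```python
from itertools import product

def entails(premises, conclusion, atoms):
    """Brute-force truth-table check: does every assignment that
    satisfies all premises also satisfy the conclusion?"""
    for values in product([False, True], repeat=len(atoms)):
        a = dict(zip(atoms, values))
        if all(p(a) for p in premises) and not conclusion(a):
            return False  # countermodel found
    return True

# Modus ponens: from "p -> q" and "p", conclude "q".
premises = [lambda a: (not a["p"]) or a["q"], lambda a: a["p"]]
print(entails(premises, lambda a: a["q"], ["p", "q"]))      # True
print(entails(premises, lambda a: not a["q"], ["p", "q"]))  # False
```

Because every assignment is checked, this modality cannot silently skip a case, which is the failure mode of free-form natural-language reasoning that the abstract alludes to.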

Identifying and Mitigating the Influence of the Prior Distribution in Large Language Models

no code implementations • 17 Apr 2025 • Liyi Zhang, Veniamin Veselovsky, R. Thomas McCoy, Thomas L. Griffiths

Specifically, we show that it is possible to identify layers of the underlying neural network that correlate with the prior probability of a response and that lightweight finetuning of these layers with basic prompts on prior-dominated tasks achieves high performance on held-out answers.

Convolutional Neural Networks Can (Meta-)Learn the Same-Different Relation

no code implementations • 29 Mar 2025 • Max Gupta, Sunayana Rane, R. Thomas McCoy, Thomas L. Griffiths

While convolutional neural networks (CNNs) have come to match and exceed human performance in many settings, the tasks these models optimize for are largely constrained to the level of individual objects, such as classification and captioning.

Meta-Learning · Relation

Teasing Apart Architecture and Initial Weights as Sources of Inductive Bias in Neural Networks

no code implementations • 27 Feb 2025 • Gianluca Bencomo, Max Gupta, Ioana Marinescu, R. Thomas McCoy, Thomas L. Griffiths

But what those networks can learn depends upon their inductive biases -- the factors other than the data that influence the solutions they discover -- and the inductive biases of neural networks remain poorly understood, limiting our ability to draw conclusions about human learning from the performance of these systems.

Inductive Bias · Meta-Learning

Minimization of Boolean Complexity in In-Context Concept Learning

no code implementations • 3 Dec 2024 • Leroy Z. Wang, R. Thomas McCoy, Shane Steinert-Threlkeld

What factors contribute to the relative success and corresponding difficulties of in-context learning for Large Language Models (LLMs)?

In-Context Learning

When a language model is optimized for reasoning, does it still show embers of autoregression? An analysis of OpenAI o1

no code implementations • 2 Oct 2024 • R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, Thomas L. Griffiths

In "Embers of Autoregression" (McCoy et al., 2023), we showed that several large language models (LLMs) have some important limitations that are attributable to their origins in next-word prediction.

Language Modeling · Language Modelling

Deciphering the Factors Influencing the Efficacy of Chain-of-Thought: Probability, Memorization, and Noisy Reasoning

1 code implementation • 1 Jul 2024 • Akshara Prabhakar, Thomas L. Griffiths, R. Thomas McCoy

By focusing on a single relatively simple task, we are able to identify three factors that systematically affect CoT performance: the probability of the task's expected output (probability), what the model has implicitly learned during pre-training (memorization), and the number of intermediate operations involved in reasoning (noisy reasoning).

Memorization

Is In-Context Learning a Type of Error-Driven Learning? Evidence from the Inverse Frequency Effect in Structural Priming

no code implementations • 26 Jun 2024 • Zhenghao Zhou, Robert Frank, R. Thomas McCoy

In our experiments, we simulated structural priming with ICL and found that LLMs indeed display the IFE, with the effect being stronger in larger models.

In-Context Learning · Sentence

modeLing: A Novel Dataset for Testing Linguistic Reasoning in Language Models

no code implementations • 24 Jun 2024 • Nathan A. Chi, Teodor Malchev, Riley Kong, Ryan A. Chi, Lucas Huang, Ethan A. Chi, R. Thomas McCoy, Dragomir Radev

We introduce modeLing, a novel benchmark of Linguistics Olympiad-style puzzles which tests few-shot reasoning in AI systems.

Memorization

Deep de Finetti: Recovering Topic Distributions from Large Language Models

no code implementations • 21 Dec 2023 • Liyi Zhang, R. Thomas McCoy, Theodore R. Sumers, Jian-Qiao Zhu, Thomas L. Griffiths

Large language models (LLMs) can produce long, coherent passages of text, suggesting that LLMs, although trained on next-word prediction, must represent the latent structure that characterizes a document.

Bayesian Inference

Bayes in the age of intelligent machines

no code implementations • 16 Nov 2023 • Thomas L. Griffiths, Jian-Qiao Zhu, Erin Grant, R. Thomas McCoy

The success of methods based on artificial neural networks in creating intelligent machines seems like it might pose a challenge to explanations of human cognition in terms of Bayesian inference.

Bayesian Inference

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

no code implementations • 24 Sep 2023 • R. Thomas McCoy, Shunyu Yao, Dan Friedman, Matthew Hardy, Thomas L. Griffiths

This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input.

Modeling rapid language learning by distilling Bayesian priors into artificial neural networks

1 code implementation • 24 May 2023 • R. Thomas McCoy, Thomas L. Griffiths

We show that learning from limited naturalistic data is possible with an approach that combines the strong inductive biases of a Bayesian model with the flexible representations of a neural network.

How poor is the stimulus? Evaluating hierarchical generalization in neural networks trained on child-directed speech

1 code implementation • 26 Jan 2023 • Aditya Yedetore, Tal Linzen, Robert Frank, R. Thomas McCoy

When acquiring syntax, children consistently choose hierarchical rules over competing non-hierarchical possibilities.

Neurocompositional computing: From the Central Paradox of Cognition to a new generation of AI systems

no code implementations • 2 May 2022 • Paul Smolensky, R. Thomas McCoy, Roland Fernandez, Matthew Goldrick, Jianfeng Gao

What explains the dramatic progress from 20th-century to 21st-century AI, and how can the remaining limitations of current AI be overcome?

Universal linguistic inductive biases via meta-learning

1 code implementation • 29 Jun 2020 • R. Thomas McCoy, Erin Grant, Paul Smolensky, Thomas L. Griffiths, Tal Linzen

To facilitate computational modeling aimed at addressing this question, we introduce a framework for giving particular linguistic inductive biases to a neural network model; such a model can then be used to empirically explore the effects of those inductive biases.

Language Acquisition · Meta-Learning

Representations of Syntax [MASK] Useful: Effects of Constituency and Dependency Structure in Recursive LSTMs

1 code implementation • ACL 2020 • Michael A. Lepori, Tal Linzen, R. Thomas McCoy

Sequence-based neural networks show significant sensitivity to syntactic structure, but they still perform less well on syntactic tasks than tree-based networks.

Data Augmentation

Syntactic Data Augmentation Increases Robustness to Inference Heuristics

1 code implementation • ACL 2020 • Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen

Pretrained neural models such as BERT, when fine-tuned to perform natural language inference (NLI), often show high accuracy on standard datasets, but display a surprising lack of sensitivity to word order on controlled challenge sets.

Data Augmentation · Natural Language Inference
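The paper above augments NLI training data with syntactically transformed sentence pairs. As a toy illustration only (the `invert` helper and the rigid "The N1 V the N2" word-order assumption are hypothetical, not the paper's implementation), one such transformation, inversion, can be sketched as:

```python
def invert(sentence):
    """Toy 'inversion' transformation: swap the subject and object nouns.
    Assumes the rigid pattern 'The N1 V the N2'."""
    w = sentence.split()
    w[1], w[4] = w[4], w[1]
    return " ".join(w)

premise = "The doctor saw the lawyer"
# Inversion generally changes meaning, so the generated
# premise/hypothesis pair would be labeled non-entailment.
print((premise, invert(premise), "non-entailment"))
```

Pairs like this directly penalize a model that ignores word order, which is the failure the abstract describes in fine-tuned models such as BERT.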

Does syntax need to grow on trees? Sources of hierarchical inductive bias in sequence-to-sequence networks

no code implementations • TACL 2020 • R. Thomas McCoy, Robert Frank, Tal Linzen

We investigate which architectural factors affect the generalization behavior of neural sequence-to-sequence models trained on two syntactic tasks, English question formation and English tense reinflection.

Inductive Bias

RNNs implicitly implement tensor-product representations

1 code implementation • ICLR 2019 • R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies).

Representation Learning · Sentence
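A tensor-product representation, the structure the paper above finds implicitly in RNNs, encodes a sequence as a sum of filler-vector/role-vector outer products. A minimal NumPy sketch of binding and unbinding (vector dimensions and names here are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
fillers = {s: rng.normal(size=8) for s in "abc"}  # symbol (filler) vectors
roles = [rng.normal(size=8) for _ in range(3)]    # position (role) vectors

def encode(seq):
    """Tensor-product representation: sum over positions of
    the outer product filler(symbol) x role(position)."""
    return sum(np.outer(fillers[s], roles[i]) for i, s in enumerate(seq))

def decode(T, i):
    """Unbind position i via the dual role basis (exact when the
    roles are linearly independent), then return the nearest filler."""
    R = np.stack(roles, axis=1)  # 8 x 3 matrix of role columns
    U = np.linalg.pinv(R.T)      # dual roles: roles[j] . U[:, i] = delta_ij
    f = T @ U[:, i]
    return min(fillers, key=lambda s: np.linalg.norm(fillers[s] - f))

T = encode("cab")
print([decode(T, i) for i in range(3)])  # ['c', 'a', 'b']
```

Because encoding is linear in the fillers, such representations exhibit exactly the kind of linear regularities (analogies) the abstract mentions.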

Looking for ELMo's friends: Sentence-Level Pretraining Beyond Language Modeling

no code implementations ICLR 2019 Samuel R. Bowman, Ellie Pavlick, Edouard Grave, Benjamin Van Durme, Alex Wang, Jan Hula, Patrick Xia, Raghavendra Pappagari, R. Thomas McCoy, Roma Patel, Najoung Kim, Ian Tenney, Yinghui Huang, Katherin Yu, Shuning Jin, Berlin Chen

Work on the problem of contextualized word representation—the development of reusable neural network components for sentence understanding—has recently seen a surge of progress centered on the unsupervised pretraining task of language modeling with methods like ELMo (Peters et al., 2018).

Language Modeling · Language Modelling +1

Probing What Different NLP Tasks Teach Machines about Function Word Comprehension

no code implementations • SEMEVAL 2019 • Najoung Kim, Roma Patel, Adam Poliak, Alex Wang, Patrick Xia, R. Thomas McCoy, Ian Tenney, Alexis Ross, Tal Linzen, Benjamin Van Durme, Samuel R. Bowman, Ellie Pavlick

Our results show that pretraining on language modeling performs the best on average across our probing tasks, supporting its widespread use for pretraining state-of-the-art NLP models, and CCG supertagging and NLI pretraining perform comparably.

CCG Supertagging · Language Modeling +4

Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

6 code implementations • ACL 2019 • R. Thomas McCoy, Ellie Pavlick, Tal Linzen

We find that models trained on MNLI, including BERT, a state-of-the-art model, perform very poorly on HANS, suggesting that they have indeed adopted these heuristics.

Natural Language Inference · Sentence
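One of the syntactic heuristics HANS diagnoses is lexical overlap: predicting entailment whenever every hypothesis word also appears in the premise. A minimal sketch of that heuristic as a baseline classifier (the function name and label strings are illustrative, not the paper's code):

```python
def lexical_overlap_heuristic(premise, hypothesis):
    """The lexical-overlap heuristic HANS probes: predict 'entailment'
    whenever every hypothesis word also occurs in the premise."""
    p = set(premise.lower().split())
    h = hypothesis.lower().split()
    return "entailment" if all(w in p for w in h) else "non-entailment"

# Often right when overlap really does signal entailment...
print(lexical_overlap_heuristic("The doctor near the lawyer smiled",
                                "The doctor smiled"))          # entailment
# ...but also fires on HANS-style reorderings the premise does not entail:
print(lexical_overlap_heuristic("The lawyer saw the doctor",
                                "The doctor saw the lawyer"))  # entailment
```

HANS is built from examples like the second pair, where the heuristic and the true label disagree, so a model that scores near chance there has likely adopted the shortcut.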

RNNs Implicitly Implement Tensor Product Representations

no code implementations • 20 Dec 2018 • R. Thomas McCoy, Tal Linzen, Ewan Dunbar, Paul Smolensky

Recurrent neural networks (RNNs) can learn continuous vector representations of symbolic structures such as sequences and sentences; these representations often exhibit linear regularities (analogies).

Representation Learning · Sentence

Non-entailed subsequences as a challenge for natural language inference

no code implementations • 29 Nov 2018 • R. Thomas McCoy, Tal Linzen

Neural network models have shown great success at natural language inference (NLI), the task of determining whether a premise entails a hypothesis.

Natural Language Inference · Sentence
