Search Results for author: William Schuler

Found 39 papers, 8 papers with code

Depth-Bounded Statistical PCFG Induction as a Model of Human Grammar Acquisition

no code implementations CL (ACL) 2021 Lifeng Jin, Lane Schwartz, Finale Doshi-Velez, Timothy Miller, William Schuler

This article describes a simple PCFG induction model with a fixed category domain that predicts a large majority of attested constituent boundaries, and predicts labels consistent with nearly half of attested constituent labels on a standard evaluation data set of child-directed speech.

Contributions of Propositional Content and Syntactic Category Information in Sentence Processing

no code implementations NAACL (CMCL) 2021 Byung-Doh Oh, William Schuler

Expectation-based theories of sentence processing posit that processing difficulty is determined by predictability in context.
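The predictability measure used throughout this line of work is surprisal, the negative log probability of a word given its preceding context. As an informal illustration (not the authors' implementation), the sketch below computes per-word surprisal from a toy bigram model estimated from counts; the corpus and smoothing scheme are placeholders.

```python
# Illustrative only: per-word surprisal from a toy add-one-smoothed bigram model.
# surprisal(w_t) = -log2 P(w_t | w_{t-1}); real studies use PCFG or neural LM probabilities.
import math
from collections import Counter

corpus = "the dog saw the cat and the cat saw the dog".split()  # placeholder data
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
vocab = len(unigrams)

def bigram_prob(prev, word):
    # Add-one smoothing keeps unseen bigrams from having zero probability.
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab)

sentence = "the dog saw the cat".split()
for prev, word in zip(sentence, sentence[1:]):
    surprisal = -math.log2(bigram_prob(prev, word))
    print(f"{word}: {surprisal:.2f} bits")
```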

Sentence

Character-based PCFG Induction for Modeling the Syntactic Acquisition of Morphologically Rich Languages

no code implementations Findings (EMNLP) 2021 Lifeng Jin, Byung-Doh Oh, William Schuler

A subsequent evaluation on multilingual treebanks shows that the model with subword information achieves state-of-the-art results on many languages, further supporting a distributional model of syntactic acquisition.

Coreference-aware Surprisal Predicts Brain Response

no code implementations Findings (EMNLP) 2021 Evan Jaffe, Byung-Doh Oh, William Schuler

Recent evidence supports a role for coreference processing in guiding human expectations about upcoming words during reading, based on covariation between reading times and word surprisal estimated by a coreference-aware semantic processing model (Jaffe et al. 2020). The present study reproduces and elaborates on this finding by (1) enabling the parser to process subword information that might better approximate human morphological knowledge, and (2) extending evaluation of coreference effects from self-paced reading to human brain imaging data.

Frequency Explains the Inverse Correlation of Large Language Models' Size, Training Data Amount, and Surprisal's Fit to Reading Times

1 code implementation 3 Feb 2024 Byung-Doh Oh, Shisen Yue, William Schuler

Additionally, training dynamics reveal that during later training steps, all model variants learn to predict rare words and that larger model variants do so more accurately, which explains the detrimental effect of both training data amount and model size on fit to reading times.

Language Modelling

Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions

1 code implementation 17 May 2023 Byung-Doh Oh, William Schuler

While there is much recent interest in studying why Transformer-based large language models make predictions the way they do, the complex computations performed within each layer have made their behavior somewhat opaque.

Language Modelling

Transformer-Based Language Model Surprisal Predicts Human Reading Times Best with About Two Billion Training Tokens

no code implementations 22 Apr 2023 Byung-Doh Oh, William Schuler

Recent psycholinguistic studies have drawn conflicting conclusions about the relationship between the quality of a language model and the ability of its surprisal estimates to predict human reading times, which has been speculated to be due to the large gap in both the amount of training data and model capacity across studies.

Language Modelling

Why Does Surprisal From Larger Transformer-Based Language Models Provide a Poorer Fit to Human Reading Times?

no code implementations 23 Dec 2022 Byung-Doh Oh, William Schuler

This work presents a detailed linguistic analysis into why larger Transformer-based pre-trained language models with more parameters and lower perplexity nonetheless yield surprisal estimates that are less predictive of human reading times.

Entropy- and Distance-Based Predictors From GPT-2 Attention Patterns Predict Reading Times Over and Above GPT-2 Surprisal

1 code implementation 21 Dec 2022 Byung-Doh Oh, William Schuler

Transformer-based large language models are trained to make predictions about the next word by aggregating representations of previous tokens through their self-attention mechanism.
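As a rough illustration of that aggregation step, and of how an entropy-based quantity could be read off it, the sketch below computes single-head scaled dot-product attention with a causal mask and the entropy of each token's attention distribution. The dimensions and random inputs are arbitrary, and this is not the paper's actual predictor pipeline.

```python
# Minimal single-head causal self-attention plus attention entropy, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                      # sequence length and head dimension (arbitrary)
Q = rng.normal(size=(T, d))
K = rng.normal(size=(T, d))
V = rng.normal(size=(T, d))

scores = Q @ K.T / np.sqrt(d)                                       # scaled dot-product scores
scores = np.where(np.tril(np.ones((T, T))) == 1, scores, -np.inf)   # causal mask
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)                      # softmax over previous tokens
context = weights @ V                                               # aggregated representations

# Entropy of each token's attention distribution (one conceivable attention-based predictor).
entropy = -(weights * np.log(np.where(weights > 0, weights, 1.0))).sum(axis=-1)
print(entropy)
```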

Informativeness, Language Modelling +1

A Deep Learning Approach to Analyzing Continuous-Time Systems

2 code implementations 25 Sep 2022 Cory Shain, William Schuler

Scientists often use observational time series data to study complex natural processes, but regression analyses often assume simplistic dynamics.

Time Series, Time Series Analysis

Surprisal Estimators for Human Reading Times Need Character Models

1 code implementation ACL 2021 Byung-Doh Oh, Christian Clark, William Schuler

While the use of character models has been popular in NLP applications, it has not been explored much in the context of psycholinguistic modeling.

Sentence

Grounded PCFG Induction with Images

no code implementations AACL 2020 Lifeng Jin, William Schuler

Recent work in unsupervised parsing has tried to incorporate visual information into learning, but results suggest that these models need linguistic bias to compete against models that only rely on text.

Prepositional Phrase Attachment

Coreference information guides human expectations during natural reading

no code implementations COLING 2020 Evan Jaffe, Cory Shain, William Schuler

Models of human sentence processing effort tend to focus on costs associated with retrieving structures and discourse referents from memory (memory-based) and/or on costs associated with anticipating upcoming words and structures based on contextual cues (expectation-based) (Levy, 2008).

Retrieval, Sentence

Memory-bounded Neural Incremental Parsing for Psycholinguistic Prediction

no code implementations WS 2020 Lifeng Jin, William Schuler

Syntactic surprisal has been shown to have an effect on human sentence processing, and can be predicted from prefix probabilities of generative incremental parsers.
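In incremental-parsing terms, the surprisal of a word is the negative log ratio of successive prefix probabilities, S(w_t) = -log [ P(w_1..t) / P(w_1..t-1) ]. A minimal sketch of that bookkeeping is below; the prefix probabilities are invented numbers standing in for the output of a generative incremental parser.

```python
# Surprisal from prefix probabilities: S(w_t) = -log2(prefix[t] / prefix[t-1]).
# The prefix probabilities here are invented placeholders, not parser output.
import math

words = ["the", "horse", "raced", "past", "the", "barn", "fell"]
prefix_probs = [0.05, 0.01, 0.002, 0.0008, 0.0005, 0.0002, 0.000002]  # P(w_1..t), monotonically decreasing

prev = 1.0
for word, p in zip(words, prefix_probs):
    surprisal = -math.log2(p / prev)
    print(f"{word}: {surprisal:.2f} bits")
    prev = p
```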

Sentence

The Importance of Category Labels in Grammar Induction with Child-directed Utterances

no code implementations WS 2020 Lifeng Jin, William Schuler

Recent progress has shown that grammar induction is possible without explicit assumptions of language-specific knowledge.

A Corpus of Encyclopedia Articles with Logical Forms

no code implementations LREC 2020 Nathan Rasmussen, William Schuler

People can extract precise, complex logical meanings from text in documents such as tax forms and game rules, but language processing systems lack adequate training and evaluation resources to do these kinds of tasks reliably.

Unsupervised Learning of PCFGs with Normalizing Flow

no code implementations ACL 2019 Lifeng Jin, Finale Doshi-Velez, Timothy Miller, Lane Schwartz, William Schuler

This paper describes a neural PCFG inducer which employs context embeddings (Peters et al., 2018) in a normalizing flow model (Dinh et al., 2015) to extend PCFG induction to use semantic and morphological information.
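The normalizing-flow component cited there (Dinh et al., 2015) transforms embeddings through invertible layers whose Jacobian determinant is cheap to compute. The sketch below shows a generic affine coupling layer of that kind; it is a textbook illustration of the flow idea rather than the paper's PCFG inducer, and the scale and shift networks are reduced to placeholder linear maps.

```python
# Generic affine (RealNVP-style) coupling layer, illustrating the normalizing-flow idea only.
import numpy as np

rng = np.random.default_rng(0)
d = 8                                                  # embedding dimension (arbitrary)
Ws = rng.normal(size=(d // 2, d // 2))                 # placeholder "scale network"
Wt = rng.normal(size=(d // 2, d // 2))                 # placeholder "shift network"

def coupling_forward(x):
    x1, x2 = x[: d // 2], x[d // 2 :]
    s, t = np.tanh(x1 @ Ws), x1 @ Wt                   # scale and shift computed from the untouched half
    y2 = x2 * np.exp(s) + t
    log_det = s.sum()                                  # log |det Jacobian| of the transformation
    return np.concatenate([x1, y2]), log_det

def coupling_inverse(y):
    y1, y2 = y[: d // 2], y[d // 2 :]
    s, t = np.tanh(y1 @ Ws), y1 @ Wt
    return np.concatenate([y1, (y2 - t) * np.exp(-s)])

x = rng.normal(size=d)
y, log_det = coupling_forward(x)
assert np.allclose(coupling_inverse(y), x)             # the layer is exactly invertible
```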

Language Acquisition

Variance of Average Surprisal: A Better Predictor for Quality of Grammar from Unsupervised PCFG Induction

no code implementations ACL 2019 Lifeng Jin, William Schuler

In unsupervised grammar induction, data likelihood is known to be only weakly correlated with parsing accuracy, especially at convergence after multiple runs.
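One natural reading of the quantity in the title: for each sentence, take the grammar's average per-word surprisal, then take the variance of those averages across the corpus. A hedged sketch of that computation is below, with invented per-sentence log probabilities standing in for an induced PCFG's scores.

```python
# Variance of average per-word surprisal across sentences, from placeholder scores.
# sentence_scores would come from an induced grammar; here the values are invented.
import statistics

# (log2 P(sentence), sentence length in words) pairs -- placeholder values.
sentence_scores = [(-18.0, 6), (-35.5, 10), (-12.3, 4), (-51.0, 14)]

avg_surprisals = [-logp / length for logp, length in sentence_scores]
print("mean average surprisal:", statistics.mean(avg_surprisals))
print("variance of average surprisal:", statistics.variance(avg_surprisals))
```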

Model Selection

Deconvolutional Time Series Regression: A Technique for Modeling Temporally Diffuse Effects

1 code implementation EMNLP 2018 Cory Shain, William Schuler

Researchers in computational psycholinguistics frequently use linear models to study time series data generated by human subjects.
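The general idea behind deconvolutional regression is to convolve each predictor with a latent impulse response function before regressing, so that an event can influence the measured response over a window of subsequent time. The sketch below illustrates that idea with a fixed exponential-decay kernel and ordinary least squares; it is a simplification, not the DTSR implementation released with the paper.

```python
# Toy illustration of deconvolutional regression: convolve a predictor with a decay
# kernel, then fit ordinary least squares. Not the paper's DTSR implementation.
import numpy as np

rng = np.random.default_rng(0)
n = 200
predictor = rng.normal(size=n)                         # e.g., word surprisal over time
kernel = np.exp(-np.arange(10) / 3.0)                  # assumed exponential impulse response
convolved = np.convolve(predictor, kernel)[:n]         # predictor's temporally diffuse effect
response = 2.0 * convolved + rng.normal(scale=0.5, size=n)  # simulated reading-time-like signal

X = np.column_stack([np.ones(n), convolved])
beta, *_ = np.linalg.lstsq(X, response, rcond=None)
print("estimated intercept and slope:", beta)
```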

Regression, Time Series +1

Depth-bounding is effective: Improvements and evaluation of unsupervised PCFG induction

1 code implementation EMNLP 2018 Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

There have been several recent attempts to improve the accuracy of grammar induction systems by bounding the recursive complexity of the induction model (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016; Jin et al., 2018).

Unsupervised Grammar Induction with Depth-bounded PCFG

1 code implementation TACL 2018 Lifeng Jin, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

There has been recent interest in applying cognitively or empirically motivated bounds on recursion depth to limit the search space of grammar induction models (Ponvert et al., 2011; Noji and Johnson, 2016; Shain et al., 2016).

Memory-Bounded Left-Corner Unsupervised Grammar Induction on Child-Directed Input

no code implementations COLING 2016 Cory Shain, William Bryce, Lifeng Jin, Victoria Krakovna, Finale Doshi-Velez, Timothy Miller, William Schuler, Lane Schwartz

This paper presents a new memory-bounded left-corner parsing model for unsupervised raw-text syntax induction, using unsupervised hierarchical hidden Markov models (UHHMM).

Language Acquisition, Sentence

Addressing surprisal deficiencies in reading time models

no code implementations WS 2016 Marten van Schijndel, William Schuler

This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data.

Memory access during incremental sentence processing causes reading time latency

no code implementations WS 2016 Cory Shain, Marten van Schijndel, Richard Futrell, Edward Gibson, William Schuler

Studies on the role of memory as a predictor of reading time latencies (1) differ in their predictions about when memory effects should occur in processing and (2) have had mixed results, with strong positive effects emerging from isolated constructed stimuli and weak or even negative effects emerging from naturally-occurring stimuli.

Sentence
