Search Results for author: Benjamin K. Bergen

Found 25 papers, 12 papers with code

Large Language Models Pass the Turing Test

no code implementations31 Mar 2025 Cameron R. Jones, Benjamin K. Bergen

We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations.

Lies, Damned Lies, and Distributional Language Statistics: Persuasion and Deception with Large Language Models

no code implementations22 Dec 2024 Cameron R. Jones, Benjamin K. Bergen

Large Language Models (LLMs) can generate content that is as persuasive as human-written text and appear capable of selectively producing deceptive outputs.

Why do language models perform worse for morphologically complex languages?

1 code implementation21 Nov 2024 Catherine Arnett, Benjamin K. Bergen

We then propose and test three possible causes for this performance gap: morphological alignment of tokenizers, tokenization quality, and disparities in dataset sizes and measurement.

Language Modeling

Goldfish: Monolingual Language Models for 350 Languages

1 code implementation19 Aug 2024 Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

For many low-resource languages, the only available language models are large multilingual models trained on many languages simultaneously.

Text Generation

GPT-4 is judged more human than humans in displaced and inverted Turing tests

no code implementations11 Jul 2024 Ishika Rathi, Sydney Taylor, Benjamin K. Bergen, Cameron R. Jones

GPT-3.5, GPT-4, and displaced human adjudicators judged whether an agent was human or AI on the basis of a Turing test transcript.

Dissecting the Ullman Variations with a SCALPEL: Why do LLMs fail at Trivial Alterations to the False Belief Task?

no code implementations20 Jun 2024 Zhiqiang Pi, Annapurna Vadaparty, Benjamin K. Bergen, Cameron R. Jones

Recent empirical results have sparked a debate about whether or not Large Language Models (LLMs) are capable of Theory of Mind (ToM).

People cannot distinguish GPT-4 from a human in a Turing test

no code implementations9 May 2024 Cameron R. Jones, Benjamin K. Bergen

We evaluated 3 systems (ELIZA, GPT-3.5, and GPT-4) in a randomized, controlled, and preregistered Turing test.

Revenge of the Fallen? Recurrent Models Match Transformers at Predicting Human Language Comprehension Metrics

1 code implementation30 Apr 2024 James A. Michaelov, Catherine Arnett, Benjamin K. Bergen

Transformers have generally supplanted recurrent neural networks as the dominant architecture for both natural language processing tasks and for modelling the effect of predictability on online human language comprehension.

Mamba

A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages

1 code implementation1 Mar 2024 Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

We release a tool to obtain byte premiums for any two languages, enabling comparisons of dataset sizes across languages for more equitable multilingual model development and data practices.
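The byte-premium idea can be sketched minimally: a byte premium is the ratio of UTF-8 bytes needed to encode content-matched (parallel) text in two languages. The sentences below are toy stand-ins, not the parallel corpora or released tool from the paper.

```python
# Illustrative sketch of a byte premium: the ratio of UTF-8 bytes needed
# to encode parallel (content-matched) text in two languages. The sentence
# pair below is a toy example, not the paper's parallel corpus or tool.
def byte_premium(texts_a, texts_b):
    """Total UTF-8 bytes of language A relative to language B."""
    bytes_a = sum(len(t.encode("utf-8")) for t in texts_a)
    bytes_b = sum(len(t.encode("utf-8")) for t in texts_b)
    return bytes_a / bytes_b

english = ["The cat sat on the mat."]
russian = ["Кошка сидела на коврике."]  # Cyrillic letters take 2 bytes each in UTF-8

print(byte_premium(russian, english))  # > 1: Russian needs more bytes per content
```

A premium above 1 means equal byte budgets buy less content in that language, which is why raw dataset sizes in bytes can mislead cross-lingual comparisons.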

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

1 code implementation15 Nov 2023 Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce.

Language Modeling

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

no code implementations15 Nov 2023 James A. Michaelov, Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

We measure crosslingual structural priming in large language models, comparing model behavior to human experimental results from eight crosslingual experiments covering six languages, and four monolingual structural priming experiments in three non-English languages.

Sentence

Does GPT-4 pass the Turing test?

no code implementations31 Oct 2023 Cameron R. Jones, Benjamin K. Bergen

We evaluated GPT-4 in a public online Turing test.

Characterizing Learning Curves During Language Model Pre-Training: Learning, Forgetting, and Stability

1 code implementation29 Aug 2023 Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

To better understand these fluctuations, we quantify the final surprisal, within-run variability, age of acquisition, forgettability, and cross-run variability of learning curves for individual tokens in context.

Language Modeling

Emergent inabilities? Inverse scaling over the course of pretraining

no code implementations24 May 2023 James A. Michaelov, Benjamin K. Bergen

Does inverse scaling only occur as a function of model parameter size, or can it also occur over the course of training?

Language Modeling +1

Language Model Behavior: A Comprehensive Survey

1 code implementation20 Mar 2023 Tyler A. Chang, Benjamin K. Bergen

Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers.

Language Modeling +4

Can Peanuts Fall in Love with Distributional Semantics?

no code implementations20 Jan 2023 James A. Michaelov, Seana Coulson, Benjamin K. Bergen

Context changes expectations about upcoming words - following a story involving an anthropomorphic peanut, comprehenders expect the sentence "the peanut was in love" more than "the peanut was salted", as indexed by N400 amplitude (Nieuwland & van Berkum, 2006).

Sentence

Collateral facilitation in humans and language models

1 code implementation9 Nov 2022 James A. Michaelov, Benjamin K. Bergen

Are the predictions of humans and language models affected by similar things?

XLM-R

The Geometry of Multilingual Language Model Representations

1 code implementation22 May 2022 Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen

The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.

Cross-Lingual Transfer Language Modeling +3

Word Acquisition in Neural Language Models

1 code implementation5 Oct 2021 Tyler A. Chang, Benjamin K. Bergen

We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007).

Language Acquisition

So Cloze yet so Far: N400 Amplitude is Better Predicted by Distributional Information than Human Predictability Judgements

no code implementations2 Sep 2021 James A. Michaelov, Seana Coulson, Benjamin K. Bergen

In this study, we investigate whether the linguistic predictions of computational language models or humans better reflect the way in which natural language stimuli modulate the amplitude of the N400.

Different kinds of cognitive plausibility: why are transformers better than RNNs at predicting N400 amplitude?

no code implementations20 Jul 2021 James A. Michaelov, Megan D. Bardolph, Seana Coulson, Benjamin K. Bergen

Despite being designed for performance rather than cognitive plausibility, transformer language models have been found to be better at predicting metrics used to assess human language comprehension than language models with other architectures, such as recurrent neural networks.

How well does surprisal explain N400 amplitude under different experimental conditions?

1 code implementation9 Oct 2020 James A. Michaelov, Benjamin K. Bergen

We investigate the extent to which word surprisal can be used to predict a neural measure of human language processing difficulty - the N400.
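Word surprisal is the information-theoretic quantity surprisal(w) = -log2 P(w | context). The minimal sketch below uses a hand-specified conditional distribution as a stand-in for a trained language model (the paper derives probabilities from actual LMs; these numbers are invented for illustration).

```python
import math

# Surprisal in bits of a word given its conditional probability under
# some language model: surprisal(w) = -log2 P(w | context).
def surprisal(prob):
    """Return the surprisal, in bits, of a word with probability `prob`."""
    return -math.log2(prob)

# Hypothetical next-word probabilities after some context (invented values,
# standing in for a trained LM's predictions):
p_next = {"expected_word": 0.5, "unexpected_word": 0.05}
for word, p in p_next.items():
    print(f"{word}: {surprisal(p):.2f} bits")
```

Less probable continuations carry higher surprisal; the study asks how well this quantity tracks the N400, a neural index of processing difficulty.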
