no code implementations • 31 Mar 2025 • Cameron R. Jones, Benjamin K. Bergen
We evaluated 4 systems (ELIZA, GPT-4o, LLaMa-3.1-405B, and GPT-4.5) in two randomised, controlled, and pre-registered Turing tests on independent populations.
no code implementations • 22 Dec 2024 • Cameron R. Jones, Benjamin K. Bergen
Large Language Models (LLMs) can generate content that is as persuasive as human-written text and appear capable of selectively producing deceptive outputs.
1 code implementation • 21 Nov 2024 • Catherine Arnett, Benjamin K. Bergen
We then propose and test three possible causes for this performance gap: morphological alignment of tokenizers, tokenization quality, and disparities in dataset sizes and measurement.
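One of the candidate causes above, tokenization quality, is often probed with simple corpus statistics. Below is a minimal sketch of one such proxy, tokenizer fertility (mean subword tokens per whitespace word); the tokenizer choice and the metric itself are illustrative assumptions, not necessarily the paper's exact measures.

```python
# Minimal sketch: tokenizer fertility as a rough tokenization-quality proxy.
# xlm-roberta-base is an illustrative choice, not necessarily the paper's model.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")

def fertility(sentences):
    """Mean number of subword tokens per whitespace-delimited word."""
    words = sum(len(s.split()) for s in sentences)
    subwords = sum(len(tok.tokenize(s)) for s in sentences)
    return subwords / words

print(f"en fertility: {fertility(['the cat sat on the mat']):.2f}")
print(f"fi fertility: {fertility(['kissa istui matolla']):.2f}")
```

A persistent fertility gap between languages under the same tokenizer is one way such disparities show up in practice.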
1 code implementation • 19 Aug 2024 • Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen
For many low-resource languages, the only available language models are large multilingual models trained on many languages simultaneously.
no code implementations • 11 Jul 2024 • Ishika Rathi, Sydney Taylor, Benjamin K. Bergen, Cameron R. Jones
GPT-3.5, GPT-4, and displaced human adjudicators judged whether an agent was human or AI on the basis of a Turing test transcript.
no code implementations • 20 Jun 2024 • Zhiqiang Pi, Annapurna Vadaparty, Benjamin K. Bergen, Cameron R. Jones
Recent empirical results have sparked a debate about whether or not Large Language Models (LLMs) are capable of Theory of Mind (ToM).
no code implementations • 9 May 2024 • Cameron R. Jones, Benjamin K. Bergen
We evaluated 3 systems (ELIZA, GPT-3.5 and GPT-4) in a randomized, controlled, and preregistered Turing test.
1 code implementation • 30 Apr 2024 • James A. Michaelov, Catherine Arnett, Benjamin K. Bergen
Transformers have generally supplanted recurrent neural networks as the dominant architecture, both for natural language processing tasks and for modelling the effect of predictability on online human language comprehension.
1 code implementation • 1 Mar 2024 • Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen
We release a tool to obtain byte premiums for any two languages, enabling comparisons of dataset sizes across languages for more equitable multilingual model development and data practices.
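As a rough illustration of the byte-premium idea (the authors' released tool is the canonical implementation), the sketch below computes the ratio of UTF-8 bytes needed to encode content-matched parallel text in two languages; the sentence pairs are toy assumptions.

```python
# Minimal sketch of the byte-premium idea: how many more UTF-8 bytes one
# language needs than another to encode the same content. Illustrative only;
# the authors' released tool is the canonical implementation.
def byte_premium(parallel_pairs):
    """parallel_pairs: iterable of (lang_a_sentence, lang_b_sentence) pairs
    with matched content. A value > 1 means language A needs more bytes."""
    bytes_a = sum(len(a.encode("utf-8")) for a, _ in parallel_pairs)
    bytes_b = sum(len(b.encode("utf-8")) for _, b in parallel_pairs)
    return bytes_a / bytes_b

# Toy parallel pairs (Russian / English); real estimates need a large corpus.
pairs = [("здравствуйте", "hello"), ("спасибо", "thank you")]
print(f"byte premium (ru relative to en): {byte_premium(pairs):.2f}")
```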
1 code implementation • 15 Nov 2023 • Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen
However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce.
no code implementations • 15 Nov 2023 • James A. Michaelov, Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen
We measure crosslingual structural priming in large language models, comparing model behavior to human experimental results from eight crosslingual experiments covering six languages, and four monolingual structural priming experiments in three non-English languages.
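A minimal sketch of one way structural priming can be measured in a causal language model: score the same event in two alternative structures after a prime and compare log-probabilities. GPT-2 and the dative sentences below are illustrative stand-ins, not the paper's models or materials.

```python
# Minimal sketch: compare log P(target | prime) for two structural alternatives.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def target_logprob(prime: str, target: str) -> float:
    prime_len = tokenizer(prime, return_tensors="pt").input_ids.size(1)
    full_ids = tokenizer(prime + " " + target, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full_ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    return token_lp[prime_len - 1:].sum().item()  # sum only the target's tokens

prime_po = "The girl gave a book to the boy."  # prepositional-object prime
po = "The chef handed a plate to the waiter."  # PO target
do = "The chef handed the waiter a plate."     # double-object target
print("log P(PO target | PO prime):", target_logprob(prime_po, po))
print("log P(DO target | PO prime):", target_logprob(prime_po, do))
```

Priming shows up as the PO target gaining probability relative to the DO target after a PO prime, compared to after a DO prime.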
no code implementations • 31 Oct 2023 • Cameron R. Jones, Benjamin K. Bergen
We evaluated GPT-4 in a public online Turing test.
no code implementations • 11 Oct 2023 • Catherine Arnett, Tyler A. Chang, James A. Michaelov, Benjamin K. Bergen
Do multilingual language models share abstract grammatical representations across languages, and if so, when do these develop?
1 code implementation • 29 Aug 2023 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
To better understand these fluctuations, we quantify the final surprisal, within-run variability, age of acquisition, forgettability, and cross-run variability of learning curves for individual tokens in context.
no code implementations • 24 May 2023 • James A. Michaelov, Benjamin K. Bergen
Does inverse scaling only occur as a function of model parameter size, or can it also occur over the course of training?
1 code implementation • 20 Mar 2023 • Tyler A. Chang, Benjamin K. Bergen
Transformer language models have received widespread public attention, yet their generated text is often surprising even to NLP researchers.
no code implementations • 20 Jan 2023 • James A. Michaelov, Seana Coulson, Benjamin K. Bergen
Context changes expectations about upcoming words: following a story involving an anthropomorphic peanut, comprehenders expect the sentence "the peanut was in love" more than "the peanut was salted", as indexed by N400 amplitude (Nieuwland & van Berkum, 2006).
no code implementations • 16 Dec 2022 • James A. Michaelov, Benjamin K. Bergen
How well do language models deal with quantification?
1 code implementation • 9 Nov 2022 • James A. Michaelov, Benjamin K. Bergen
Are the predictions of humans and language models affected by similar things?
1 code implementation • COLING 2022 • James A. Michaelov, Benjamin K. Bergen
Some languages allow arguments to be omitted in certain contexts.
1 code implementation • 22 May 2022 • Tyler A. Chang, Zhuowen Tu, Benjamin K. Bergen
The subspace means differ along language-sensitive axes that are relatively stable throughout middle layers, and these axes encode information such as token vocabularies.
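A minimal sketch of the subspace-mean idea under simplifying assumptions (XLM-R as an illustrative multilingual model, a single middle layer, a handful of sentences): average hidden states per language, and treat the difference between two language means as an approximate language-sensitive axis.

```python
# Minimal sketch: per-language mean hidden states in a multilingual encoder.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("xlm-roberta-base")  # illustrative model
model = AutoModel.from_pretrained("xlm-roberta-base").eval()

def language_mean(sentences, layer=6):
    """Mean hidden state over all tokens of all sentences at one middle layer."""
    vecs = []
    for s in sentences:
        inputs = tok(s, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs, output_hidden_states=True).hidden_states[layer]
        vecs.append(hidden[0].mean(dim=0))
    return torch.stack(vecs).mean(dim=0)

mu_en = language_mean(["The cat sat on the mat.", "It is raining."])
mu_de = language_mean(["Die Katze saß auf der Matte.", "Es regnet."])
axis = mu_en - mu_de  # approximates one language-sensitive direction
print("||mu_en - mu_de|| =", round(axis.norm().item(), 3))
```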
1 code implementation • 5 Oct 2021 • Tyler A. Chang, Benjamin K. Bergen
We investigate how neural language models acquire individual words during training, extracting learning curves and ages of acquisition for over 600 words on the MacArthur-Bates Communicative Development Inventory (Fenson et al., 2007).
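A minimal sketch of an age-of-acquisition estimate under simplifying assumptions: fit a sigmoid to a word's surprisal as a function of log training step and read off the midpoint crossing. The toy curve and fitting details below are illustrative, not the paper's exact recipe.

```python
# Minimal sketch: age of acquisition as the midpoint of a sigmoidal
# surprisal-vs-log(step) learning curve. Toy data; illustrative only.
import numpy as np
from scipy.optimize import curve_fit

def sigmoid(x, top, bottom, x0, k):
    # Decreasing curve: starts near `top`, ends near `bottom`.
    return bottom + (top - bottom) / (1.0 + np.exp(k * (x - x0)))

def age_of_acquisition(steps, surprisals):
    x = np.log10(steps)
    p0 = [surprisals.max(), surprisals.min(), np.median(x), 1.0]
    (top, bottom, x0, k), _ = curve_fit(sigmoid, x, surprisals, p0=p0, maxfev=10000)
    return 10 ** x0  # step where surprisal is halfway between the plateaus

steps = np.array([100, 300, 1000, 3000, 10000, 30000, 100000])
surp = np.array([14.9, 14.5, 12.0, 8.0, 5.2, 4.8, 4.7])  # fabricated toy curve
print(f"estimated age of acquisition: step {age_of_acquisition(steps, surp):,.0f}")
```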
no code implementations • 2 Sep 2021 • James A. Michaelov, Seana Coulson, Benjamin K. Bergen
In this study, we investigate whether the linguistic predictions of computational language models or humans better reflect the way in which natural language stimuli modulate the amplitude of the N400.
no code implementations • 20 Jul 2021 • James A. Michaelov, Megan D. Bardolph, Seana Coulson, Benjamin K. Bergen
Despite being designed for performance rather than cognitive plausibility, transformer language models have been found to be better at predicting metrics used to assess human language comprehension than language models with other architectures, such as recurrent neural networks.
1 code implementation • 9 Oct 2020 • James A. Michaelov, Benjamin K. Bergen
We investigate the extent to which word surprisal can be used to predict a neural measure of human language processing difficulty: the N400.
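A minimal sketch of how per-token surprisal can be extracted from a causal language model; GPT-2 is an illustrative choice here, not necessarily a model evaluated in the paper.

```python
# Minimal sketch: per-token surprisal, -log2 P(token | left context).
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def surprisals(text: str):
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # P(w_t | w_<t)
    targets = ids[0, 1:]
    token_lp = log_probs[torch.arange(targets.size(0)), targets]
    surp_bits = -token_lp / math.log(2)  # convert nats to bits
    return list(zip(tokenizer.convert_ids_to_tokens(targets), surp_bits.tolist()))

for token, s in surprisals("The peanut was salted."):
    print(f"{token:>12s}  {s:6.2f} bits")
```

In N400 studies, per-word surprisals like these are then related to the amplitude measured at each word.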