Search Results for author: Catherine Arnett

Found 5 papers, 2 papers with code

A Bit of a Problem: Measurement Disparities in Dataset Sizes Across Languages

1 code implementation • 1 Mar 2024 • Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

We release a tool to obtain byte premiums for any two languages, enabling comparisons of dataset sizes across languages for more equitable multilingual model development and data practices.

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

no code implementations • 15 Nov 2023 • James A. Michaelov, Catherine Arnett, Tyler A. Chang, Benjamin K. Bergen

We measure crosslingual structural priming in large language models, comparing model behavior to human experimental results from eight crosslingual experiments covering six languages, and four monolingual structural priming experiments in three non-English languages.

Sentence

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

1 code implementation • 15 Nov 2023 • Tyler A. Chang, Catherine Arnett, Zhuowen Tu, Benjamin K. Bergen

However, concrete evidence for the effects of multilinguality on language modeling performance in individual languages remains scarce.

Language Modelling

Crosslingual Structural Priming and the Pre-Training Dynamics of Bilingual Language Models

no code implementations • 11 Oct 2023 • Catherine Arnett, Tyler A. Chang, James A. Michaelov, Benjamin K. Bergen

Do multilingual language models share abstract grammatical representations across languages, and if so, when do these develop?

Language Modelling
