Search Results for author: Aaron Mueller

Found 26 papers, 12 papers with code

NNsight and NDIF: Democratizing Access to Foundation Model Internals

no code implementations · 18 Jul 2024 · Jaden Fiotto-Kaufman, Alexander R Loftus, Eric Todd, Jannik Brinkmann, Caden Juang, Koyena Pal, Can Rager, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Michael Ripa, Adam Belfki, Nikhil Prakash, Sumeet Multani, Carla Brodley, Arjun Guha, Jonathan Bell, Byron Wallace, David Bau

The enormous scale of state-of-the-art foundation models has limited their accessibility to scientists, because customized experiments at large model sizes require costly hardware and complex engineering that is impractical for most researchers.

[Call for Papers] The 2nd BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

no code implementations · 9 Apr 2024 · Leshem Choshen, Ryan Cotterell, Michael Y. Hu, Tal Linzen, Aaron Mueller, Candace Ross, Alex Warstadt, Ethan Wilcox, Adina Williams, Chengxu Zhuang

The big changes for this year's competition are as follows: First, we replace the loose track with a paper track, which allows (for example) non-model-based submissions, novel cognitively-inspired benchmarks, or analysis techniques.

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

1 code implementation · 13 Nov 2023 · Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates.

In-Context Learning, Out-of-Distribution Generalization

Function Vectors in Large Language Models

no code implementations · 23 Oct 2023 · Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, David Bau

Using causal mediation analysis on a diverse range of in-context-learning (ICL) tasks, we find that a small number of attention heads transport a compact representation of the demonstrated task, which we call a function vector (FV).

In-Context Learning

Meta-training with Demonstration Retrieval for Efficient Few-shot Learning

no code implementations · 30 Jun 2023 · Aaron Mueller, Kanika Narang, Lambert Mathias, Qifan Wang, Hamed Firooz

Meta-training allows one to leverage smaller models for few-shot generalization in a domain-general and task-agnostic manner; however, these methods alone result in models that may not have sufficient parameterization or knowledge to adapt quickly to a large variety of tasks.

Few-Shot Learning, QNLI

Call for Papers -- The BabyLM Challenge: Sample-efficient pretraining on a developmentally plausible corpus

1 code implementation · 27 Jan 2023 · Alex Warstadt, Leshem Choshen, Aaron Mueller, Adina Williams, Ethan Wilcox, Chengxu Zhuang

In partnership with CoNLL and CMCL, we provide a platform for approaches to pretraining with a limited-size corpus sourced from data inspired by the input to children.

Language Acquisition, Language Modelling

Language model acceptability judgements are not always robust to context

no code implementations · 18 Dec 2022 · Koustuv Sinha, Jon Gauthier, Aaron Mueller, Kanishka Misra, Keren Fuentes, Roger Levy, Adina Williams

In this paper, we investigate the stability of language models' performance on targeted syntactic evaluations as we vary properties of the input context: the length of the context, the types of syntactic phenomena it contains, and whether or not there are violations of grammaticality.

In-Context Learning, Language Modelling

Causal Analysis of Syntactic Agreement Neurons in Multilingual Language Models

1 code implementation · 25 Oct 2022 · Aaron Mueller, Yu Xia, Tal Linzen

However, much of this analysis has focused on monolingual models, and analyses of multilingual models have employed correlational methods that are confounded by the choice of probing tasks.
Label Semantic Aware Pre-training for Few-shot Text Classification

1 code implementation · ACL 2022 · Aaron Mueller, Jason Krone, Salvatore Romeo, Saab Mansour, Elman Mansimov, Yi Zhang, Dan Roth

Label semantic aware systems have leveraged this information for improved text classification performance during fine-tuning and prediction.

Few-Shot Text Classification, Sentence

Coloring the Blank Slate: Pre-training Imparts a Hierarchical Inductive Bias to Sequence-to-sequence Models

1 code implementation · Findings (ACL) 2022 · Aaron Mueller, Robert Frank, Tal Linzen, Luheng Wang, Sebastian Schuster

We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not.

Inductive Bias

Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models

1 code implementation · ACL 2021 · Matthew Finlayson, Aaron Mueller, Sebastian Gehrmann, Stuart Shieber, Tal Linzen, Yonatan Belinkov

Targeted syntactic evaluations have demonstrated the ability of language models to perform subject-verb agreement given difficult contexts.


Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling

1 code implementation · NAACL 2021 · Aaron Mueller, Mark Dredze

Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models.

Classification, Cross-Lingual Transfer

Demographic Representation and Collective Storytelling in the Me Too Twitter Hashtag Activism Movement

no code implementations · 13 Oct 2020 · Aaron Mueller, Zach Wood-Doughty, Silvio Amir, Mark Dredze, Alicia L. Nobles

The #MeToo movement on Twitter has drawn attention to the pervasive nature of sexual harassment and violence.

An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages

no code implementations · LREC 2020 · Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky

We find that best practices in this domain are highly language-specific: adding more languages to a training set is often better, but too many harms performance; the best number depends on the source language.

Low Resource Neural Machine Translation, Low-Resource Neural Machine Translation

Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages

no code implementations · LREC 2020 · Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky

Exploiting the broad translation of the Bible into the world's languages, we train and distribute morphosyntactic tools for approximately one thousand languages, vastly outstripping previous distributions of tools devoted to the processing of inflectional morphology.

Modeling Color Terminology Across Thousands of Languages

1 code implementation · IJCNLP 2019 · Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky

There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969).

Quantity doesn't buy quality syntax with neural language models

no code implementations · IJCNLP 2019 · Marten van Schijndel, Aaron Mueller, Tal Linzen

We investigate to what extent these shortcomings can be mitigated by increasing the size of the network and the corpus on which it is trained.
