Search Results for author: Jack Merullo

Found 13 papers, 9 papers with code

$100K or 100 Days: Trade-offs when Pre-Training with Academic Resources

1 code implementation • 30 Oct 2024 • Apoorv Khandelwal, Tian Yun, Nihal V. Nayak, Jack Merullo, Stephen H. Bach, Chen Sun, Ellie Pavlick

We introduce a benchmark to measure the time to pre-train models on given GPUs and also identify ideal settings for maximizing training speed.
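As an illustration of the kind of measurement such a benchmark makes, here is a minimal throughput-timing sketch in PyTorch; the function name and setup are hypothetical, not the paper's actual benchmark harness.

import time
import torch

def tokens_per_second(model, batch, optimizer, steps=20):
    # Hypothetical helper: average training throughput over a few
    # optimizer steps, synchronizing the GPU so timings are honest.
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        loss = model(**batch).loss  # assumes an HF-style model returning .loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    torch.cuda.synchronize()
    return batch["input_ids"].numel() * steps / (time.time() - start)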

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

no code implementations • 13 Jun 2024 • Jack Merullo, Carsten Eickhoff, Ellie Pavlick

Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model.

Language Modelling

Dual Process Learning: Controlling Use of In-Context vs. In-Weights Strategies with Weight Forgetting

no code implementations • 28 May 2024 • Suraj Anand, Michael A. Lepori, Jack Merullo, Ellie Pavlick

Hence, we study $\textbf{structural in-context learning}$, which we define as the ability of a model to execute in-context learning on arbitrary tokens -- so called because the model must generalize on the basis of, e.g., sentence structure or task structure, rather than semantic content encoded in token embeddings (a toy probe is sketched below).

In-Context Learning
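A toy probe of structural in-context learning under the definition above: the in-context rule (here, uppercasing) must be inferred from the prompt's structure, since the tokens are random strings with no semantic content. The task and prompt format are illustrative assumptions, not the paper's evaluation suite.

import random
import string

def structural_icl_prompt(n_examples=4):
    # Demonstrations over arbitrary tokens: the model can only generalize
    # from the prompt's structure (the "x -> X" pattern), not from the
    # semantics of the tokens themselves.
    tokens = ["".join(random.choices(string.ascii_lowercase, k=3))
              for _ in range(n_examples + 1)]
    demos = [f"{t} -> {t.upper()}" for t in tokens[:-1]]
    return "\n".join(demos) + f"\n{tokens[-1]} ->"

print(structural_icl_prompt())  # a few "abc -> ABC" demos, then an open query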

Transformer Mechanisms Mimic Frontostriatal Gating Operations When Trained on Human Working Memory Tasks

no code implementations • 13 Feb 2024 • Aaron Traylor, Jack Merullo, Michael J. Frank, Ellie Pavlick

Models based on the Transformer neural network architecture have seen success on a wide variety of tasks that appear to require complex "cognitive branching" -- or the ability to maintain pursuit of one goal while accomplishing others.

Characterizing Mechanisms for Factual Recall in Language Models

no code implementations • 24 Oct 2023 • Qinan Yu, Jack Merullo, Ellie Pavlick

We identify specific attention heads that promote either the memorized or the in-context answer; by scaling the value vector of these heads up or down, we can control the likelihood of using the in-context answer on new data (a minimal sketch follows the tag below).

Counterfactual
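A minimal sketch of the value-vector scaling described above, using a PyTorch forward hook on GPT-2 from Hugging Face transformers. The layer/head indices and scale factor are placeholders, not the specific heads identified in the paper.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2TokenizerFast.from_pretrained("gpt2")

LAYER, HEAD, SCALE = 9, 6, 2.0          # placeholder head, not from the paper
HIDDEN, N_HEADS = 768, 12               # GPT-2 small
HEAD_DIM = HIDDEN // N_HEADS

def scale_value_vector(module, inputs, output):
    # c_attn packs [query | key | value]; slice out this head's value
    # vector and scale it, leaving queries and keys untouched.
    start = 2 * HIDDEN + HEAD * HEAD_DIM
    output = output.clone()
    output[..., start:start + HEAD_DIM] *= SCALE
    return output

handle = model.transformer.h[LAYER].attn.c_attn.register_forward_hook(scale_value_vector)
ids = tok("The capital of France is", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=3)[0]))
handle.remove()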

Circuit Component Reuse Across Tasks in Transformer Language Models

1 code implementation • 12 Oct 2023 • Jack Merullo, Carsten Eickhoff, Ellie Pavlick

We examine the circuit discovered for the Indirect Object Identification (IOI) task in GPT2-Small and show that it is mostly reused to solve a seemingly different task: Colored Objects (Ippolito & Callison-Burch, 2023).

Linearly Mapping from Image to Text Space

2 code implementations • 30 Sep 2022 • Jack Merullo, Louis Castricato, Carsten Eickhoff, Ellie Pavlick

Prior work has shown that pretrained LMs can be taught to caption images when a vision model's parameters are optimized to encode images in the language space (a minimal sketch of such a mapping follows the tags below).

Image Captioning • Image to text • +3
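In the spirit of that result, a minimal sketch of a linear image-to-text mapping: a single trained linear layer projects frozen image features into the frozen LM's input-embedding space as soft prompts. The dimensions and prefix length are illustrative assumptions.

import torch
import torch.nn as nn

IMG_DIM, LM_DIM, N_PREFIX = 512, 768, 4   # assumed encoder and LM sizes

# The only trainable parameters: one linear map from image features
# to N_PREFIX soft tokens in the LM's embedding space.
proj = nn.Linear(IMG_DIM, LM_DIM * N_PREFIX)

def image_prefix(img_feats):              # img_feats: (batch, IMG_DIM)
    return proj(img_feats).view(-1, N_PREFIX, LM_DIM)

# Training sketch: prepend image_prefix(feats) to the caption's token
# embeddings and minimize the frozen LM's next-token loss, updating
# only `proj`; the vision encoder and LM stay frozen throughout.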

Pretraining on Interactions for Learning Grounded Affordance Representations

1 code implementation • *SEM (NAACL) 2022 • Jack Merullo, Dylan Ebert, Carsten Eickhoff, Ellie Pavlick

Lexical semantics and cognitive science point to affordances (i.e., the actions that objects support) as critical for understanding and representing nouns and verbs.

Grounded language learning

Investigating Sports Commentator Bias within a Large Corpus of American Football Broadcasts

1 code implementation • IJCNLP 2019 • Jack Merullo, Luke Yeh, Abram Handler, Alvin Grissom II, Brendan O'Connor, Mohit Iyyer

Sports broadcasters inject drama into play-by-play commentary by building team and player narratives through subjective analyses and anecdotes.
