no code implementations • 12 Apr 2024 • William Merrill, Jackson Petty, Ashish Sabharwal
Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$.
no code implementations • 21 Feb 2024 • William Merrill, Zhaofeng Wu, Norihito Naka, Yoon Kim, Tal Linzen
Do LMs infer the semantics of text from co-occurrence patterns in their training data?
2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
no code implementations • 1 Nov 2023 • Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin
As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve by treating those problems as formal languages.
no code implementations • 11 Oct 2023 • William Merrill, Ashish Sabharwal
Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer?
1 code implementation • 22 May 2023 • Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements.
1 code implementation • 21 Mar 2023 • William Merrill, Nikolaos Tsilivis, Aman Shukla
Grokking is a phenomenon where a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition to perfect generalization.
1 code implementation • 14 Oct 2022 • Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith
Many current NLP systems are built from language models trained to optimize unsupervised objectives on large amounts of raw text.
1 code implementation • 26 Sep 2022 • William Merrill, Alex Warstadt, Tal Linzen
Language models are often trained on text alone, without additional grounding.
no code implementations • 2 Jul 2022 • William Merrill, Ashish Sabharwal
Despite the omnipresence of transformers in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question.
2 code implementations • ACL 2022 • Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach
Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.
no code implementations • 28 Jan 2022 • William Merrill, Nikolaos Tsilivis
One way to interpret the behavior of a black-box recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior.
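A minimal sketch of this kind of extraction, under strong simplifying assumptions: the toy "RNN" below is a hypothetical single tanh unit whose hidden state tracks the parity of 1s seen so far, and we discretize hidden states by sign to obtain candidate automaton states (the paper's actual extraction procedure may differ).

```python
import math

def rnn_step(h, x):
    # Toy recurrence (hypothetical): the sign of h flips whenever x == 1,
    # so the saturated state encodes the parity of 1s read so far.
    return math.tanh(5.0 * h * (1.0 if x == 0 else -1.0))

def extract_fsm(alphabet=(0, 1), h0=1.0):
    """Breadth-first exploration: map each sign-discretized hidden
    state to a discrete FSM state and record its transitions."""
    def disc(h):
        return 1 if h >= 0 else -1

    states = {disc(h0): h0}          # discrete state -> representative h
    transitions = {}
    frontier = [disc(h0)]
    while frontier:
        s = frontier.pop()
        for x in alphabet:
            h_next = rnn_step(states[s], x)
            t = disc(h_next)
            transitions[(s, x)] = t
            if t not in states:
                states[t] = h_next
                frontier.append(t)
    return transitions

fsm = extract_fsm()
# Two discrete states; reading a 1 toggles between them (a parity automaton).
```

Real extraction methods must also handle hidden states that do not fall cleanly into a small number of clusters, which is where the interesting analysis lies.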
no code implementations • 30 Jun 2021 • William Merrill, Ashish Sabharwal, Noah A. Smith
Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages.
no code implementations • 22 Apr 2021 • William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
no code implementations • EMNLP 2021 • Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems.
no code implementations • 19 Feb 2021 • William Merrill
NLP is deeply intertwined with the formal study of language, both conceptually and historically.
1 code implementation • EMNLP 2021 • William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
To better understand this bias, we study the tendency for transformer parameters to grow in magnitude ($\ell_2$ norm) during training, and its implications for the emergent representations within self-attention layers.
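The measurement described above amounts to logging the global $\ell_2$ norm of a model's parameters across training steps. A minimal sketch, where the parameter dict and the "update" are hypothetical stand-ins for a real model and optimizer:

```python
import numpy as np

def global_l2_norm(params):
    """sqrt of the sum of squared entries across all parameter tensors."""
    return float(np.sqrt(sum(np.sum(p ** 2) for p in params.values())))

rng = np.random.default_rng(0)
# Hypothetical "model": two small attention weight matrices.
params = {"W_q": rng.normal(size=(4, 4)), "W_k": rng.normal(size=(4, 4))}

norms = []
for step in range(3):
    norms.append(global_l2_norm(params))
    for p in params.values():
        p += 0.1 * np.sign(p)   # stand-in update that grows each magnitude
```

In this toy loop the norm grows monotonically by construction; the paper's question is whether and why real training dynamics produce similar growth.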
4 code implementations • ACL 2020 • Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Doug Burdick, Darrin Eide, Kathryn Funk, Yannis Katsis, Rodney Kinney, Yunyao Li, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, Zhihong Shen, Brandon Stilson, Alex Wade, Kuansan Wang, Nancy Xin Ru Wang, Chris Wilhelm, Boya Xie, Douglas Raymond, Daniel S. Weld, Oren Etzioni, Sebastian Kohlmeier
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research.
no code implementations • ACL 2020 • William Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav
While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.
no code implementations • ICLR 2020 • William Merrill
This work makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.
1 code implementation • WS 2019 • William Merrill, Lenny Khazan, Noah Amsel, Yiding Hao, Simon Mendelsohn, Robert Frank
Neural network architectures have been augmented with differentiable stacks in order to introduce a bias toward learning hierarchy-sensitive regularities.
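One common continuous-stack formulation uses soft push and pop strengths so that stack operations remain differentiable. The sketch below is a simplified illustration of that idea (scalar values, list-based bookkeeping), not the exact parameterization of any one architecture compared in the paper:

```python
def push(stack, value, strength):
    """Append a value with a fractional strength in [0, 1]."""
    return stack + [(value, strength)]

def pop(stack, strength):
    """Remove `strength` units of mass from the top of the stack."""
    remaining = strength
    new = []
    for value, s in reversed(stack):
        keep = max(0.0, s - remaining)
        remaining = max(0.0, remaining - s)
        if keep > 0:
            new.append((value, keep))
    return list(reversed(new))

def read(stack):
    """Weighted average of the topmost unit of strength."""
    remaining, total = 1.0, 0.0
    for value, s in reversed(stack):
        w = min(s, remaining)
        total += w * value
        remaining -= w
        if remaining <= 0:
            break
    return total

stack = push(push([], 1.0, 1.0), 2.0, 0.6)
# read(stack): 0.6 mass of value 2.0 plus 0.4 mass of value 1.0
```

Because push, pop, and read are all (piecewise) smooth in the strengths, the controller that emits those strengths can be trained end-to-end by gradient descent.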
1 code implementation • WS 2019 • William Merrill, Gigi Felice Stark, Robert Frank
Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.
no code implementations • WS 2019 • William Merrill
This work attempts to explain the types of computation that neural networks can perform by relating them to automata.
2 code implementations • WS 2018 • Yiding Hao, William Merrill, Dana Angluin, Robert Frank, Noah Amsel, Andrew Benz, Simon Mendelsohn
This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models.
1 code implementation • NAACL 2018 • Jungo Kasai, Robert Frank, Pauli Xu, William Merrill, Owen Rambow
We present a graph-based Tree Adjoining Grammar (TAG) parser that uses BiLSTMs, highway connections, and character-level CNNs.
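Of the components named above, the highway connection is simple to illustrate: a learned gate interpolates between a nonlinear transform of the input and the input itself. A minimal sketch with illustrative (not the paper's) parameters:

```python
import numpy as np

def highway(x, W_h, b_h, W_t, b_t):
    """Highway layer: gate t interpolates between H(x) and the input x."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    h = np.tanh(x @ W_h + b_h)       # candidate transform H(x)
    t = sigmoid(x @ W_t + b_t)       # transform gate T(x)
    return t * h + (1.0 - t) * x     # carry the rest of x through unchanged

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W_h = rng.normal(size=(8, 8)) * 0.1
W_t = rng.normal(size=(8, 8)) * 0.1
# A strongly negative gate bias closes the gate, making the layer
# behave as a near-identity; this is why deep highway stacks train easily.
y = highway(x, W_h, np.zeros(8), W_t, np.full(8, -10.0))
```

The carry behavior is what lets gradients flow through many stacked layers, which is the usual motivation for including highway connections in deep parsers.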