Search Results for author: William Merrill

Found 26 papers, 13 papers with code

The Illusion of State in State-Space Models

no code implementations • 12 Apr 2024 • William Merrill, Jackson Petty, Ashish Sabharwal

Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$.

Transformers as Recognizers of Formal Languages: A Survey on Expressivity

no code implementations • 1 Nov 2023 • Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin

As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages.

The Expressive Power of Transformers with Chain of Thought

no code implementations • 11 Oct 2023 • William Merrill, Ashish Sabharwal

Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer?

How Language Model Hallucinations Can Snowball

1 code implementation • 22 May 2023 • Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith

A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements.

Hallucination • Language Modelling • +1

A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks

1 code implementation • 21 Mar 2023 • William Merrill, Nikolaos Tsilivis, Aman Shukla

Grokking is a phenomenon where a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition to generalize perfectly.

Transparency Helps Reveal When Language Models Learn Meaning

1 code implementation • 14 Oct 2022 • Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith

Many current NLP systems are built from language models trained to optimize unsupervised objectives on large amounts of raw text.

The Parallelism Tradeoff: Limitations of Log-Precision Transformers

no code implementations • 2 Jul 2022 • William Merrill, Ashish Sabharwal

Despite the omnipresence of transformer neural nets in modern NLP, characterizing their computational power remains an interesting open question.

Open-Ended Question Answering

ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension

2 code implementations • ACL 2022 • Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach

Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.

Image Classification • Referring Expression • +1

Extracting Finite Automata from RNNs Using State Merging

no code implementations • 28 Jan 2022 • William Merrill, Nikolaos Tsilivis

One way to interpret the behavior of a black-box recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior.
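
The toy sketch below illustrates the general extraction idea in the entry above, not the paper's state-merging procedure: a stand-in acceptor (`rnn_accepts`, here a hand-written placeholder rather than a trained RNN) labels sampled prefixes and a few continuations, prefixes that the acceptor treats identically are merged into one state, and transitions are read off the merged states.

```python
from itertools import product

def rnn_accepts(s: str) -> bool:
    """Stand-in for a trained RNN acceptor: even number of 'a's over {a, b}."""
    return s.count("a") % 2 == 0

ALPHABET = "ab"
# Sample strings of length 0..4; in practice these would come from the RNN's data.
SAMPLES = ["".join(p) for n in range(5) for p in product(ALPHABET, repeat=n)]

# 1. Every sampled prefix is a candidate state of a prefix-tree acceptor.
prefixes = sorted({s[:i] for s in SAMPLES for i in range(len(s) + 1)})

# 2. Characterize each prefix by how the black box labels a few continuations.
TESTS = ["", "a", "b", "aa", "ab", "ba", "bb"]
signature = {p: tuple(rnn_accepts(p + t) for t in TESTS) for p in prefixes}

# 3. Merge prefixes with identical signatures into a single state.
state_id = {}
for p in prefixes:
    state_id.setdefault(signature[p], len(state_id))

# 4. Read transitions off the merged states: state(p) --c--> state(p + c).
transitions = {}
for p in prefixes:
    for c in ALPHABET:
        if p + c in signature:
            transitions[(state_id[signature[p]], c)] = state_id[signature[p + c]]

accepting = {state_id[sig] for sig in state_id if sig[0]}  # sig[0] labels the prefix itself
print(len(state_id), "states; accepting:", accepting)
print("transitions:", transitions)
```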

Saturated Transformers are Constant-Depth Threshold Circuits

no code implementations • 30 Jun 2021 • William Merrill, Ashish Sabharwal, Noah A. Smith

Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages.

Hard Attention

Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?

no code implementations • 22 Apr 2021 • William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith

We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.

Competency Problems: On Finding and Removing Artifacts in Language Data

no code implementations • EMNLP 2021 • Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith

In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems.

Negation

Formal Language Theory Meets Modern NLP

no code implementations • 19 Feb 2021 • William Merrill

NLP is deeply intertwined with the formal study of language, both conceptually and historically.

Effects of Parameter Norm Growth During Transformer Training: Inductive Bias from Gradient Descent

1 code implementation • EMNLP 2021 • William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith

To better understand this bias, we study the tendency for transformer parameters to grow in magnitude ($\ell_2$ norm) during training, and its implications for the emergent representations within self-attention layers; a minimal norm-tracking sketch follows this entry.

Inductive Bias
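
The following is a minimal sketch of the kind of measurement described in the entry above: tracking the total $\ell_2$ parameter norm after each optimizer step. The model, batch, and objective are placeholders, not the authors' training setup.

```python
import torch
import torch.nn as nn

def param_norm(model: nn.Module) -> float:
    """l2 norm of all parameters, viewed as one flattened vector."""
    return torch.sqrt(sum(p.detach().pow(2).sum() for p in model.parameters())).item()

# Placeholder model and objective -- only the norm bookkeeping is the point here.
model = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

norms = [param_norm(model)]                  # norm at initialization
for step in range(100):
    x = torch.randn(8, 16, 32)               # dummy batch: (batch, seq, d_model)
    loss = model(x).pow(2).mean()            # arbitrary objective, just to get gradients
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    norms.append(param_norm(model))          # record the norm after every update

print(f"norm at init: {norms[0]:.2f}, after training: {norms[-1]:.2f}")
```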

A Formal Hierarchy of RNN Architectures

no code implementations • ACL 2020 • William Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav

While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.

On the Linguistic Capacity of Real-Time Counter Automata

no code implementations • ICLR 2020 • William Merrill

This work makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.

Finding Hierarchical Structure in Neural Stacks Using Unsupervised Parsing

1 code implementation • WS 2019 • William Merrill, Lenny Khazan, Noah Amsel, Yiding Hao, Simon Mendelsohn, Robert Frank

Neural network architectures have been augmented with differentiable stacks in order to introduce a bias toward learning hierarchy-sensitive regularities.

Language Modelling

Detecting Syntactic Change Using a Neural Part-of-Speech Tagger

1 code implementation • WS 2019 • William Merrill, Gigi Felice Stark, Robert Frank

Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.

Finding Syntactic Representations in Neural Stacks

1 code implementation • 4 Jun 2019 • William Merrill, Lenny Khazan, Noah Amsel, Yiding Hao, Simon Mendelsohn, Robert Frank

Neural network architectures have been augmented with differentiable stacks in order to introduce a bias toward learning hierarchy-sensitive regularities.

General Classification • Language Modelling

Sequential Neural Networks as Automata

no code implementations • WS 2019 • William Merrill

This work attempts to explain the types of computation that neural networks can perform by relating them to automata.

End-to-end Graph-based TAG Parsing with Neural Networks

1 code implementation • NAACL 2018 • Jungo Kasai, Robert Frank, Pauli Xu, William Merrill, Owen Rambow

We present a graph-based Tree Adjoining Grammar (TAG) parser that uses BiLSTMs, highway connections (sketched below this entry), and character-level CNNs.

POS • POS Tagging • +1
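
Since the entry above lists highway connections among the parser's components, here is a minimal, generic sketch of a highway layer (a gated blend of a transformed input with an identity path); it is not the authors' parser code.

```python
import torch
import torch.nn as nn

class Highway(nn.Module):
    """y = T(x) * H(x) + (1 - T(x)) * x, with a learned gate T(x) in [0, 1]."""
    def __init__(self, dim: int):
        super().__init__()
        self.transform = nn.Linear(dim, dim)   # candidate update H(x)
        self.gate = nn.Linear(dim, dim)        # gate T(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.relu(self.transform(x))
        t = torch.sigmoid(self.gate(x))
        return t * h + (1.0 - t) * x           # blend the update with an identity path

layer = Highway(dim=64)
y = layer(torch.randn(2, 10, 64))              # e.g. (batch, tokens, features)
print(y.shape)                                 # torch.Size([2, 10, 64])
```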
