no code implementations • 12 Apr 2024 • William Merrill, Jackson Petty, Ashish Sabharwal
Our analysis reveals that the expressive power of SSMs is limited very similarly to transformers: SSMs cannot express computation outside the complexity class $\mathsf{TC}^0$.
no code implementations • 21 Feb 2024 • William Merrill, Zhaofeng Wu, Norihito Naka, Yoon Kim, Tal Linzen
Do LMs infer the semantics of text from co-occurrence patterns in their training data?
2 code implementations • 1 Feb 2024 • Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Valentina Pyatkin, Abhilasha Ravichander, Dustin Schwenk, Saurabh Shah, Will Smith, Emma Strubell, Nishant Subramani, Mitchell Wortsman, Pradeep Dasigi, Nathan Lambert, Kyle Richardson, Luke Zettlemoyer, Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hannaneh Hajishirzi
Given the importance of these details in scientifically studying these models, including their biases and potential risks, we believe it is essential for the research community to have access to powerful, truly open LMs.
no code implementations • 1 Nov 2023 • Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin
As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve by treating those problems as formal languages.
no code implementations • 11 Oct 2023 • William Merrill, Ashish Sabharwal
Motivated by this, we ask: Does such intermediate generation fundamentally extend the computational power of a decoder-only transformer?
1 code implementation • 22 May 2023 • Muru Zhang, Ofir Press, William Merrill, Alisa Liu, Noah A. Smith
A major risk of using language models in practical applications is their tendency to hallucinate incorrect statements.
1 code implementation • 21 Mar 2023 • William Merrill, Nikolaos Tsilivis, Aman Shukla
Grokking is a phenomenon where a model trained on an algorithmic task first overfits but then, after a large amount of additional training, undergoes a phase transition to perfect generalization.
1 code implementation • 14 Oct 2022 • Zhaofeng Wu, William Merrill, Hao Peng, Iz Beltagy, Noah A. Smith
Many current NLP systems are built from language models trained to optimize unsupervised objectives on large amounts of raw text.
1 code implementation • 26 Sep 2022 • William Merrill, Alex Warstadt, Tal Linzen
Language models are often trained on text alone, without additional grounding.
no code implementations • 2 Jul 2022 • William Merrill, Ashish Sabharwal
Despite the omnipresence of transformers in modern NLP, characterizing the computational power of transformer neural nets remains an interesting open question.
2 code implementations • ACL 2022 • Sanjay Subramanian, William Merrill, Trevor Darrell, Matt Gardner, Sameer Singh, Anna Rohrbach
Training a referring expression comprehension (ReC) model for a new visual domain requires collecting referring expressions, and potentially corresponding bounding boxes, for images in the domain.
no code implementations • 28 Jan 2022 • William Merrill, Nikolaos Tsilivis
One way to interpret the behavior of a black-box recurrent neural network (RNN) is to extract from it a more interpretable discrete computational model, like a finite state machine, that captures its behavior.
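A minimal sketch of this kind of extraction, under strong simplifying assumptions: the toy "RNN" below is a hypothetical single tanh unit whose hidden state tracks the parity of 1s seen so far, and we discretize hidden states by sign to obtain candidate automaton states (the paper's actual extraction procedure may differ).

```python
import math

def rnn_step(h, x):
    # Toy recurrence (hypothetical): the sign of h flips whenever x == 1,
    # so the saturated state encodes the parity of 1s read so far.
    return math.tanh(5.0 * h * (1.0 if x == 0 else -1.0))

def extract_fsm(alphabet=(0, 1), h0=1.0):
    """Breadth-first exploration: map each sign-discretized hidden
    state to a discrete FSM state and record its transitions."""
    def disc(h):
        return 1 if h >= 0 else -1

    states = {disc(h0): h0}          # discrete state -> representative h
    transitions = {}
    frontier = [disc(h0)]
    while frontier:
        s = frontier.pop()
        for x in alphabet:
            h_next = rnn_step(states[s], x)
            t = disc(h_next)
            transitions[(s, x)] = t
            if t not in states:
                states[t] = h_next
                frontier.append(t)
    return transitions

fsm = extract_fsm()
# Two discrete states; reading a 1 toggles between them (a parity automaton).
```

Real extraction methods must also handle hidden states that do not fall cleanly into a small number of clusters, which is where the interesting analysis lies.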
no code implementations • 30 Jun 2021 • William Merrill, Ashish Sabharwal, Noah A. Smith
Transformers have become a standard neural network architecture for many NLP problems, motivating theoretical analysis of their power in terms of formal languages.
no code implementations • 22 Apr 2021 • William Merrill, Yoav Goldberg, Roy Schwartz, Noah A. Smith
We study whether assertions enable a system to emulate representations preserving semantic relations like equivalence.
no code implementations • EMNLP 2021 • Matt Gardner, William Merrill, Jesse Dodge, Matthew E. Peters, Alexis Ross, Sameer Singh, Noah A. Smith
In this work we argue that for complex language understanding tasks, all simple feature correlations are spurious, and we formalize this notion into a class of problems which we call competency problems.
no code implementations • 19 Feb 2021 • William Merrill
NLP is deeply intertwined with the formal study of language, both conceptually and historically.
1 code implementation • EMNLP 2021 • William Merrill, Vivek Ramanujan, Yoav Goldberg, Roy Schwartz, Noah Smith
To better understand this bias, we study the tendency for transformer parameters to grow in magnitude ($\ell_2$ norm) during training, and its implications for the emergent representations within self-attention layers.
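The measurement described above amounts to logging the global $\ell_2$ norm of a model's parameters across training steps. A minimal sketch, where the parameter dict and the "update" are hypothetical stand-ins for a real model and optimizer:

```python
import numpy as np

def global_l2_norm(params):
    """sqrt of the sum of squared entries across all parameter tensors."""
    return float(np.sqrt(sum(np.sum(p ** 2) for p in params.values())))

rng = np.random.default_rng(0)
# Hypothetical "model": two small attention weight matrices.
params = {"W_q": rng.normal(size=(4, 4)), "W_k": rng.normal(size=(4, 4))}

norms = []
for step in range(3):
    norms.append(global_l2_norm(params))
    for p in params.values():
        p += 0.1 * np.sign(p)   # stand-in update that grows each magnitude
```

In this toy loop the norm grows monotonically by construction; the paper's question is whether and why real training dynamics produce similar growth.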
4 code implementations • ACL 2020 • Lucy Lu Wang, Kyle Lo, Yoganand Chandrasekhar, Russell Reas, Jiangjiang Yang, Doug Burdick, Darrin Eide, Kathryn Funk, Yannis Katsis, Rodney Kinney, Yunyao Li, Ziyang Liu, William Merrill, Paul Mooney, Dewey Murdick, Devvret Rishi, Jerry Sheehan, Zhihong Shen, Brandon Stilson, Alex Wade, Kuansan Wang, Nancy Xin Ru Wang, Chris Wilhelm, Boya Xie, Douglas Raymond, Daniel S. Weld, Oren Etzioni, Sebastian Kohlmeier
The COVID-19 Open Research Dataset (CORD-19) is a growing resource of scientific papers on COVID-19 and related historical coronavirus research.
no code implementations • ACL 2020 • William Merrill, Gail Weiss, Yoav Goldberg, Roy Schwartz, Noah A. Smith, Eran Yahav
While formally extending these findings to unsaturated RNNs is left to future work, we hypothesize that the practical learnable capacity of unsaturated RNNs obeys a similar hierarchy.
no code implementations • ICLR 2020 • William Merrill
This work makes general contributions to the theory of formal languages that are of potential interest for understanding recurrent neural networks.
1 code implementation • WS 2019 • William Merrill, Lenny Khazan, Noah Amsel, Yiding Hao, Simon Mendelsohn, Robert Frank
Neural network architectures have been augmented with differentiable stacks in order to introduce a bias toward learning hierarchy-sensitive regularities.
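One common continuous-stack formulation uses soft push and pop strengths so that stack operations remain differentiable. The sketch below is a simplified illustration of that idea (scalar values, list-based bookkeeping), not the exact parameterization of any one architecture compared in the paper:

```python
def push(stack, value, strength):
    """Append a value with a fractional strength in [0, 1]."""
    return stack + [(value, strength)]

def pop(stack, strength):
    """Remove `strength` units of mass from the top of the stack."""
    remaining = strength
    new = []
    for value, s in reversed(stack):
        keep = max(0.0, s - remaining)
        remaining = max(0.0, remaining - s)
        if keep > 0:
            new.append((value, keep))
    return list(reversed(new))

def read(stack):
    """Weighted average of the topmost unit of strength."""
    remaining, total = 1.0, 0.0
    for value, s in reversed(stack):
        w = min(s, remaining)
        total += w * value
        remaining -= w
        if remaining <= 0:
            break
    return total

stack = push(push([], 1.0, 1.0), 2.0, 0.6)
# read(stack): 0.6 mass of value 2.0 plus 0.4 mass of value 1.0
```

Because push, pop, and read are all (piecewise) smooth in the strengths, the controller that emits those strengths can be trained end-to-end by gradient descent.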
1 code implementation • WS 2019 • William Merrill, Gigi Felice Stark, Robert Frank
Thus, our results suggest that our tagger is implicitly learning to model syntactic change in American English over the course of the 19th, 20th, and early 21st centuries.
no code implementations • WS 2019 • William Merrill
This work attempts to explain the types of computation that neural networks can perform by relating them to automata.
2 code implementations • WS 2018 • Yiding Hao, William Merrill, Dana Angluin, Robert Frank, Noah Amsel, Andrew Benz, Simon Mendelsohn
This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models.
1 code implementation • NAACL 2018 • Jungo Kasai, Robert Frank, Pauli Xu, William Merrill, Owen Rambow
We present a graph-based Tree Adjoining Grammar (TAG) parser that uses BiLSTMs, highway connections, and character-level CNNs.
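Of the components named above, the highway connection is simple to illustrate: a learned gate interpolates between a nonlinear transform of the input and the input itself. A minimal sketch with illustrative (not the paper's) parameters:

```python
import numpy as np

def highway(x, W_h, b_h, W_t, b_t):
    """Highway layer: gate t interpolates between H(x) and the input x."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    h = np.tanh(x @ W_h + b_h)       # candidate transform H(x)
    t = sigmoid(x @ W_t + b_t)       # transform gate T(x)
    return t * h + (1.0 - t) * x     # carry the rest of x through unchanged

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
W_h = rng.normal(size=(8, 8)) * 0.1
W_t = rng.normal(size=(8, 8)) * 0.1
# A strongly negative gate bias closes the gate, making the layer
# behave as a near-identity; this is why deep highway stacks train easily.
y = highway(x, W_h, np.zeros(8), W_t, np.full(8, -10.0))
```

The carry behavior is what lets gradients flow through many stacked layers, which is the usual motivation for including highway connections in deep parsers.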