no code implementations • 28 Mar 2025 • Lena Strobl, Dana Angluin, Robert Frank
While transformers have proven enormously successful in a range of tasks, their fundamental properties as models of computation are not well understood.
no code implementations • 13 Dec 2024 • Andy Yang, Lena Strobl, David Chiang, Dana Angluin
We demonstrate how temperature scaling allows softmax transformers to simulate a large subclass of average-hard attention transformers, those that have what we call the uniform-tieless property.
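As a rough numerical illustration (plain NumPy, not the paper's construction; the score vector below is made up), lowering the softmax temperature drives the attention weights toward a uniform average over the positions tied for the maximum score, which is what average-hard attention computes:

    import numpy as np

    def softmax_attention(scores, temperature):
        """Softmax attention weights at the given temperature."""
        z = np.asarray(scores, dtype=float) / temperature
        z -= z.max()                      # subtract the max for numerical stability
        w = np.exp(z)
        return w / w.sum()

    def average_hard_attention(scores):
        """Uniform weights over all positions tied for the maximum score."""
        scores = np.asarray(scores, dtype=float)
        ties = scores == scores.max()
        return ties / ties.sum()

    scores = [2.0, 5.0, 5.0, 1.0]         # positions 1 and 2 are tied for the max
    for t in [1.0, 0.1, 0.01]:
        print(t, softmax_attention(scores, t).round(4))
    print("hard", average_hard_attention(scores))
    # As the temperature shrinks, the softmax weights approach [0, 0.5, 0.5, 0].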
no code implementations • 2 Apr 2024 • Lena Strobl, Dana Angluin, David Chiang, Jonathan Rawski, Ashish Sabharwal
B-RASP[pos] enables calculations on positions (such as copying the first half of a string) and contains all first-order regular functions.
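One reading of that example, sketched in ordinary Python rather than B-RASP[pos] notation (the padding convention for positions outside the first half is an assumption, not taken from the paper): the decision at each position depends only on a comparison between positions, which is the kind of computation the [pos] extension adds.

    def copy_first_half(w, pad="_"):
        """Keep the symbol at position i iff i lies in the first half of the
        input (2*i < len(w)); otherwise emit a padding symbol.  The test uses
        only a comparison between positions, not the symbols themselves."""
        n = len(w)
        return "".join(w[i] if 2 * i < n else pad for i in range(n))

    print(copy_first_half("abcdef"))   # prints 'abc___'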
no code implementations • 1 Nov 2023 • Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin
As transformers have gained prominence in natural language processing, some researchers have investigated theoretically what problems they can and cannot solve, by treating problems as formal languages.
no code implementations • 21 Oct 2023 • Andy Yang, David Chiang, Dana Angluin
The expressive power of transformers over inputs of unbounded size can be studied through their ability to recognize classes of formal languages.
no code implementations • 13 Apr 2022 • Yiding Hao, Dana Angluin, Robert Frank
This paper analyzes three formal models of Transformer encoders that differ in the form of their self-attention mechanism: unique hard attention (UHAT); generalized unique hard attention (GUHAT), which generalizes UHAT; and averaging hard attention (AHAT).
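A minimal sketch of the difference between the unique-hard and averaging-hard rules (plain NumPy; leftmost tie-breaking is assumed here for UHAT, and raw score vectors stand in for query-key products):

    import numpy as np

    def unique_hard_attention(scores, values):
        """UHAT-style head: attend to exactly one maximum-scoring position
        (here the leftmost, since np.argmax returns the first maximum)."""
        j = int(np.argmax(scores))
        return np.asarray(values, dtype=float)[j]

    def averaging_hard_attention(scores, values):
        """AHAT-style head: average the values at all maximum-scoring positions."""
        scores = np.asarray(scores, dtype=float)
        ties = np.flatnonzero(scores == scores.max())
        return np.asarray(values, dtype=float)[ties].mean(axis=0)

    scores = [3.0, 7.0, 7.0, 1.0]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    print(unique_hard_attention(scores, values))      # value at position 1: [0. 1.]
    print(averaging_hard_attention(scores, values))   # mean of positions 1 and 2: [0.5 1. ]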
no code implementations • 10 Sep 2018 • Dana Angluin, Dana Fisman
The right congruence of a regular omega-language is not informative enough: many regular omega-languages have a trivial right congruence, and in general it is not always possible to define an omega-automaton that recognizes a given language and is isomorphic to the rightcon automaton.
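For concreteness, the right congruence meant here is the standard relation on finite words

    \[
      u \sim_L v \iff \forall w \in \Sigma^{\omega} :\; uw \in L \Leftrightarrow vw \in L ,
    \]

and a standard example of how uninformative it can be (not taken from this abstract): for L = {w in {a,b}^omega : w contains infinitely many a's}, replacing any finite prefix by any other never affects membership, so all of Sigma^* collapses into a single class, even though no one-state omega-automaton recognizes L.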
2 code implementations • WS 2018 • Yiding Hao, William Merrill, Dana Angluin, Robert Frank, Noah Amsel, Andrew Benz, Simon Mendelsohn
This paper analyzes the behavior of stack-augmented recurrent neural network (RNN) models.
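A heavily simplified sketch of the kind of soft stack such models use (plain Python, loosely in the spirit of differentiable-stack constructions; the paper's actual controller and stack interface may differ): each element carries a fractional strength, a pop signal erodes strength from the top down, a push signal appends a new element, and reading returns a strength-weighted sum of the topmost elements.

    import numpy as np

    class SoftStack:
        """Simplified soft stack: every element has a strength in [0, 1]."""

        def __init__(self):
            self.vectors = []      # stack contents, bottom to top
            self.strengths = []    # strength of each element

        def update(self, push_strength, pop_strength, vector):
            # Pop: remove up to pop_strength worth of strength, top down.
            remaining = pop_strength
            for i in reversed(range(len(self.strengths))):
                taken = min(self.strengths[i], remaining)
                self.strengths[i] -= taken
                remaining -= taken
            # Push: append the new vector with the given strength.
            self.vectors.append(np.asarray(vector, dtype=float))
            self.strengths.append(push_strength)

        def read(self):
            # Strength-weighted sum of the topmost elements, up to total strength 1.
            result, budget = 0.0, 1.0
            for v, s in zip(reversed(self.vectors), reversed(self.strengths)):
                take = min(s, budget)
                result = result + take * v
                budget -= take
                if budget <= 0:
                    break
            return result

    stack = SoftStack()
    stack.update(1.0, 0.0, [1.0, 0.0])   # push vector a at full strength
    stack.update(0.6, 0.0, [0.0, 1.0])   # push vector b at strength 0.6
    print(stack.read())                   # 0.6*b + 0.4*a = [0.4 0.6]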