1 code implementation • EMNLP (sustainlp) 2021 • Ankit Gupta, Guy Dar, Shaya Goodman, David Ciprut, Jonathan Berant
Following the success of dot-product attention in Transformers, numerous approximations have recently been proposed to address its quadratic complexity with respect to the input length.
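For reference, a minimal PyTorch sketch of vanilla dot-product attention (not the paper's proposed method) that makes the quadratic cost explicit: the intermediate score matrix has shape (n, n) for a length-n input, which is exactly what the approximations target.

```python
import torch

def dot_product_attention(q, k, v):
    # scores has shape (n, n): both time and memory scale
    # quadratically with the sequence length n
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1) @ v

n, d = 1024, 64
q = k = v = torch.randn(n, d)
out = dot_product_attention(q, k, v)  # the (n, n) score matrix is the bottleneck
```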
1 code implementation • 13 Nov 2023 • Gilad Deutch, Nadav Magar, Tomer Bar Natan, Guy Dar
Next, we explore a major discrepancy between ICL and GD in how information flows through the model, which we term Layer Causality.
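As a rough, hypothetical illustration of the asymmetry (a toy sketch, not the paper's experimental setup): in a plain forward pass, a layer's activation is computed bottom-up, before any higher layer runs, whereas a gradient-descent update to that same layer's weights is driven by gradients flowing back through every layer above it.

```python
import torch

# Toy 3-layer network; all shapes and values are arbitrary.
torch.manual_seed(0)
W = [torch.randn(4, 4, requires_grad=True) for _ in range(3)]
x = torch.randn(4)

# Forward pass: each activation is fixed before higher layers run (bottom-up flow).
h = x
acts = []
for Wl in W:
    h = torch.tanh(Wl @ h)
    acts.append(h)

# GD: the gradient of the bottom layer's weights passes back through the
# layers above it, so the update to W[0] carries top-down information.
loss = acts[-1].sum()
loss.backward()
print(W[0].grad)  # depends on W[1] and W[2], unlike the forward activation
```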
1 code implementation • 6 Sep 2022 • Guy Dar, Mor Geva, Ankit Gupta, Jonathan Berant
In this work, we present a theoretical analysis where all parameters of a trained Transformer are interpreted by projecting them into the embedding space, that is, the space of vocabulary items they operate on.
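As a concrete (hypothetical) illustration of this kind of projection, one can take a single feed-forward output ("value") vector from GPT-2 and read off the vocabulary items it most strongly promotes by multiplying it with the embedding matrix; the layer and neuron indices below are arbitrary, and this sketch assumes the Hugging Face GPT-2 implementation.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2TokenizerFast.from_pretrained("gpt2")

E = model.transformer.wte.weight  # embedding matrix, (50257, 768)
# One FFN "value" vector (a row of the second MLP projection); indices are arbitrary.
v = model.transformer.h[10].mlp.c_proj.weight[42]  # (768,)

with torch.no_grad():
    logits = E @ v                # score every vocabulary item against this vector
    top = logits.topk(10).indices
print([tok.decode(int(i)) for i in top])  # tokens this parameter vector promotes
```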
1 code implementation • 26 Apr 2022 • Mor Geva, Avi Caciularu, Guy Dar, Paul Roit, Shoval Sadde, Micah Shlain, Bar Tamir, Yoav Goldberg
The opaque nature and unexplained behavior of transformer-based language models (LMs) have spurred wide interest in interpreting their predictions.