Search Results for author: Alexander Yom Din

Found 1 paper, 1 paper with code

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations

2 code implementations • 16 Mar 2023 • Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva

Moreover, in the context of language modeling, our method allows "peeking" into early layer representations of GPT-2 and BERT, showing that often LMs already predict the final output in early layers.
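The idea of "peeking" into an early layer with a linear transformation can be illustrated with a minimal sketch. It assumes the map is fitted by ordinary least squares, which the excerpt above does not state, and it uses synthetic vectors in place of real GPT-2 or BERT hidden states. The names `fit_linear_shortcut`, `agreement_rate`, and `unembed` are hypothetical and not taken from the paper's code.

```python
import numpy as np


def fit_linear_shortcut(early, final):
    """Least-squares fit of a matrix A such that early @ A approximates final.

    Assumption: ordinary least squares; the listing does not say how the
    paper fits its linear transformations.
    """
    A, *_ = np.linalg.lstsq(early, final, rcond=None)
    return A


def agreement_rate(early, final, A, unembed):
    """Fraction of positions where the shortcut's top-1 token matches the
    top-1 token computed from the true final-layer representation."""
    shortcut_pred = (early @ A @ unembed).argmax(axis=-1)
    final_pred = (final @ unembed).argmax(axis=-1)
    return float((shortcut_pred == final_pred).mean())


# Synthetic stand-ins for hidden states (hypothetical sizes, not real model data).
rng = np.random.default_rng(0)
n, d, vocab = 512, 16, 50
early = rng.normal(size=(n, d))                 # "early layer" representations
true_map = rng.normal(size=(d, d))              # hidden relation to recover
final = early @ true_map + 0.01 * rng.normal(size=(n, d))  # "final layer"
unembed = rng.normal(size=(d, vocab))           # stand-in output projection

A = fit_linear_shortcut(early, final)
rate = agreement_rate(early, final, A, unembed)
print(f"top-1 agreement between shortcut and final layer: {rate:.3f}")
```

With real models, `early` and `final` would be hidden states collected at two layers for the same token positions, and `unembed` would be the model's own output projection.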

Decision Making, Language Modelling
