Search Results for author: Alexander Yom Din

Found 1 paper, 1 paper with code

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations

2 code implementations • 16 Mar 2023 • Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva

Moreover, in the context of language modeling, our method allows "peeking" into early layer representations of GPT-2 and BERT, showing that LMs often already predict the final output in early layers.

Decision Making • Language Modelling

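The snippet above describes short-cutting a transformer by mapping an early layer's representation to the final layer with a linear transformation, then reading off a prediction. The following is a minimal sketch, assuming the shortcut is a single matrix fitted by ordinary least squares on paired hidden states. It uses randomly generated data in place of real GPT-2 or BERT activations. The names `fit_linear_shortcut`, `peek_logits`, and `W_unembed` are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def fit_linear_shortcut(h_early: np.ndarray, h_final: np.ndarray) -> np.ndarray:
    """Fit A minimizing ||h_early @ A - h_final||_F by ordinary least squares.

    h_early, h_final: arrays of shape (n_examples, d_model) holding hidden
    states at an early layer and at the final layer for the same tokens.
    """
    A, *_ = np.linalg.lstsq(h_early, h_final, rcond=None)
    return A

def peek_logits(h_early: np.ndarray, A: np.ndarray, W_unembed: np.ndarray) -> np.ndarray:
    """Map early states into the final-layer space, then project to vocabulary logits."""
    return (h_early @ A) @ W_unembed

# Synthetic demo: every array below is random, NOT real model activations.
rng = np.random.default_rng(0)
n, d, vocab = 2000, 16, 50
h_early = rng.normal(size=(n, d))
true_map = rng.normal(size=(d, d))  # assumed ground-truth relation between layers
h_final = h_early @ true_map + 0.01 * rng.normal(size=(n, d))

# Fit on a training split, evaluate on held-out tokens.
A = fit_linear_shortcut(h_early[:1500], h_final[:1500])
test_early, test_final = h_early[1500:], h_final[1500:]

mse_linear = np.mean((test_early @ A - test_final) ** 2)
# Baseline: treat the early state as if it were already the final state.
mse_identity = np.mean((test_early - test_final) ** 2)
print(f"linear map MSE: {mse_linear:.4f}, identity MSE: {mse_identity:.4f}")

# Hypothetical unembedding matrix standing in for an LM head.
W_unembed = rng.normal(size=(d, vocab))
pred_peek = peek_logits(test_early, A, W_unembed).argmax(axis=-1)
pred_final = (test_final @ W_unembed).argmax(axis=-1)
agreement = np.mean(pred_peek == pred_final)
print(f"top-1 agreement with final-layer prediction: {agreement:.3f}")
```

On a real model, the paired states would come from running a corpus through the network and saving the hidden states at layer ℓ and at the last layer. The identity baseline is included only to illustrate why a fitted map can be a more faithful way to read an early representation than using it unchanged.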
