2 code implementations • 16 Mar 2023 • Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva
Moreover, in the context of language modeling, our method allows "peeking" into early-layer representations of GPT-2 and BERT, showing that LMs often already predict the final output in early layers.