1 code implementation • 24 May 2023 • Jikun Kang, Romain Laroche, Xindi Yuan, Adam Trischler, Xue Liu, Jie Fu
We argue that this inefficiency stems from the forgetting phenomenon, in which a model memorizes its behaviors in parameters throughout training.