1 code implementation • 18 Aug 2024 • Jiajun Song, Zhuoyan Xu, Yiqiao Zhong
We empirically examined the training dynamics of Transformers on a synthetic example and conducted extensive experiments on a variety of pretrained LLMs, focusing on a type of component known as induction heads.
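As a hedged illustration of what an induction head computes (not the paper's code; function and variable names here are made up): at position t, such a head attends to the token immediately following the most recent earlier occurrence of the current token, which is what lets it complete repeated patterns.

```python
# Toy sketch of the induction-head attention pattern over an integer
# token sequence. Illustrative only; not the authors' implementation.
def induction_targets(tokens):
    """For each position t, return the index the head would attend to:
    j + 1 where tokens[j] == tokens[t] for the most recent j < t,
    or -1 if the current token has not appeared before."""
    targets = []
    for t, tok in enumerate(tokens):
        tgt = -1
        for j in range(t - 1, -1, -1):  # scan backwards for the last match
            if tokens[j] == tok:
                tgt = j + 1
                break
        targets.append(tgt)
    return targets

seq = [5, 1, 7, 5, 1]            # the pair "5 1" repeats
print(induction_targets(seq))    # -> [-1, -1, -1, 1, 2]
```

At t=3 the token 5 reappears, so the head attends to index 1 (the token that followed the first 5), predicting the repeat.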
1 code implementation • 22 Jul 2024 • Zhuoyan Xu, Zhenmei Shi, Yingyu Liang
In this study, we delve into the ICL capabilities of LLMs on composite tasks, with only simple tasks as in-context examples.
no code implementations • 30 May 2024 • Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang
This sheds light on where transformers pay attention and how that affects ICL.
no code implementations • 8 May 2024 • Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Zhuoyan Xu, Junze Yin
We then design a fast algorithm to approximate the attention matrix via a sum of such $k$ convolution matrices.
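To see why a sum-of-convolutions structure yields a fast algorithm, here is a hedged sketch (illustrative construction, not the paper's algorithm): if an attention-like matrix decomposes as a sum of $k$ lower-triangular Toeplitz (convolution) matrices, the matrix-vector product reduces to $k$ one-dimensional convolutions, each computable in $O(n \log n)$ with the FFT instead of the dense $O(n^2)$ multiply.

```python
import numpy as np

def toeplitz_from_kernel(kernel, n):
    """Lower-triangular Toeplitz matrix whose first column is `kernel`."""
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1):
            if i - j < len(kernel):
                A[i, j] = kernel[i - j]
    return A

rng = np.random.default_rng(0)
n, k = 8, 3
kernels = [rng.normal(size=n) for _ in range(k)]

# Dense n x n matrix built as a sum of k convolution matrices.
A = sum(toeplitz_from_kernel(c, n) for c in kernels)

x = rng.normal(size=n)
dense = A @ x                                       # O(n^2) matvec
conv = sum(np.convolve(c, x)[:n] for c in kernels)  # k 1-D convolutions

print(np.allclose(dense, conv))  # -> True
```

The equivalence holds because row i of a lower-triangular Toeplitz matrix computes exactly the causal convolution sum $\sum_{j \le i} c_{i-j} x_j$.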
1 code implementation • 22 Feb 2024 • Zhuoyan Xu, Zhenmei Shi, Junyi Wei, Fangzhou Mu, Yin Li, Yingyu Liang
An emerging solution with recent success in vision and NLP involves finetuning a foundation model on a selection of relevant tasks before adapting it to a target task with limited labeled samples.
1 code implementation • 19 May 2022 • Zhuoyan Xu, Kris Sankaran
We illustrate the performance of our methods through spatial structure recovery and gene expression reconstruction in simulations.