no code implementations • 19 Jan 2024 • Xuekai Zhu, Yao Fu, BoWen Zhou, Zhouhan Lin
We formalize the phase transition under the grokking configuration into the Data Efficiency Hypothesis and identify data insufficiency, sufficiency, and surplus regimes in language model training dynamics.
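As a minimal, hypothetical sketch of the idea (not the paper's code), the three regimes can be illustrated by labeling training-set sizes against a grokking-style learning curve, where test accuracy stays near chance below a critical data size and then jumps; the `threshold`, sizes, and accuracies below are invented for illustration:

```python
# Hypothetical sketch: classify data sizes into the insufficiency /
# sufficiency / surplus regimes of the Data Efficiency Hypothesis,
# given final test accuracy at each training-set size.

def classify_regimes(data_sizes, test_accs, threshold=0.9):
    """Label each data size by regime, assuming a grokking-style curve."""
    # Critical data size: smallest size whose accuracy clears the threshold.
    critical = next(
        (n for n, acc in zip(data_sizes, test_accs) if acc >= threshold), None
    )
    regimes = {}
    for n in data_sizes:
        if critical is None or n < critical:
            regimes[n] = "insufficiency"  # memorization, no generalization
        elif n == critical:
            regimes[n] = "sufficiency"    # phase transition: generalization emerges
        else:
            regimes[n] = "surplus"        # more data, diminishing returns
    return regimes

# Toy learning-curve data (illustrative only).
sizes = [1_000, 5_000, 10_000, 50_000, 100_000]
accs = [0.12, 0.15, 0.91, 0.95, 0.96]
print(classify_regimes(sizes, accs))
# {1000: 'insufficiency', 5000: 'insufficiency', 10000: 'sufficiency',
#  50000: 'surplus', 100000: 'surplus'}
```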
no code implementations • 24 Oct 2023 • Kaiyan Zhang, Ning Ding, Biqing Qi, Xuekai Zhu, Xinwei Long, BoWen Zhou
Instruction tuning has recently been recognized as an effective way of aligning Large Language Models (LLMs) and enhancing their generalization ability across various tasks.
1 code implementation • 23 May 2023 • Xuekai Zhu, Biqing Qi, Kaiyan Zhang, Xinwei Long, Zhouhan Lin, BoWen Zhou
While large language models (LLMs) excel at various natural language processing tasks, their huge size and inaccessible parameters pose challenges for practical deployment.
1 code implementation • 29 Aug 2022 • Xuekai Zhu, Jian Guan, Minlie Huang, Juan Liu
Moreover, to enhance content preservation, we design a mask-and-fill framework that explicitly fuses style-specific keywords from the source text into the generated output.
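A minimal, hypothetical sketch of a mask-and-fill step follows (not the authors' implementation, and a sentiment lexicon stands in for their learned style keywords and generator): style-specific tokens in the source are masked, then the masks are filled with target-style words while the remaining content words are preserved verbatim.

```python
# Hypothetical mask-and-fill sketch: a lookup table stands in for a
# learned generator; STYLE_KEYWORDS and TARGET_FILLS are invented.

STYLE_KEYWORDS = {"terrible", "awful"}      # assumed source-style lexicon
TARGET_FILLS = {"terrible": "wonderful",    # assumed target-style substitutes
                "awful": "delightful"}

def mask_source(text):
    """Replace style-specific keywords with a [MASK] placeholder,
    remembering the original word for each mask."""
    return [("[MASK]", tok) if tok.lower() in STYLE_KEYWORDS else (tok, None)
            for tok in text.split()]

def fill_masks(masked):
    """Fill each mask with a target-style word; content tokens pass through."""
    return " ".join(
        TARGET_FILLS.get(orig.lower(), tok) if tok == "[MASK]" else tok
        for tok, orig in masked
    )

source = "The food was terrible and the service was awful"
print(fill_masks(mask_source(source)))
# -> "The food was wonderful and the service was delightful"
```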