no code implementations • 16 Jan 2024 • Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu
However, language model research faces significant challenges, especially for academic research groups with constrained resources.
no code implementations • 17 May 2023 • Zhangchen Zhou, Hanxu Zhou, Yuqing Li, Zhi-Qin John Xu
Previous research has shown that fully-connected networks with small initialization and gradient-based training methods exhibit a phenomenon known as condensation during training.
no code implementations • 12 Mar 2023 • Zhengan Chen, Yuqing Li, Tao Luo, Zhangchen Zhou, Zhi-Qin John Xu
The phenomenon of distinct behaviors exhibited by neural networks under varying scales of initialization remains an enigma in deep learning research.