no code implementations • 16 Jan 2024 • Zhongwang Zhang, Zhiwei Wang, Junjie Yao, Zhangchen Zhou, Xiaolong Li, Weinan E, Zhi-Qin John Xu
However, language model research faces significant challenges, especially for academic research groups with constrained resources.
no code implementations • 18 Jul 2023 • Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu
We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models.
no code implementations • 25 May 2023 • Zhongwang Zhang, Yuqing Li, Tao Luo, Zhi-Qin John Xu
To investigate the mechanism by which dropout facilitates the identification of flatter minima, we study the noise structure of the stochastic modified equation derived for dropout.
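As an illustrative aside (not the paper's code or model): dropout can be viewed as multiplicative Bernoulli noise on the hidden activations, and the gradient noise it induces at a fixed parameter point can be sampled directly. The sketch below uses a toy one-hidden-layer network of my own choosing to estimate that noise covariance; its unequal eigenvalues show the noise is structured rather than isotropic, which is the kind of structure such an analysis examines.

```python
# Illustrative sketch only: sample the gradient noise dropout injects at a
# fixed parameter point of a toy 1-hidden-layer network. The data, network,
# and sizes are all assumptions made for this example.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 3))          # toy inputs
y = np.sin(X.sum(axis=1))             # toy targets
W = 0.5 * rng.normal(size=(3, 8))     # input -> hidden weights
a = 0.5 * rng.normal(size=8)          # hidden -> output weights
p = 0.5                               # dropout keep probability

def grad_a(mask):
    """Gradient of the MSE loss w.r.t. the output weights `a`
    for one dropout mask applied to the hidden layer."""
    h = np.tanh(X @ W) * mask / p     # inverted-dropout scaling
    err = h @ a - y
    return 2.0 * (h.T @ err) / len(X)

samples = np.stack([grad_a(rng.binomial(1, p, size=8)) for _ in range(2000)])
noise = samples - samples.mean(axis=0)          # dropout-induced noise
cov = noise.T @ noise / len(samples)
# Unequal eigenvalues indicate anisotropic (structured) gradient noise.
print(np.round(np.linalg.eigvalsh(cov), 4))
```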
no code implementations • 20 May 2023 • Zhongwang Zhang, Zhi-Qin John Xu
In this work, we study the mechanism underlying loss spikes observed during neural network training.
no code implementations • 21 Nov 2022 • Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu
Based on these results, the model rank of a target function predicts the minimal training data size required for its successful recovery.
no code implementations • 13 Jul 2022 • Zhongwang Zhang, Zhi-Qin John Xu
Second, we find experimentally that training with dropout leads the neural network to a flatter minimum than standard gradient descent training does, and that this implicit regularization is the key to finding flat solutions.
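For context, one common flatness proxy (not necessarily the measure used in the paper) is the average loss increase under random parameter perturbations: a flatter minimum shows a smaller increase at the same perturbation scale. A sketch on a toy 1-D loss with one sharp and one flat minimum:

```python
# Illustrative flatness proxy on a toy 1-D loss; the loss function and
# perturbation scale are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(1)

def loss(w):
    """Toy loss with a sharp minimum at w=0 and a flat one at w=4."""
    return np.minimum(50.0 * w**2, 0.5 * (w - 4.0) ** 2)

def sharpness(w, scale=0.2, trials=10_000):
    """Mean loss increase when w is jittered by Gaussian noise."""
    bumps = loss(w + scale * rng.normal(size=trials)) - loss(w)
    return float(np.mean(bumps))

print("sharp minimum:", sharpness(0.0))   # large increase
print("flat  minimum:", sharpness(4.0))   # small increase
```

One would compare such a proxy at a dropout-trained minimum against a plain-gradient-descent minimum to quantify the flatness difference.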
no code implementations • 30 Nov 2021 • Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu
We prove a general Embedding Principle of the loss landscape of deep neural networks (NNs) that unravels its hierarchical structure, i.e., the loss landscape of an NN contains all critical points of all narrower NNs.
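To make the statement concrete, here is a minimal sketch of a splitting-type embedding (the paper's exact construction may differ): duplicating a hidden neuron and splitting its outgoing weight maps a narrower network's parameters into a wider network without changing the network function, so the loss value is preserved at the embedded point.

```python
# Illustrative splitting embedding on a toy 1-hidden-layer network; the
# network sizes and probe data are assumptions made for this example.
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))                 # a few probe inputs

W = rng.normal(size=(3, 4))                 # narrow net: 4 hidden neurons
a = rng.normal(size=4)

def f(W, a, X):
    return np.tanh(X @ W) @ a

# Split neuron 0: copy its incoming weights, share its outgoing weight
# between the original and the duplicate (shares sum to the original).
alpha = 0.3
W_wide = np.column_stack([W, W[:, 0]])      # wider net: 5 hidden neurons
a_wide = np.concatenate([a, [0.0]])
a_wide[0], a_wide[4] = alpha * a[0], (1 - alpha) * a[0]

print(np.allclose(f(W, a, X), f(W_wide, a_wide, X)))  # True: same function
```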
no code implementations • 1 Nov 2021 • Zhongwang Zhang, Hanxu Zhou, Zhi-Qin John Xu
It is important to understand how dropout, a popular regularization method, helps neural network training find a solution that generalizes well.
no code implementations • NeurIPS 2021 • Yaoyu Zhang, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu
Understanding the structure of the loss landscape of deep neural networks (DNNs) is clearly important.