1 code implementation • 15 Jan 2025 • Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu
Transformers have demonstrated impressive capabilities across various tasks, yet their performance on compositional problems remains a subject of debate.
no code implementations • 3 Aug 2024 • Jing Yan, Yunxuan Feng, Wei Dai, Yaoyu Zhang
In this paper, we probe how orientation-selective neurons organized on a 1-D ring network respond to perturbations, in the hope of gaining insight into the robustness of the visual system in the brain.
no code implementations • 26 Jun 2024 • Yaoyu Zhang, Leyang Zhang, Zhongwang Zhang, Zhiwei Bai
Determining whether deep neural network (DNN) models can reliably recover target functions at overparameterization is a critical yet complex issue in the theory of deep learning.
no code implementations • 26 May 2024 • Leyang Zhang, Yaoyu Zhang, Tao Luo
This paper presents a comprehensive analysis of critical point sets in two-layer neural networks.
no code implementations • 24 May 2024 • Zhangchen Zhou, Yaoyu Zhang, Zhi-Qin John Xu
Grokking is the phenomenon whereby neural networks (NNs) first fit the training data and only later generalize to the test data during training.
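A minimal sketch of how the phenomenon is typically quantified (the 0.99 threshold and the toy accuracy curves are illustrative assumptions, not taken from the paper): measure the delay between the epoch at which training accuracy saturates and the epoch at which test accuracy finally rises.

```python
# Minimal sketch: quantify grokking as the delay between fitting the training
# set and generalizing to the test set. The threshold and the toy accuracy
# curves below are illustrative assumptions.
def grokking_gap(train_acc, test_acc, threshold=0.99):
    """Return (fit_epoch, grok_epoch, gap) from per-epoch accuracy curves."""
    fit_epoch = next(e for e, acc in enumerate(train_acc) if acc >= threshold)
    grok_epoch = next(e for e, acc in enumerate(test_acc) if acc >= threshold)
    return fit_epoch, grok_epoch, grok_epoch - fit_epoch

# Toy curves: training accuracy saturates early, test accuracy lags far behind.
train_acc = [min(1.0, 0.1 * e) for e in range(100)]
test_acc = [0.5 if e < 80 else 1.0 for e in range(100)]
print(grokking_gap(train_acc, test_acc))  # -> (10, 80, 70)
```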
no code implementations • 24 May 2024 • Zhiwei Wang, Yunji Wang, Zhongwang Zhang, Zhangchen Zhou, Hui Jin, Tianyang Hu, Jiacheng Sun, Zhenguo Li, Yaoyu Zhang, Zhi-Qin John Xu
Large language models have consistently struggled with complex reasoning tasks, such as mathematical problem-solving.
no code implementations • 22 May 2024 • Zhiwei Bai, Jiajie Zhao, Yaoyu Zhang
Our work reveals the intricate interplay between data connectivity, training dynamics, and implicit regularization in matrix factorization models.
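A minimal sketch of the kind of setting referred to above (the rank-1 target, observation mask, and step size are illustrative choices, not the paper's setup): gradient descent from a small initialization on a factored model, fitted only to observed entries, drifts toward low-rank solutions, one face of the implicit regularization discussed here.

```python
import numpy as np

# Sketch: matrix completion with a factorization U @ V.T trained by gradient
# descent on observed entries only. The rank-1 target, 30% observation mask,
# small initialization, and step size are illustrative assumptions.
rng = np.random.default_rng(0)
n, r = 20, 5
target = np.outer(rng.normal(size=n), rng.normal(size=n))  # rank-1 target
mask = rng.random((n, n)) < 0.3                            # observed entries
U = 1e-3 * rng.normal(size=(n, r))
V = 1e-3 * rng.normal(size=(n, r))

for _ in range(10000):
    resid = mask * (U @ V.T - target)      # error on observed entries only
    U, V = U - 0.05 * resid @ V, V - 0.05 * resid.T @ U

# The learned matrix is dominated by a single singular value (low effective rank).
print(np.linalg.svd(U @ V.T, compute_uv=False)[:3])
```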
no code implementations • 22 May 2024 • Jiajie Zhao, Zhiwei Bai, Yaoyu Zhang
Additionally, we empirically delineate two critical thresholds in sample size--termed the "optimistic sample size" and the "separation sample size"--that align with the theoretical frameworks established by (see arXiv:2307. 08921 and arXiv:2309. 00508).
1 code implementation • 8 May 2024 • Zhongwang Zhang, Pengxiao Lin, Zhiwei Wang, Yaoyu Zhang, Zhi-Qin John Xu
We discover that the parameter initialization scale plays a critical role in determining whether the model learns inferential (reasoning-based) solutions, which capture the underlying compositional primitives, or symmetric (memory-based) solutions, which simply memorize mappings without understanding the compositional structure.
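A minimal sketch of how such an initialization-scale sweep might look (the rescaling rule and the tiny model below are illustrative assumptions, not the paper's exact parameterization): rescale all weights at initialization by a common factor and compare the solutions the trained models converge to.

```python
import torch
import torch.nn as nn

# Sketch: sweep the initialization scale by multiplying every weight of a
# freshly initialized model by a common factor. The rescaling rule and the
# small Transformer below are illustrative assumptions, not the paper's setup.
def rescale_init(model: nn.Module, scale: float) -> nn.Module:
    with torch.no_grad():
        for p in model.parameters():
            p.mul_(scale)
    return model

small_init = rescale_init(nn.Transformer(d_model=64, nhead=4), scale=0.1)
large_init = rescale_init(nn.Transformer(d_model=64, nhead=4), scale=2.0)
# Train both on the same compositional task and inspect which kind of
# solution (reasoning-based vs. memory-based) each one learns.
```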
no code implementations • 1 Sep 2023 • Leyang Zhang, Yaoyu Zhang, Tao Luo
Under mild assumptions, we investigate the geometry of the loss landscape for two-layer neural networks in the vicinity of global minima.
no code implementations • 18 Jul 2023 • Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu
We propose an optimistic estimate to evaluate the best possible fitting performance of nonlinear models.
no code implementations • 21 Nov 2022 • Yaoyu Zhang, Zhongwang Zhang, Leyang Zhang, Zhiwei Bai, Tao Luo, Zhi-Qin John Xu
Based on these results, the model rank of a target function predicts the minimal training data size required for its successful recovery.
no code implementations • 26 May 2022 • Zhiwei Bai, Tao Luo, Zhi-Qin John Xu, Yaoyu Zhang
Regarding the easy training of deep networks, we show that a local minimum of an NN can be lifted to strict saddle points of a deeper NN.
no code implementations • 24 May 2022 • Hanxu Zhou, Qixuan Zhou, Zhenyuan Jin, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu
Through experiments on the three-layer case, our phase diagram suggests a complicated dynamical landscape for deep NNs, consisting of three possible regimes together with their mixtures. It provides guidance for studying deep NNs under different initializations and reveals that completely different dynamics can emerge in different layers of a single deep NN.
no code implementations • 28 Jan 2022 • Leyang Zhang, Zhi-Qin John Xu, Tao Luo, Yaoyu Zhang
In recent years, understanding the implicit regularization of neural networks (NNs) has become a central task in deep learning theory.
no code implementations • 19 Jan 2022 • Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo
Understanding deep learning is increasingly important as it penetrates ever deeper into industry and science.
no code implementations • 9 Jan 2022 • Tianhan Zhang, Yuxiao Yi, Yifan Xu, Zhi X. Chen, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu
The current work aims to understand two basic questions regarding the deep neural network (DNN) method: what data the DNN needs and how general the DNN method can be.
no code implementations • 6 Jan 2022 • Zhiwei Wang, Yaoyu Zhang, Enhan Zhao, Yiguang Ju, Weinan E, Zhi-Qin John Xu, Tianhan Zhang
Mechanism reduction is modeled as an optimization problem over a Boolean space, where a Boolean vector, with each entry corresponding to a species, represents a reduced mechanism.
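A minimal sketch of the encoding described above (the species list and the candidate vector are hypothetical placeholders, not the paper's mechanism):

```python
# Sketch of the Boolean-vector encoding: each entry corresponds to a species;
# True keeps the species in the reduced mechanism, False removes it.
# The species list and the candidate vector are hypothetical placeholders.
species = ["H2", "O2", "H", "O", "OH", "H2O", "HO2", "H2O2", "N2"]
keep = [True, True, True, True, True, True, False, False, True]

reduced_mechanism = [s for s, k in zip(species, keep) if k]
print(reduced_mechanism)  # species retained by this candidate reduction

# Mechanism reduction then becomes optimization over such Boolean vectors:
# minimize the error of the reduced mechanism while penalizing its size.
```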
no code implementations • 30 Nov 2021 • Yaoyu Zhang, Yuqing Li, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu
We prove a general Embedding Principle of the loss landscape of deep neural networks (NNs), which unravels a hierarchical structure of the loss landscape: the loss landscape of an NN contains all critical points of all narrower NNs.
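A minimal sketch of one embedding of the kind the principle concerns (a neuron-splitting embedding for a two-layer network; the width, split fraction, and data below are illustrative choices): splitting a neuron into two copies with the same input weight and output weights that sum to the original one leaves the represented function unchanged, and the Embedding Principle states that such liftings carry critical points of the narrower network to critical points of the wider one.

```python
import numpy as np

# Sketch: embed a width-m two-layer tanh network into width m+1 by splitting
# neuron j into two neurons with the same input weight and output weights
# alpha*a_j and (1-alpha)*a_j. The represented function is unchanged.
# Width, data, and the split fraction below are illustrative choices.
rng = np.random.default_rng(0)
d, m = 3, 4
W = rng.normal(size=(m, d))      # input weights of the narrow network
a = rng.normal(size=m)           # output weights of the narrow network

def f(W, a, x):
    return a @ np.tanh(W @ x)

j, alpha = 2, 0.3                # split neuron j with fraction alpha
W_wide = np.vstack([W, W[j]])    # duplicate the input weight of neuron j
a_wide = np.append(a, (1 - alpha) * a[j])
a_wide[j] = alpha * a[j]

x = rng.normal(size=d)
print(np.isclose(f(W, a, x), f(W_wide, a_wide, x)))  # True: same function
```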
no code implementations • 17 Jul 2021 • Lulu Zhang, Zhi-Qin John Xu, Yaoyu Zhang
Complex design problems are common in the scientific and industrial fields.
no code implementations • 8 Jul 2021 • Lulu Zhang, Tao Luo, Yaoyu Zhang, Weinan E, Zhi-Qin John Xu, Zheng Ma
In this paper, we propose a machine learning approach via the model-operator-data network (MOD-Net) for solving PDEs.
no code implementations • NeurIPS 2021 • Yaoyu Zhang, Zhongwang Zhang, Tao Luo, Zhi-Qin John Xu
Understanding the structure of the loss landscape of deep neural networks (DNNs) is obviously important.
no code implementations • 25 May 2021 • Tao Luo, Zheng Ma, Zhiwei Wang, Zhi-Qin John Xu, Yaoyu Zhang
frequency in DNN training.
no code implementations • 25 May 2021 • Hanxu Zhou, Qixuan Zhou, Tao Luo, Yaoyu Zhang, Zhi-Qin John Xu
Our theoretical analysis confirms the experiments in two cases: one for activation functions of multiplicity one with arbitrary input dimension, which covers many common activation functions, and the other for layers with one-dimensional input and arbitrary multiplicity.
no code implementations • 30 Jan 2021 • Yaoyu Zhang, Tao Luo, Zheng Ma, Zhi-Qin John Xu
Why heavily parameterized neural networks (NNs) do not overfit the data is an important, long-standing open question.
no code implementations • 6 Dec 2020 • Tao Luo, Zheng Ma, Zhiwei Wang, Zhi-Qin John Xu, Yaoyu Zhang
A supervised learning problem is to find a function in a hypothesis function space given values on isolated data points.
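In symbols, a standard way to state this (consistent with the sentence above, not quoted from the paper):

```latex
\text{Given samples } \{(x_i, y_i)\}_{i=1}^{n}, \text{ find } f \in \mathcal{H}
\quad \text{such that} \quad f(x_i) = y_i, \quad i = 1, \dots, n .
```

The question is then which interpolant, among the many in the hypothesis space that pass through the isolated data points, the training dynamics actually select.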
no code implementations • 24 Nov 2020 • Tianhan Zhang, Yaoyu Zhang, Weinan E, Yiguang Ju
Moreover, the differences in ignition delay time are within 1%.
1 code implementation • 15 Oct 2020 • Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang
Recent works show an intriguing phenomenon of Frequency Principle (F-Principle) that deep neural networks (DNNs) fit the target function from low to high frequency during the training, which provides insight into the training and generalization behavior of DNNs in complex tasks.
1 code implementation • 15 Jul 2020 • Tao Luo, Zhi-Qin John Xu, Zheng Ma, Yaoyu Zhang
In this work, inspired by the phase diagram in statistical mechanics, we draw the phase diagram for the two-layer ReLU neural network at the infinite-width limit for a complete characterization of its dynamical regimes and their dependence on hyperparameters related to initialization.
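For orientation, one common scaled parameterization of a two-layer ReLU network of width m used in this kind of analysis (the exact scaling exponents and regime boundaries are worked out in the paper and are not reproduced here):

```latex
f_{\theta}(x) = \frac{1}{\alpha} \sum_{k=1}^{m} a_k \,\sigma(w_k^{\top} x),
\qquad a_k \sim \mathcal{N}(0, \beta_1^2), \quad w_k \sim \mathcal{N}(0, \beta_2^2 I_d),
```

with the dynamical regime (e.g., linear, critical, or condensed) determined by how α, β₁, and β₂ scale with the width m.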
no code implementations • 6 Dec 2019 • Zhi-Qin John Xu, Jiwei Zhang, Yaoyu Zhang, Chengchao Zhao
We first estimate the a priori generalization error of a finite-width two-layer ReLU NN under the constraint of a minimal-norm solution, which is proved by Zhang et al. (2019) to be an equivalent solution of a linearized (w.r.t.
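The minimal-norm constraint mentioned here can be written, for a model linearized around its initialization θ₀ (a standard formulation consistent with the excerpt, not quoted from it), as:

```latex
\min_{\theta} \;\|\theta - \theta_0\| \quad \text{s.t.} \quad
f_{\theta_0}(x_i) + \nabla_{\theta} f_{\theta_0}(x_i)^{\top} (\theta - \theta_0) = y_i,
\quad i = 1, \dots, n .
```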
1 code implementation • 21 Jun 2019 • Tao Luo, Zheng Ma, Zhi-Qin John Xu, Yaoyu Zhang
Along with fruitful applications of Deep Neural Networks (DNNs) to realistic problems, recently, some empirical studies of DNNs reported a universal phenomenon of Frequency Principle (F-Principle): a DNN tends to learn a target function from low to high frequencies during the training.
1 code implementation • 24 May 2019 • Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma
It remains a puzzle why deep neural networks (DNNs), with more parameters than samples, often generalize well.
no code implementations • 19 May 2019 • Yaoyu Zhang, Zhi-Qin John Xu, Tao Luo, Zheng Ma
Overall, our work serves as a baseline for the further investigation of the impact of initialization and loss function on the generalization of DNNs, which can potentially guide and improve the training of DNNs in practice.
3 code implementations • 19 Jan 2019 • Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo, Yanyang Xiao, Zheng Ma
We study the training process of Deep Neural Networks (DNNs) from the Fourier analysis perspective.
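A minimal sketch of the kind of Fourier-analysis experiment this line of work performs (the target function, network width, learning rate, and training length below are illustrative choices, not the paper's setup): train a small two-layer tanh network on a one-dimensional target and track, per frequency, the relative error between the DFT of the network output and that of the target.

```python
import numpy as np

# Sketch: monitor the per-frequency relative error of a small two-layer tanh
# network during full-batch gradient descent on y = sin(x) + 0.5*sin(5x).
# Target, width, learning rate, and step count are illustrative assumptions;
# the low-frequency error typically drops well before the high-frequency one.
rng = np.random.default_rng(0)
n, m, lr = 200, 200, 0.05
x = np.linspace(-np.pi, np.pi, n, endpoint=False)
y = np.sin(x) + 0.5 * np.sin(5 * x)
y_freq = np.fft.rfft(y)

W = rng.normal(size=m)           # input weights (scalar input)
b = rng.normal(size=m)           # biases
a = rng.normal(size=m) / m       # output weights

for step in range(20001):
    h = np.tanh(np.outer(x, W) + b)           # hidden activations, shape (n, m)
    pred = h @ a
    err = pred - y
    grad_pre = np.outer(err, a) * (1 - h**2)  # gradient w.r.t. pre-activations
    a -= lr * (h.T @ err) / n
    W -= lr * (x @ grad_pre) / n
    b -= lr * grad_pre.sum(axis=0) / n
    if step % 5000 == 0:
        rel = np.abs(np.fft.rfft(pred) - y_freq) / (np.abs(y_freq) + 1e-8)
        print(step, rel[1], rel[5])           # error at frequencies 1 and 5
```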
1 code implementation • 3 Jul 2018 • Zhi-Qin John Xu, Yaoyu Zhang, Yanyang Xiao
Why deep neural networks (DNNs) capable of overfitting often generalize well in practice is a mystery (Zhang et al., 2016).