no code implementations • 1 Jan 2023 • Hongru Yang, Yingbin Liang, Xiaojie Guo, Lingfei Wu, Zhangyang Wang
It is shown that as long as the pruning fraction is below a certain threshold, gradient descent can drive the training loss toward zero and the network exhibits good generalization performance.
no code implementations • 1 Jan 2023 • Hongru Yang, Ziyu Jiang, Ruizhe Zhang, Zhangyang Wang, Yingbin Liang
This work studies training one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime, where the networks' biases are initialized to some constant rather than zero.
no code implementations • 27 Mar 2022 • Hongru Yang, Zhangyang Wang
It is shown that given a pruning probability, for fully-connected neural networks with the weights randomly pruned at the initialization, as the width of each layer grows to infinity sequentially, the NTK of the pruned neural network converges to the limiting NTK of the original network with some extra scaling.