no code implementations • 7 Jun 2023 • Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
In this paper, we first present an explanation for the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD).
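A minimal sketch of how such spikes can be surfaced in practice (an illustrative setup, not the paper's experiments: the architecture, data, and learning rate below are arbitrary, and whether spikes appear depends on the width, step size, and batch size):

```python
# Toy example: train a small two-layer ReLU network with SGD on synthetic data
# and log the per-minibatch loss, so that any spikes are visible in the trace.
import torch

torch.manual_seed(0)
X = torch.randn(512, 10)
y = torch.randn(512, 1)

model = torch.nn.Sequential(
    torch.nn.Linear(10, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
)
opt = torch.optim.SGD(model.parameters(), lr=0.5)  # deliberately large step size
loss_fn = torch.nn.MSELoss()

losses = []
for step in range(200):
    idx = torch.randint(0, 512, (32,))   # random minibatch
    loss = loss_fn(model(X[idx]), y[idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
    losses.append(loss.item())           # spikes, if any, show up in this trace
```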
no code implementations • 29 Sep 2022 • Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Mikhail Belkin
Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC), which holds as long as the squared norm of the average gradient of predictors is $\Omega(\frac{\text{poly}(L)}{\sqrt{m}})$ for the square loss.
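For reference, restricted strong convexity in its generic form (a sketch; the restricted set $S$ and modulus $\alpha$ are generic placeholders rather than the paper's specific quantities) requires $\mathcal{L}(w') \ge \mathcal{L}(w) + \langle \nabla \mathcal{L}(w),\, w' - w\rangle + \frac{\alpha}{2}\lVert w' - w\rVert^2$ for all $w, w' \in S$; that is, strong convexity need only hold over the restricted set rather than globally.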
no code implementations • 30 Jun 2022 • Libin Zhu, Parthe Pandit, Mikhail Belkin
In this work, we show that linear networks with a bottleneck layer learn bilinear functions of the weights, in a ball of radius $O(1)$ around initialization.
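A quick numerical check of the simplest instance of this claim (a hypothetical two-block example, not the paper's general deep setting): with a width-one bottleneck, the output $W_2(W_1 x)$ of a linear network is bilinear in the weight blocks on either side of the bottleneck, i.e. linear in each block with the other held fixed.

```python
# Verify bilinearity of a linear bottleneck network in its two weight blocks.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10)
W1 = rng.standard_normal((1, 10))   # bottleneck layer of width 1
W2 = rng.standard_normal((5, 1))

def f(W1, W2, x):
    return W2 @ (W1 @ x)

# Linearity in each weight block separately (holding the other fixed):
A = rng.standard_normal(W1.shape)
B = rng.standard_normal(W2.shape)
assert np.allclose(f(W1 + A, W2, x), f(W1, W2, x) + f(A, W2, x))
assert np.allclose(f(W1, W2 + B, x), f(W1, W2, x) + f(W1, B, x))
```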
1 code implementation • 24 May 2022 • Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin
While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models.
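As a rough illustration of the distinction (notation assumed here, not taken from the paper): the linear approximation keeps only the first-order Taylor term, $f_{\text{lin}}(w) = f(w_0) + \nabla f(w_0)^\top (w - w_0)$, whereas a quadratic model also retains the second-order term $\frac{1}{2}(w - w_0)^\top H(w_0)\,(w - w_0)$, with $H$ the Hessian of the network output in the weights; it is this extra term that lets the quadratic model capture properties of wide networks that the linear approximation cannot.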
no code implementations • 24 May 2022 • Libin Zhu, Chaoyue Liu, Mikhail Belkin
In this paper, we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo a transition to linearity as their "width" approaches infinity.
no code implementations • ICLR 2022 • Chaoyue Liu, Libin Zhu, Mikhail Belkin
Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent.
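For context, the neural tangent kernel referred to here is the kernel $K_w(x, x') = \langle \nabla_w f(w; x),\, \nabla_w f(w; x')\rangle$ induced by the gradients of the network output with respect to the weights; "near-constant" means this kernel changes little as $w$ moves along the optimization path.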
no code implementations • NeurIPS 2020 • Chaoyue Liu, Libin Zhu, Mikhail Belkin
We show that the transition to linearity of the model and, equivalently, the constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width.
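A sketch of why Hessian scaling implies near-linearity, assuming the stated scaling takes the form of a spectral-norm bound (log factors and constants omitted): by Taylor's theorem, $|f(w) - f(w_0) - \nabla f(w_0)^\top (w - w_0)| \le \frac{1}{2}\,\sup_{w' \in B} \lVert H(w')\rVert\,\lVert w - w_0\rVert^2$, so if the Hessian spectral norm scales as $\tilde O(1/\sqrt{m})$ in the width $m$ while the relevant ball $B$ around initialization has radius $O(1)$, the deviation from the linear model, and hence the change in the tangent kernel, vanishes as $m \to \infty$.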
no code implementations • 29 Feb 2020 • Chaoyue Liu, Libin Zhu, Mikhail Belkin
The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks.