Search Results for author: Libin Zhu

Found 8 papers, 1 paper with code

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

no code implementations 7 Jun 2023 Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

In this paper, we first explain the common occurrence of spikes in the training loss when neural networks are trained with stochastic gradient descent (SGD).

Restricted Strong Convexity of Deep Learning Models with Smooth Activations

no code implementations 29 Sep 2022 Arindam Banerjee, Pedro Cisneros-Velarde, Libin Zhu, Mikhail Belkin

Second, we introduce a new analysis of optimization based on Restricted Strong Convexity (RSC) which holds as long as the squared norm of the average gradient of predictors is $\Omega(\frac{\text{poly}(L)}{\sqrt{m}})$ for the square loss.
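
For background, the generic Restricted Strong Convexity property (stated here only as the standard notion; the paper's condition, involving the average gradient of the predictors as quoted above, is a tailored variant) requires that for some $\alpha > 0$ and all $w, w'$ in a restricted set $S$,

$$\mathcal{L}(w') \;\ge\; \mathcal{L}(w) + \langle \nabla \mathcal{L}(w),\, w' - w \rangle + \frac{\alpha}{2}\, \lVert w' - w \rVert_2^2 .$$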

A note on Linear Bottleneck networks and their Transition to Multilinearity

no code implementations 30 Jun 2022 Libin Zhu, Parthe Pandit, Mikhail Belkin

In this work we show that linear networks with a bottleneck layer learn bilinear functions of the weights, in a ball of radius $O(1)$ around initialization.
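
As a toy illustration of bilinearity in the weights (my own minimal example, not the paper's bottleneck setting, which concerns deeper linear networks in an $O(1)$ ball around initialization), a two-layer linear network $f(x) = w_2^\top W_1 x$ is linear in $W_1$ for fixed $w_2$ and linear in $w_2$ for fixed $W_1$; the sketch below checks this numerically.

```python
import jax
import jax.numpy as jnp

# Toy example (the names f, W1, w2 are mine): a two-layer linear network
# f(x; W1, w2) = w2^T W1 x is bilinear in its weights.
def f(W1, w2, x):
    return jnp.dot(w2, W1 @ x)

key = jax.random.PRNGKey(0)
k1, k2, k3, k4 = jax.random.split(key, 4)
W1a = jax.random.normal(k1, (4, 3))
W1b = jax.random.normal(k2, (4, 3))
w2 = jax.random.normal(k3, (4,))
x = jax.random.normal(k4, (3,))
a, b = 0.7, -1.3

# Linearity in the first-layer weights for a fixed second layer:
lhs = f(a * W1a + b * W1b, w2, x)
rhs = a * f(W1a, w2, x) + b * f(W1b, w2, x)
print(bool(jnp.allclose(lhs, rhs)))  # True (up to floating-point error)
```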

Quadratic models for understanding neural network dynamics

1 code implementation 24 May 2022 Libin Zhu, Chaoyue Liu, Adityanarayanan Radhakrishnan, Mikhail Belkin

While neural networks can be approximated by linear models as their width increases, certain properties of wide neural networks cannot be captured by linear models.
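
A rough way to picture a quadratic model (a sketch under my own assumptions, not the paper's Neural Quadratic Model, which uses a specific parameterization) is as the second-order Taylor expansion of the network output in its parameters: it keeps the Hessian term that the linear/NTK approximation drops, which is what lets it capture wide-network behavior that the linear model misses. The helper names below (`quadratic_approx`, `net`) are hypothetical.

```python
import jax
import jax.numpy as jnp

def quadratic_approx(f, w0):
    """Second-order Taylor expansion of a scalar-output model w -> f(w) around
    the parameters w0 (a pytree). The linear/NTK approximation keeps only the
    first-order term; a quadratic model also retains the Hessian term."""
    def model(w):
        dw = jax.tree_util.tree_map(lambda a, b: a - b, w, w0)
        f0, lin = jax.jvp(f, (w0,), (dw,))           # f(w0) and J(w0) dw
        _, hvp = jax.jvp(jax.grad(f), (w0,), (dw,))  # H(w0) dw
        quad = 0.5 * sum(jnp.vdot(a, b) for a, b in
                         zip(jax.tree_util.tree_leaves(dw),
                             jax.tree_util.tree_leaves(hvp)))
        return f0 + lin + quad
    return model

# Example: a tiny two-layer network with scalar output on a fixed input x.
x = jnp.linspace(-1.0, 1.0, 3)
def net(w):
    return jnp.dot(w["v"], jnp.tanh(w["W"] @ x)) / jnp.sqrt(8.0)

key = jax.random.PRNGKey(0)
kW, kv = jax.random.split(key)
w0 = {"W": jax.random.normal(kW, (8, 3)), "v": jax.random.normal(kv, (8,))}
f_quad = quadratic_approx(net, w0)  # agrees with net(w) for w near w0
```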

Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

no code implementations 24 May 2022 Libin Zhu, Chaoyue Liu, Mikhail Belkin

In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity.

Transition to Linearity of Wide Neural Networks is an Emerging Property of Assembling Weak Models

no code implementations ICLR 2022 Chaoyue Liu, Libin Zhu, Mikhail Belkin

Wide neural networks with a linear output layer have been shown to be near-linear, and to have a near-constant neural tangent kernel (NTK), in a region containing the optimization path of gradient descent.

On the linearity of large non-linear models: when and why the tangent kernel is constant

no code implementations NeurIPS 2020 Chaoyue Liu, Libin Zhu, Mikhail Belkin

We show that the transition to linearity of the model and, equivalently, constancy of the (neural) tangent kernel (NTK) result from the scaling properties of the norm of the Hessian matrix of the network as a function of the network width.
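
One informal way to probe this numerically (a sketch under my own assumptions, not the paper's proof technique; the helper names `tangent_kernel` and `make_net` are hypothetical) is to compute the empirical tangent kernel of a two-layer network with the usual $1/\sqrt{m}$ output scaling before and after an $O(1)$ parameter perturbation; the relative change is expected to shrink as the width $m$ grows, consistent with the Hessian-norm scaling described above.

```python
import jax
import jax.numpy as jnp

def tangent_kernel(f, w, x):
    """Empirical (neural) tangent kernel K(x, x) = ||grad_w f(w, x)||^2."""
    g = jax.grad(f)(w, x)
    return sum(jnp.vdot(a, a) for a in jax.tree_util.tree_leaves(g))

def make_net(m, key):
    """Two-layer tanh network of width m with 1/sqrt(m) output scaling."""
    kW, kv = jax.random.split(key)
    w = {"W": jax.random.normal(kW, (m, 3)), "v": jax.random.normal(kv, (m,))}
    def f(w, x):
        return jnp.dot(w["v"], jnp.tanh(w["W"] @ x)) / jnp.sqrt(1.0 * m)
    return f, w

x = jnp.array([1.0, 0.0, -1.0])
key = jax.random.PRNGKey(0)
for m in [16, 256, 4096]:
    f, w0 = make_net(m, jax.random.fold_in(key, m))
    # Perturb the parameters by a random direction of unit (O(1)) norm.
    kW, kv = jax.random.split(jax.random.fold_in(key, 10 * m + 1))
    d = {"W": jax.random.normal(kW, (m, 3)), "v": jax.random.normal(kv, (m,))}
    norm = jnp.sqrt(sum(jnp.vdot(a, a) for a in jax.tree_util.tree_leaves(d)))
    w1 = jax.tree_util.tree_map(lambda a, b: a + b / norm, w0, d)
    k0 = tangent_kernel(f, w0, x)
    k1 = tangent_kernel(f, w1, x)
    print(m, float(jnp.abs(k1 - k0) / k0))  # relative kernel change vs. width
```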

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

no code implementations 29 Feb 2020 Chaoyue Liu, Libin Zhu, Mikhail Belkin

The success of deep learning is due, to a large extent, to the remarkable effectiveness of gradient-based optimization methods applied to large neural networks.
