Directional Convergence Analysis under Spherically Symmetric Distribution

We consider the fundamental problem of learning linear predictors (i.e., separable datasets with zero margin) using neural networks with gradient flow or gradient descent.

Meta-Regularization: An Approach to Adaptive Choice of the Learning Rate in Gradient Descent

We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.

On the Landscape of Sparse Linear Networks

Network pruning, or the use of sparse networks, has a long history and practical significance in modern applications.

On the Landscape of One-hidden-layer Sparse Networks and Beyond

We show that sparse linear networks can have spurious strict minima, in sharp contrast to dense linear networks, which have no spurious minima at all.

Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond

Quantized Neural Networks (QNNs) represent weight parameters and activations with low bit-width fixed-point numbers, and are often used in real-world applications because they save computational resources and give reproducible results.
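
As a rough illustration of what "low bit-width fixed-point" means, the sketch below rounds a real value to a signed fixed-point integer code and maps it back. It is a generic scheme, not the quantization method proposed in the paper; the function names `quantize` and `dequantize` and the default of 8 total bits with 4 fractional bits are assumptions made only for this example.

```python
def quantize(x, num_bits=8, frac_bits=4):
    """Round x to a signed fixed-point integer code (hypothetical helper)."""
    scale = 1 << frac_bits                   # 2**frac_bits codes per unit
    q_min = -(1 << (num_bits - 1))           # smallest representable code
    q_max = (1 << (num_bits - 1)) - 1        # largest representable code
    code = round(x * scale)                  # nearest integer code
    return max(q_min, min(q_max, code))      # saturate instead of wrapping


def dequantize(code, frac_bits=4):
    """Map an integer code back to the real value it represents."""
    return code / (1 << frac_bits)
```

With these defaults the representable values are multiples of 1/16 between -8.0 and 7.9375, so `dequantize(quantize(0.3))` gives 0.3125 and any input above the range saturates at 7.9375. Because all arithmetic on the codes is integer arithmetic, results do not depend on floating-point rounding order, which is one reason such schemes give reproducible results.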

Towards Understanding the Importance of Noise in Training Neural Networks

Ample empirical evidence has corroborated that noise plays a crucial role in the effective and efficient training of neural networks.

Towards Better Generalization: BP-SVRG in Training Deep Neural Networks

Stochastic variance-reduced gradient (SVRG) is a classical optimization method.
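
For readers unfamiliar with the method, the following is a minimal sketch of plain SVRG on a least-squares problem. It is not the BP-SVRG variant from the paper; the squared loss, the helper names (`grad_i`, `full_grad`, `svrg`), and the step size and loop lengths are all assumptions chosen for illustration.

```python
import random


def grad_i(w, x, y):
    """Gradient of the per-sample loss 0.5 * (w.x - y)**2."""
    err = sum(wj * xj for wj, xj in zip(w, x)) - y
    return [err * xj for xj in x]


def full_grad(w, data):
    """Average of the per-sample gradients over the whole dataset."""
    total = [0.0] * len(w)
    for x, y in data:
        total = [t + g for t, g in zip(total, grad_i(w, x, y))]
    return [t / len(data) for t in total]


def svrg(data, dim, lr=0.02, epochs=100, inner_steps=200, seed=0):
    """Plain SVRG: one full gradient per epoch, then corrected stochastic steps."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        w_snap = list(w)                 # snapshot of the current iterate
        mu = full_grad(w_snap, data)     # full gradient at the snapshot
        for _ in range(inner_steps):
            x, y = data[rng.randrange(len(data))]
            g = grad_i(w, x, y)
            g_snap = grad_i(w_snap, x, y)
            # Variance-reduced direction: g - g_snap + mu
            w = [wj - lr * (gj - gs + m)
                 for wj, gj, gs, m in zip(w, g, g_snap, mu)]
    return w


# Noiseless data generated by y = 2*x1 - 3*x2 (made up for this example).
points = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, -1.0), (2.0, 1.0)]
data = [(p, 2.0 * p[0] - 3.0 * p[1]) for p in points]
w_hat = svrg(data, dim=2)
```

The correction term `g - g_snap + mu` is an unbiased estimate of the full gradient whose variance shrinks as the iterate and the snapshot both approach the optimum, which is what lets SVRG use a constant step size. On this toy problem `w_hat` should end up close to `[2.0, -3.0]`.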

Hyper-Regularization: An Adaptive Choice for the Learning Rate in Gradient Descent

Specifically, we impose a regularization term on the learning rate via a generalized distance, and cast the joint update of the parameter and the learning rate as a max-min problem.

