no code implementations • 17 May 2022 • Dachao Lin, Zhihua Zhang
In this short note, we give a convergence analysis of the policy iterates in the recently popular policy mirror descent (PMD) method.
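For context, the standard instance of PMD with a KL (entropy) mirror map updates a tabular policy multiplicatively in the current action values. The sketch below shows that KL instance with a fixed step size eta; it is illustrative only and not necessarily the exact setting analyzed in the note.

```python
import numpy as np

def pmd_kl_step(pi, Q, eta):
    """One policy mirror descent step under the KL mirror map:
    pi_{k+1}(a|s) is proportional to pi_k(a|s) * exp(eta * Q^{pi_k}(s, a)).
    pi and Q are (num_states, num_actions) arrays."""
    logits = np.log(pi) + eta * Q                     # work in log space for stability
    logits -= logits.max(axis=1, keepdims=True)
    new_pi = np.exp(logits)
    return new_pi / new_pi.sum(axis=1, keepdims=True)
```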
no code implementations • 8 Jan 2022 • Kun Chen, Dachao Lin, Zhihua Zhang
In this paper, we follow Eftekhari's work to give a non-local convergence analysis of deep linear networks.
no code implementations • NeurIPS 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
In this paper, we study gradient methods for training deep linear neural networks with binary cross-entropy loss.
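A minimal sketch of that setting follows, assuming plain gradient descent, labels in {-1, +1} (the logistic form of binary cross-entropy), and illustrative layer widths and step size; none of these specific choices are taken from the paper.

```python
import numpy as np

def train_deep_linear_bce(X, y, widths=(4, 8, 8, 1), lr=0.1, steps=500, rng=None):
    """Gradient descent on a deep linear network f(x) = W_L ... W_1 x
    with the logistic / binary cross-entropy loss, labels y in {-1, +1}."""
    rng = np.random.default_rng(rng)
    dims = [X.shape[1], *widths]
    Ws = [0.1 * rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
    for _ in range(steps):
        # forward pass, caching intermediate activations
        acts = [X.T]
        for W in Ws:
            acts.append(W @ acts[-1])
        logits = acts[-1].ravel()
        # dL/dlogit for BCE with +/-1 labels: -y * sigmoid(-y * logit), averaged over samples
        delta = (-y / (1.0 + np.exp(y * logits)))[None, :] / len(y)
        # backward pass through the purely linear layers
        for i in reversed(range(len(Ws))):
            grad_W = delta @ acts[i].T
            delta = Ws[i].T @ delta
            Ws[i] -= lr * grad_W
    return Ws
```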
no code implementations • NeurIPS 2021 • Dachao Lin, Haishan Ye, Zhihua Zhang
In this paper, we follow Rodomanov and Nesterov’s work to study quasi-Newton methods.
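As a reference point, Rodomanov and Nesterov's analysis concerns quasi-Newton schemes in the Broyden family; the snippet below shows the standard BFGS update of the inverse-Hessian approximation purely as background, not the specific variant studied in this paper.

```python
import numpy as np

def bfgs_inverse_update(H, s, y):
    """Classical BFGS update of the inverse-Hessian approximation H,
    with s = x_{k+1} - x_k and y = grad f(x_{k+1}) - grad f(x_k):
    H_{k+1} = (I - rho*s*y^T) H (I - rho*y*s^T) + rho*s*s^T, rho = 1/(y^T s)."""
    rho = 1.0 / (y @ s)
    I = np.eye(len(s))
    V = I - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```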
no code implementations • 9 May 2021 • Dachao Lin, Zhihua Zhang
We consider the fundamental problem of learning linear predictors (i.e., separable datasets with zero margin) using neural networks with gradient flow or gradient descent.
no code implementations • 12 Apr 2021 • Guangzeng Xie, Hao Jin, Dachao Lin, Zhihua Zhang
We propose Meta-Regularization, a novel approach for the adaptive choice of the learning rate in first-order gradient descent methods.
no code implementations • 1 Jan 2021 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
Network pruning, or the use of sparse networks, has a long history and practical significance in modern applications.
no code implementations • 16 Sep 2020 • Dachao Lin, Ruoyu Sun, Zhihua Zhang
We show that linear networks can have no spurious valleys under special sparse structures, and that non-linear networks can also admit no spurious valleys when the final layer is wide.
no code implementations • 30 Aug 2020 • Dachao Lin, Peiqin Sun, Guangzeng Xie, Shuchang Zhou, Zhihua Zhang
Quantized Neural Networks (QNNs) use low bit-width fixed-point numbers to represent weight parameters and activations, and are often used in real-world applications because they save computational resources and yield reproducible results.
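As a concrete reference point, a common way to obtain low bit-width fixed-point values is uniform (affine) quantization; the sketch below is generic and is not the particular quantization scheme studied in this paper.

```python
import numpy as np

def uniform_quantize(x, num_bits=8):
    """Uniform (affine) quantization of a tensor to num_bits fixed-point levels,
    a common building block of quantized neural networks (illustrative only)."""
    qmax = 2 ** num_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax)   # integer codes in [0, qmax]
    return q * scale + lo                              # de-quantized (fake-quantized) values
```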
no code implementations • 7 Sep 2019 • Mo Zhou, Tianyi Liu, Yan Li, Dachao Lin, Enlu Zhou, Tuo Zhao
Ample empirical evidence has corroborated that noise plays a crucial role in the effective and efficient training of neural networks.
no code implementations • 18 Aug 2019 • Hao Jin, Dachao Lin, Zhihua Zhang
Stochastic variance-reduced gradient (SVRG) is a classical optimization method.
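For reference, a minimal version of the SVRG loop looks like the following; the step size, epoch length, and the use of the last inner iterate are illustrative choices rather than those made in the paper.

```python
import numpy as np

def svrg(grad_i, x0, n, step, epochs=10, inner_steps=None, rng=None):
    """Minimal SVRG for f(x) = (1/n) * sum_i f_i(x).
    grad_i(x, i) must return the gradient of the i-th component f_i at x."""
    rng = np.random.default_rng(rng)
    inner_steps = inner_steps or n
    x = x0.copy()
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad_i(snapshot, i) for i in range(n)], axis=0)
        for _ in range(inner_steps):
            i = rng.integers(n)
            # variance-reduced stochastic gradient
            v = grad_i(x, i) - grad_i(snapshot, i) + full_grad
            x = x - step * v
    return x
```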
no code implementations • ICLR 2019 • Guangzeng Xie, Hao Jin, Dachao Lin, Zhihua Zhang
Specifically, we impose a regularization term on the learning rate via a generalized distance, and cast the joint update of the parameters and the learning rate as a max-min problem.
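Read schematically, such a max-min formulation could take a form like the one below, where D is a generalized distance between learning rates, eta_t the previous learning rate, and lambda a hypothetical regularization weight; this is only an illustrative reading of the abstract, not the paper's actual objective.

```latex
(\theta_{t+1},\, \eta_{t+1}) \in \arg\max_{\eta > 0}\ \min_{\theta}\;
\Big\{ \langle \nabla f(\theta_t),\, \theta - \theta_t \rangle
      + \tfrac{1}{2\eta}\,\lVert \theta - \theta_t \rVert^2
      - \lambda\, D(\eta,\, \eta_t) \Big\}
```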