no code implementations • 28 Jul 2020 • Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li
Empirical results on deep learning tasks verify that, when adopting the same large batch size, SNGM (stochastic normalized gradient descent with momentum) achieves better test accuracy than MSGD (momentum SGD) and other state-of-the-art large-batch training methods.
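The core idea is to combine a momentum buffer with a normalized stochastic gradient. Below is a minimal NumPy sketch of one plausible form of such an update; the function name, the step size `lr`, momentum coefficient `beta`, and stabilizer `eps` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sngm_step(w, m, grad, lr=0.1, beta=0.9, eps=1e-8):
    """One normalized-momentum update: fold the unit-norm stochastic
    gradient into a momentum buffer, then step along the buffer."""
    m = beta * m + grad / (np.linalg.norm(grad) + eps)  # normalized gradient
    w = w - lr * m
    return w, m
```

Normalizing the gradient before accumulating it bounds the per-step update magnitude, which is one intuition for why such schemes tolerate large batch sizes better than plain momentum SGD.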
no code implementations • 26 Feb 2020 • Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li
We theoretically prove that, compared to classical stagewise SGD, which decreases the learning rate by stage, SEBS can reduce the number of parameter updates without increasing generalization error.
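The contrast with classical stagewise SGD can be illustrated with a short schedule sketch of SEBS (stagewise enlargement of batch size); the growth factor `rho`, the fixed learning rate, and the function name are illustrative assumptions rather than the paper's exact schedule:

```python
def sebs_schedule(base_batch_size, lr, num_stages, rho=2):
    """Stagewise schedule sketch: instead of dividing the learning rate
    by rho at every stage (classical stagewise SGD), multiply the batch
    size by rho and keep the learning rate fixed."""
    return [(base_batch_size * rho ** s, lr) for s in range(num_stages)]

# Example: sebs_schedule(128, 0.1, 4)
# -> [(128, 0.1), (256, 0.1), (512, 0.1), (1024, 0.1)]
```

With a fixed epoch budget, doubling the batch size per stage halves the number of parameter updates in that stage, which is the source of the savings the snippet above refers to.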
no code implementations • 11 Jun 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li
However, in SGD and all of its existing variants, the sample size in each epoch of training is the same as the size of the full training set.
no code implementations • 6 Jun 2019 • Xiao Ma, Shen-Yi Zhao, Wu-Jun Li
Exploration strategy design is one of the challenging problems in reinforcement learning (RL), especially when the environment has a large state space or sparse rewards.
no code implementations • 30 May 2019 • Chang-Wei Shi, Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li
With the rapid growth of data, distributed momentum stochastic gradient descent (DMSGD) has been widely used in distributed learning, especially for training large-scale deep models.
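For context, a generic synchronous DMSGD step looks like the sketch below; this is the common pattern, not the specific method proposed in the paper, and the names and hyperparameters are assumptions:

```python
import numpy as np

def dmsgd_step(w, m, worker_grads, lr=0.01, beta=0.9):
    """One synchronous distributed momentum SGD step: average the
    workers' stochastic gradients (an all-reduce in a real system),
    then apply a standard momentum update to the shared parameters."""
    g = np.mean(worker_grads, axis=0)  # aggregate across workers
    m = beta * m + g
    w = w - lr * m
    return w, m
```

The gradient aggregation step is the communication bottleneck in this pattern, which is why much of the DMSGD literature focuses on reducing or compressing it.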
no code implementations • 30 May 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li
Using the transformation equation, we establish the convergence rate of stagewise M-DSGD, which bridges the gap between theory and practice.
no code implementations • 10 Jan 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li
Due to its efficiency and ease of implementation, stochastic gradient descent (SGD) has been widely used in machine learning.
no code implementations • NeurIPS 2018 • Shen-Yi Zhao, Gong-Duo Zhang, Ming-Wei Li, Wu-Jun Li
Based on the defined metric, we theoretically prove that pSCOPE is convergent with a linear convergence rate if the data partition is good enough.
no code implementations • 10 Feb 2018 • Gong-Duo Zhang, Shen-Yi Zhao, Hao Gao, Wu-Jun Li
Linear classification has been widely used in many high-dimensional applications, such as text classification.
no code implementations • 11 Dec 2016 • Shen-Yi Zhao, Gong-Duo Zhang, Wu-Jun Li
We analyze the convergence of lock-free stochastic optimization methods, such as AsySVRG, for non-convex problems.
1 code implementation • 30 Jan 2016 • Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li
Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve large-scale composite optimization problems, and they have shown better performance than traditional batch methods.
no code implementations • 24 Aug 2015 • Shen-Yi Zhao, Wu-Jun Li
Stochastic gradient descent (SGD) and its variants have become increasingly popular in machine learning due to their efficiency and effectiveness.
no code implementations • 12 Feb 2015 • Shen-Yi Zhao, Wu-Jun Li, Zhi-Hua Zhou
There exists only one stochastic method, called SA-ADMM, that can achieve a convergence rate of $O(1/T)$ on general convex problems.