no code implementations • 28 Jul 2020 • Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li
Empirical results on deep learning tasks verify that, when adopting the same large batch size, SNGM (stochastic normalized gradient descent with momentum) achieves better test accuracy than MSGD (momentum SGD) and other state-of-the-art large-batch training methods.
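The core idea is to combine a momentum buffer with a normalized stochastic gradient. Below is a minimal NumPy sketch of one plausible form of such an update; the function name, the step size `lr`, momentum coefficient `beta`, and stabilizer `eps` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def sngm_step(w, m, grad, lr=0.1, beta=0.9, eps=1e-8):
    """One normalized-momentum update: fold the unit-norm stochastic
    gradient into a momentum buffer, then step along the buffer."""
    m = beta * m + grad / (np.linalg.norm(grad) + eps)  # normalized gradient
    w = w - lr * m
    return w, m
```

Normalizing the gradient before accumulating it bounds the per-step update magnitude, which is one intuition for why such schemes tolerate large batch sizes better than plain momentum SGD.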
no code implementations • 26 Feb 2020 • Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li
We theoretically prove that, compared to classical stagewise SGD, which decreases the learning rate by stage, SEBS can reduce the number of parameter updates without increasing generalization error.
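The contrast with classical stagewise SGD can be illustrated with a short schedule sketch of SEBS (stagewise enlargement of batch size); the growth factor `rho`, the fixed learning rate, and the function name are illustrative assumptions rather than the paper's exact schedule:

```python
def sebs_schedule(base_batch_size, lr, num_stages, rho=2):
    """Stagewise schedule sketch: instead of dividing the learning rate
    by rho at every stage (classical stagewise SGD), multiply the batch
    size by rho and keep the learning rate fixed."""
    return [(base_batch_size * rho ** s, lr) for s in range(num_stages)]

# Example: sebs_schedule(128, 0.1, 4)
# -> [(128, 0.1), (256, 0.1), (512, 0.1), (1024, 0.1)]
```

With a fixed epoch budget, doubling the batch size per stage halves the number of parameter updates in that stage, which is the source of the savings the snippet above refers to.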
no code implementations • 11 Jun 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li
However, in SGD and all of its existing variants, the sample size in each epoch of training is the same as the size of the full training set.
no code implementations • 6 Jun 2019 • Xiao Ma, Shen-Yi Zhao, Wu-Jun Li
Exploration strategy design is one of the challenging problems in reinforcement learning (RL), especially when the environment has a large state space or sparse rewards.
no code implementations • 30 May 2019 • Chang-Wei Shi, Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li
With the rapid growth of data, distributed momentum stochastic gradient descent (DMSGD) has been widely used in distributed learning, especially for training large-scale deep models.
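For context, a generic synchronous DMSGD step looks like the sketch below; this is the common pattern, not the specific method proposed in the paper, and the names and hyperparameters are assumptions:

```python
import numpy as np

def dmsgd_step(w, m, worker_grads, lr=0.01, beta=0.9):
    """One synchronous distributed momentum SGD step: average the
    workers' stochastic gradients (an all-reduce in a real system),
    then apply a standard momentum update to the shared parameters."""
    g = np.mean(worker_grads, axis=0)  # aggregate across workers
    m = beta * m + g
    w = w - lr * m
    return w, m
```

The gradient aggregation step is the communication bottleneck in this pattern, which is why much of the DMSGD literature focuses on reducing or compressing it.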
no code implementations • 30 May 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li
Using the transformation equation, we establish the convergence rate of stagewise M-DSGD, which bridges the gap between theory and practice.
no code implementations • 10 Jan 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li
Due to its efficiency and ease of implementation, stochastic gradient descent (SGD) has been widely used in machine learning.
no code implementations • NeurIPS 2018 • Shen-Yi Zhao, Gong-Duo Zhang, Ming-Wei Li, Wu-Jun Li
Based on the defined metric, we theoretically prove that pSCOPE is convergent with a linear convergence rate if the data partition is good enough.
no code implementations • 10 Feb 2018 • Gong-Duo Zhang, Shen-Yi Zhao, Hao Gao, Wu-Jun Li
Linear classification has been widely used in many high-dimensional applications, such as text classification.
no code implementations • 11 Dec 2016 • Shen-Yi Zhao, Gong-Duo Zhang, Wu-Jun Li
We analyze the convergence of lock-free stochastic optimization methods, such as AsySVRG, for non-convex problems.
1 code implementation • 30 Jan 2016 • Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li
Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve large-scale composite optimization problems, and they have shown better performance than traditional batch methods.
no code implementations • 24 Aug 2015 • Shen-Yi Zhao, Wu-Jun Li
Stochastic gradient descent (SGD) and its variants have become increasingly popular in machine learning due to their efficiency and effectiveness.
no code implementations • 12 Feb 2015 • Shen-Yi Zhao, Wu-Jun Li, Zhi-Hua Zhou
There exists only one stochastic method, called SA-ADMM, that can achieve a convergence rate of $O(1/T)$ on general convex problems.