Search Results for author: Shen-Yi Zhao

Found 14 papers, 1 paper with code

Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training

no code implementations • 28 Jul 2020 • Shen-Yi Zhao, Chang-Wei Shi, Yin-Peng Xie, Wu-Jun Li

Empirical results on deep learning tasks verify that, when adopting the same large batch size, SNGM can achieve better test accuracy than MSGD and other state-of-the-art large-batch training methods.
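
For orientation, here is a minimal Python sketch of a stochastic normalized-gradient step with momentum, the flavor of update the title suggests; the exact SNGM update rule, normalization point, and hyperparameters should be taken from the paper, not from this sketch.

    import numpy as np

    def sngm_step(w, u, grad, lr=0.1, beta=0.9, eps=1e-12):
        # Illustrative update: accumulate momentum, then step along the
        # *normalized* momentum direction (the paper's exact rule may differ).
        u = beta * u + grad
        return w - lr * u / (np.linalg.norm(u) + eps), u

    # Toy usage: minimize 0.5 * ||w||^2 from noisy gradients.
    rng = np.random.default_rng(0)
    w, u = np.ones(5), np.zeros(5)
    for _ in range(100):
        g = w + 0.1 * rng.standard_normal(5)   # stochastic gradient
        w, u = sngm_step(w, u, g)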

Stagewise Enlargement of Batch Size for SGD-based Learning

no code implementations • 26 Feb 2020 • Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li

We theoretically prove that, compared with classical stagewise SGD, which decreases the learning rate by stage, SEBS can reduce the number of parameter updates without increasing the generalization error.
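
For intuition, a schematic stage schedule in Python that enlarges the batch size by stage instead of shrinking the learning rate; the growth factor and stage lengths are illustrative choices, not the schedule analyzed in the paper.

    def sebs_like_schedule(n_stages=4, iters_per_stage=200, base_batch=32, growth=2):
        # Classical stagewise SGD shrinks the learning rate each stage; the SEBS
        # idea is to grow the batch size instead (growth factor is illustrative).
        return [(iters_per_stage, base_batch * growth ** s) for s in range(n_stages)]

    for n_iters, batch_size in sebs_like_schedule():
        print(f"run {n_iters} SGD iterations with batch size {batch_size}")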

ADASS: Adaptive Sample Selection for Training Acceleration

no code implementations • 11 Jun 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li

However, in SGD and all of its existing variants, the sample size in each iteration (epoch) of training is the same as the size of the full training set.
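
As a toy illustration of adaptive sample selection, the snippet below keeps only a subset of the training set for the next epoch; the selection rule used here (retain the examples with the largest current loss) and the keep ratio are assumptions for illustration, not necessarily the criterion used by ADASS.

    import numpy as np

    def select_samples(per_sample_loss, keep_ratio=0.5):
        # Hypothetical rule: keep the currently hardest examples for the next
        # epoch (ADASS's actual selection criterion may differ).
        k = max(1, int(keep_ratio * len(per_sample_loss)))
        return np.argsort(per_sample_loss)[-k:]

    losses = np.random.default_rng(0).random(10)
    print(select_samples(losses, keep_ratio=0.3))   # indices to train on next epoch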

Clustered Reinforcement Learning

no code implementations • 6 Jun 2019 • Xiao Ma, Shen-Yi Zhao, Wu-Jun Li

Exploration strategy design is one of the challenging problems in reinforcement learning (RL), especially when the environment has a large state space or sparse rewards.

Atari Games, Clustering, +4

Global Momentum Compression for Sparse Communication in Distributed Learning

no code implementations • 30 May 2019 • Chang-Wei Shi, Shen-Yi Zhao, Yin-Peng Xie, Hao Gao, Wu-Jun Li

With the rapid growth of data, distributed momentum stochastic gradient descent (DMSGD) has been widely used in distributed learning, especially for training large-scale deep models.
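
As background for the sparse-communication setting, here is a generic top-k sparsification step with a local error residual, written in Python; whether the momentum is maintained globally or per worker, and the exact compressor, are details of GMC that this sketch does not claim to reproduce.

    import numpy as np

    def topk_sparsify(vec, k):
        # Communicate only the k largest-magnitude coordinates; keep the rest as
        # a residual to be added back before the next round (error feedback).
        idx = np.argpartition(np.abs(vec), -k)[-k:]
        sparse = np.zeros_like(vec)
        sparse[idx] = vec[idx]
        return sparse, vec - sparse

    update = np.random.default_rng(0).standard_normal(8)
    sent, residual = topk_sparsify(update, k=2)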

On the Convergence of Memory-Based Distributed SGD

no code implementations • 30 May 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li

Using the transformation equation, we establish the convergence rate of stagewise M-DSGD, which bridges the gap between theory and practice.

Quantized Epoch-SGD for Communication-Efficient Distributed Learning

no code implementations • 10 Jan 2019 • Shen-Yi Zhao, Hao Gao, Wu-Jun Li

Due to its efficiency and ease of implementation, stochastic gradient descent (SGD) has been widely used in machine learning.

Quantization
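
For context on the communication-efficient setting that Quantized Epoch-SGD targets, here is a small Python sketch of unbiased stochastic gradient quantization; the number of levels and the epoch-wise schedule used in the paper are not reflected here.

    import numpy as np

    def stochastic_quantize(g, levels=4, rng=None):
        # Map each coordinate to one of `levels` grid points in [-s, s] with
        # randomized rounding, so the quantizer is unbiased in expectation.
        if rng is None:
            rng = np.random.default_rng(0)
        s = np.max(np.abs(g)) + 1e-12
        x = (g / s + 1) / 2 * (levels - 1)            # rescale to [0, levels-1]
        low = np.floor(x)
        q = low + (rng.random(g.shape) < (x - low))   # randomized rounding
        return (q / (levels - 1) * 2 - 1) * s

    g = np.random.default_rng(1).standard_normal(6)
    print(stochastic_quantize(g))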

Proximal SCOPE for Distributed Sparse Learning

no code implementations • NeurIPS 2018 • Shen-Yi Zhao, Gong-Duo Zhang, Ming-Wei Li, Wu-Jun Li

Based on the defined metric, we theoretically prove that pSCOPE converges at a linear rate if the data partition is good enough.

Sparse Learning
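
For reference, the proximal step for an L1 regularizer (coordinate-wise soft-thresholding) is the standard building block for composite objectives of the form f(w) + lam*||w||_1 that proximal methods such as pSCOPE target; the distributed data partitioning and local solvers are where the paper's contribution lies and are not sketched here.

    import numpy as np

    def prox_l1(v, lam):
        # prox_{lam * ||.||_1}(v): coordinate-wise soft-thresholding.
        return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

    def prox_grad_step(w, grad_f, lr, lam):
        # One proximal gradient step on f(w) + lam * ||w||_1.
        return prox_l1(w - lr * grad_f(w), lr * lam)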

Proximal SCOPE for Distributed Sparse Learning: Better Data Partition Implies Faster Convergence Rate

no code implementations • 15 Mar 2018 • Shen-Yi Zhao, Gong-Duo Zhang, Ming-Wei Li, Wu-Jun Li

Based on the defined metric, we theoretically prove that pSCOPE converges at a linear rate if the data partition is good enough.

Sparse Learning

Feature-Distributed SVRG for High-Dimensional Linear Classification

no code implementations • 10 Feb 2018 • Gong-Duo Zhang, Shen-Yi Zhao, Hao Gao, Wu-Jun Li

Linear classification has been widely used in many high-dimensional applications like text classification.

General Classification, text-classification, +2
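
For reference, here is a single-machine SVRG loop on a least-squares objective; the paper's contribution is distributing this kind of variance-reduced update across workers by features (columns) rather than by samples, which this sketch does not attempt.

    import numpy as np

    def svrg_least_squares(X, y, lr=0.1, epochs=5, inner=100, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            w_snap = w.copy()
            full_grad = X.T @ (X @ w_snap - y) / n       # gradient at the snapshot
            for _ in range(inner):
                i = rng.integers(n)
                gi = X[i] * (X[i] @ w - y[i])            # stochastic gradient at w
                gi_snap = X[i] * (X[i] @ w_snap - y[i])  # same sample at the snapshot
                w = w - lr * (gi - gi_snap + full_grad)  # variance-reduced step
        return w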

SCOPE: Scalable Composite Optimization for Learning on Spark

1 code implementation • 30 Jan 2016 • Shen-Yi Zhao, Ru Xiang, Ying-Hao Shi, Peng Gao, Wu-Jun Li

Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve large-scale composite optimization problems, and they have shown better performance than traditional batch methods.

Stochastic Optimization

Fast Asynchronous Parallel Stochastic Gradient Descent

no code implementations • 24 Aug 2015 • Shen-Yi Zhao, Wu-Jun Li

Stochastic gradient descent (SGD) and its variants have become increasingly popular in machine learning due to their efficiency and effectiveness.
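
As a bare-bones illustration of the asynchronous setting, the Python sketch below runs several threads that read and write a shared parameter vector without locks, in the spirit of Hogwild!-style lock-free SGD; it is not the algorithm or analysis of this paper, and Python's GIL means the arithmetic is not truly executed in parallel.

    import threading
    import numpy as np

    rng0 = np.random.default_rng(0)
    X = rng0.standard_normal((1000, 10))
    y = X @ np.arange(10, dtype=float)
    w = np.zeros(10)                        # shared parameters, no lock protects them

    def worker(steps=2000, lr=0.01, seed=0):
        global w
        rng = np.random.default_rng(seed)
        for _ in range(steps):
            i = rng.integers(len(X))
            g = X[i] * (X[i] @ w - y[i])    # stochastic gradient on one sample
            w -= lr * g                     # lock-free in-place update

    threads = [threading.Thread(target=worker, kwargs={"seed": s}) for s in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()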

Scalable Stochastic Alternating Direction Method of Multipliers

no code implementations • 12 Feb 2015 • Shen-Yi Zhao, Wu-Jun Li, Zhi-Hua Zhou

Only one existing stochastic method, SA-ADMM, can achieve a convergence rate of $O(1/T)$ on general convex problems.
