no code implementations • ICLR 2019 • Lu Hou, Ruiliang Zhang, James T. Kwok
We show that (i) weight-quantized networks converge to an error related to the weight quantization resolution and weight dimension; (ii) quantizing gradients slows convergence by a factor related to the gradient quantization resolution and dimension; and (iii) clipping the gradient before quantization renders this factor dimension-free, thus allowing the use of fewer bits for gradient quantization.
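The clipping step in (iii) is easy to illustrate. Below is a minimal NumPy sketch, not the paper's implementation: the function names and the fixed `clip_value` threshold are hypothetical, and the quantizer is a standard unbiased stochastic uniform one.

```python
import numpy as np

def stochastic_quantize(x, num_bits):
    # Unbiased stochastic quantization onto a symmetric signed grid
    # with 2^(num_bits-1) - 1 positive levels.
    levels = 2 ** (num_bits - 1) - 1
    scale = np.max(np.abs(x))
    if scale == 0.0:
        return x
    y = x / scale * levels                     # map into [-levels, levels]
    low = np.floor(y)
    # Round up with probability equal to the fractional part,
    # so that E[quantized] = x (unbiasedness).
    q = low + (np.random.rand(*x.shape) < (y - low))
    return q / levels * scale

def clipped_quantized_grad(grad, num_bits, clip_value):
    # Clip entries before quantizing. Clipping bounds the dynamic
    # range of the gradient, which is the step the abstract credits
    # with making the convergence slowdown dimension-free.
    clipped = np.clip(grad, -clip_value, clip_value)
    return stochastic_quantize(clipped, num_bits)
```

Without clipping, the quantization grid must span the gradient's full range, so a few large entries inflate the per-coordinate error across all dimensions; clipping first keeps the grid tight at the cost of a bounded bias.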
no code implementations • 25 Feb 2016 • Shuai Zheng, Ruiliang Zhang, James T. Kwok
In regularized risk minimization, the associated optimization problem becomes particularly difficult when both the loss and regularizer are nonsmooth.
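For concreteness, here is a minimal sketch of such a doubly-nonsmooth objective; the hinge-loss/L1 pairing is a standard illustration, not necessarily the objective studied in the paper.

```python
import numpy as np

def objective(w, X, y, lam):
    # Regularized risk with a nonsmooth loss (hinge) and a nonsmooth
    # regularizer (L1): neither term is differentiable everywhere.
    hinge = np.maximum(0.0, 1.0 - y * (X @ w)).mean()
    return hinge + lam * np.abs(w).sum()

def subgradient(w, X, y, lam):
    # One valid element of the subdifferential; at the kinks
    # (margin == 1, w == 0) we pick 0.
    margin = y * (X @ w)
    g_loss = -(X * y[:, None])[margin < 1.0].sum(axis=0) / len(y)
    return g_loss + lam * np.sign(w)
```

Plain subgradient methods on such problems converge only at the slow O(1/sqrt(T)) rate, which is what makes this doubly-nonsmooth setting difficult.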
no code implementations • 7 Aug 2015 • Ruiliang Zhang, Shuai Zheng, James T. Kwok
With the recent proliferation of large-scale learning problems, there has been much interest in distributed machine learning algorithms, particularly those based on stochastic gradient descent (SGD) and its variants.