1 code implementation • ICML 2020 • Yu-Sheng Li, Wei-Lin Chiang, Ching-pei Lee
Expensive inter-machine communication is the bottleneck of distributed optimization.
2 code implementations • 21 Mar 2024 • Zih-Syuan Huang, Ching-pei Lee
We propose a Regularized Adaptive Momentum Dual Averaging (RAMDA) algorithm for training structured neural networks.
no code implementations • 29 Apr 2022 • Ching-pei Lee, Ling Liang, Tianyun Tang, Kim-Chuan Toh
This work proposes a rapid algorithm, BM-Global, for nuclear-norm-regularized convex and low-rank matrix optimization problems.
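BM-Global itself is not reproduced here, but the standard building block of nuclear-norm-regularized optimization is the proximal operator of the nuclear norm, which soft-thresholds the singular values and thereby promotes low rank; a minimal NumPy sketch (generic, not the paper's algorithm):

```python
import numpy as np

def prox_nuclear(X, lam):
    """Proximal operator of lam * ||X||_* :
    soft-threshold the singular values of X by lam."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_thresh = np.maximum(s - lam, 0.0)  # shrink each singular value
    return U @ np.diag(s_thresh) @ Vt

# Small singular values are zeroed, reducing the rank.
A = np.diag([3.0, 1.0, 0.2])
B = prox_nuclear(A, 0.5)  # singular values become 2.5, 0.5, 0.0
```

Here the third singular value (0.2) falls below the threshold 0.5 and is set to zero, so the result has rank 2.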
2 code implementations • ICLR 2022 • Zih-Syuan Huang, Ching-pei Lee
This paper proposes an algorithm (RMDA) for training neural networks (NNs) with a regularization term for promoting desired structures.
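The details of RMDA are not given in this snippet, but structure-promoting regularizers of this kind are typically handled through their proximal operator; a minimal sketch of one common choice, the group lasso, which zeroes out whole columns of a weight matrix (a generic illustration, not the paper's method):

```python
import numpy as np

def prox_group_lasso(W, lam):
    """Proximal operator of lam * sum_j ||W[:, j]||_2 :
    shrink each column's norm by lam, zeroing columns
    whose norm falls below lam (column-wise group sparsity)."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.maximum(1.0 - lam / np.maximum(norms, 1e-12), 0.0)
    return W * scale

W = np.array([[3.0, 0.1],
              [4.0, 0.1]])
W_new = prox_group_lasso(W, 1.0)
# column 0 (norm 5) is shrunk; column 1 (norm ~0.14) is zeroed entirely
```

Zeroing entire columns rather than individual entries is what makes the induced sparsity "structured": a zero column can be removed from the network, unlike scattered zero weights.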
no code implementations • 29 Sep 2021 • Zih-Syuan Huang, Ching-pei Lee
Stochastic gradient descent with momentum (SGD+M) is widely used to empirically improve the convergence behavior and the generalization performance of plain stochastic gradient descent (SGD) when training deep learning models, but our theoretical understanding of SGD+M is still very limited.
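The SGD+M (heavy-ball) update referred to above is standard and can be sketched as follows; the step size and momentum coefficient here are illustrative, and the quadratic objective is just a toy example:

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.1, beta=0.9):
    """One heavy-ball SGD+M update:
    v <- beta * v + grad ;  w <- w - lr * v."""
    v = beta * v + grad
    w = w - lr * v
    return w, v

# Toy example: minimize f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.ones(2)
v = np.zeros(2)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad=w)
# w converges toward the minimizer at the origin
```

In actual training, `grad` would be a stochastic mini-batch gradient rather than the exact gradient, which is where the analysis becomes difficult.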
no code implementations • 4 Dec 2020 • Ching-pei Lee
We show that for a wide class of degenerate solutions, ISQA+ possesses superlinear convergence not only in iterations but also in running time, because the cost per iteration is bounded.
Optimization and Control
1 code implementation • 12 Dec 2019 • Ching-pei Lee, Cong Han Lim, Stephen J. Wright
When applied to the distributed dual ERM problem, unlike state-of-the-art methods that use only the block-diagonal part of the Hessian, our approach utilizes global curvature information and is thus orders of magnitude faster.
1 code implementation • 4 Mar 2018 • Ching-pei Lee, Cong Han Lim, Stephen J. Wright
Initial computational results on convex problems demonstrate that our method significantly reduces communication cost and running time compared with current state-of-the-art methods.
no code implementations • 13 Jun 2015 • Ching-pei Lee
In this document, we show that the algorithm CoCoA+ (Ma et al., ICML 2015), under the setting used in its experiments (which is also the best setting suggested by its authors), is equivalent to the practical variant of DisDCA (Yang, NIPS 2013).
no code implementations • 8 Jun 2015 • Ching-pei Lee, Kai-Wei Chang, Shyam Upadhyay, Dan Roth
Training structured prediction models is time-consuming.