Adaptive Optimizers with Sparse Group Lasso

1 Jan 2021 · Yun Yue, Suo Tong, Zhen Zhang, Yongchao Liu, Chunyang Wen, Huanjun Bao, Jinjie Gu, Yixiang Mu

We develop a novel framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning, such as MOMENTUM, ADAGRAD, ADAM, AMSGRAD, and ADAHESSIAN, creating a new class of optimizers named GROUP MOMENTUM, GROUP ADAGRAD, GROUP ADAM, GROUP AMSGRAD, and GROUP ADAHESSIAN accordingly. We establish convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of our new optimizers on three large-scale real-world ad-click datasets with state-of-the-art deep learning models. The experimental results reveal that, compared with the traditional approach of adding regularization terms to the loss function, our optimizers not only reduce the feature dimensions effectively and efficiently but also improve model performance. Furthermore, compared with the unregularized baselines, our methods achieve extreme sparsity with highly competitive or significantly better performance.
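No code accompanies this abstract, so the following is only a minimal NumPy sketch of the general idea: an Adam-style update followed by a proximal step for a sparse group lasso penalty. The class name `GroupAdamSketch`, the hyperparameters `lambda1`/`lambda2`, and the proximal formulation are illustrative assumptions; the paper's actual GROUP ADAM update is derived via primal-dual methods and may differ.

```python
# Sketch only: Adam step, then a proximal step for the (assumed) penalty
#   lambda1 * ||w||_1 + lambda2 * sum_g sqrt(d_g) * ||w_g||_2.
import numpy as np

class GroupAdamSketch:
    def __init__(self, dim, group_sizes, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, lambda1=1e-4, lambda2=1e-3):
        assert sum(group_sizes) == dim
        self.m = np.zeros(dim)          # first-moment estimate
        self.v = np.zeros(dim)          # second-moment estimate
        self.t = 0
        self.lr, self.eps = lr, eps
        self.beta1, self.beta2 = betas
        self.lambda1, self.lambda2 = lambda1, lambda2
        # Precompute an index slice for each feature group.
        bounds = np.cumsum([0] + list(group_sizes))
        self.groups = [slice(bounds[i], bounds[i + 1])
                       for i in range(len(group_sizes))]

    def step(self, w, grad):
        self.t += 1
        # Standard Adam moment updates with bias correction.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        w = w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

        # Proximal step for the sparse group lasso penalty.
        # 1) Elementwise soft-thresholding (L1 part).
        w = np.sign(w) * np.maximum(np.abs(w) - self.lr * self.lambda1, 0.0)
        # 2) Groupwise shrinkage (L2-of-group part); whole groups can hit zero.
        for g in self.groups:
            norm = np.linalg.norm(w[g])
            scale = self.lr * self.lambda2 * np.sqrt(g.stop - g.start)
            w[g] = 0.0 if norm <= scale else (1.0 - scale / norm) * w[g]
        return w
```

The groupwise shrinkage is what lets entire feature groups (for example, all embedding weights of one sparse feature) be driven exactly to zero, which is the mechanism behind the feature-dimension reduction described above.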
