Adaptive Optimizers with Sparse Group Lasso

1 Jan 2021 · Yun Yue, Suo Tong, Zhen Zhang, Yongchao Liu, Chunyang Wen, Huanjun Bao, Jinjie Gu, Yixiang Mu

We develop a novel framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning, such as MOMENTUM, ADAGRAD, ADAM, AMSGRAD, and ADAHESSIAN, creating a new class of optimizers named GROUP MOMENTUM, GROUP ADAGRAD, GROUP ADAM, GROUP AMSGRAD, and GROUP ADAHESSIAN accordingly. We establish convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of our new optimizers on three large-scale real-world ad-click datasets with state-of-the-art deep learning models. The experimental results reveal that, compared with the traditional approach of adding regularization terms to the loss function, our optimizers not only reduce the feature dimensions effectively and efficiently but also improve model performance. Furthermore, compared with the unregularized baselines, our methods achieve extreme sparsity with highly competitive or significantly better performance.
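No code accompanies this abstract, so the following is only a minimal NumPy sketch of the general idea: an Adam-style update followed by a proximal step for a sparse group lasso penalty. The class name `GroupAdamSketch`, the hyperparameters `lambda1`/`lambda2`, and the proximal formulation are illustrative assumptions; the paper's actual GROUP ADAM update is derived via primal-dual methods and may differ.

```python
# Sketch only: Adam step, then a proximal step for the (assumed) penalty
#   lambda1 * ||w||_1 + lambda2 * sum_g sqrt(d_g) * ||w_g||_2.
import numpy as np

class GroupAdamSketch:
    def __init__(self, dim, group_sizes, lr=1e-3, betas=(0.9, 0.999),
                 eps=1e-8, lambda1=1e-4, lambda2=1e-3):
        assert sum(group_sizes) == dim
        self.m = np.zeros(dim)          # first-moment estimate
        self.v = np.zeros(dim)          # second-moment estimate
        self.t = 0
        self.lr, self.eps = lr, eps
        self.beta1, self.beta2 = betas
        self.lambda1, self.lambda2 = lambda1, lambda2
        # Precompute an index slice for each feature group.
        bounds = np.cumsum([0] + list(group_sizes))
        self.groups = [slice(bounds[i], bounds[i + 1])
                       for i in range(len(group_sizes))]

    def step(self, w, grad):
        self.t += 1
        # Standard Adam moment updates with bias correction.
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad ** 2
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        w = w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)

        # Proximal step for the sparse group lasso penalty.
        # 1) Elementwise soft-thresholding (L1 part).
        w = np.sign(w) * np.maximum(np.abs(w) - self.lr * self.lambda1, 0.0)
        # 2) Groupwise shrinkage (L2-of-group part); whole groups can hit zero.
        for g in self.groups:
            norm = np.linalg.norm(w[g])
            scale = self.lr * self.lambda2 * np.sqrt(g.stop - g.start)
            w[g] = 0.0 if norm <= scale else (1.0 - scale / norm) * w[g]
        return w
```

The groupwise shrinkage is what lets entire feature groups (for example, all embedding weights of one sparse feature) be driven exactly to zero, which is the mechanism behind the feature-dimension reduction described above.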
