Adaptive Optimizers with Sparse Group Lasso
We develop a novel framework that adds sparse group lasso regularizers to a family of adaptive optimizers in deep learning, such as MOMENTUM, ADAGRAD, ADAM, AMSGRAD, and ADAHESSIAN, and creates a new class of optimizers, named GROUP MOMENTUM, GROUP ADAGRAD, GROUP ADAM, GROUP AMSGRAD, and GROUP ADAHESSIAN, etc., accordingly. We establish convergence guarantees in the stochastic convex setting, based on primal-dual methods. We evaluate the regularization effect of our new optimizers on three large-scale real-world ad click datasets with state-of-the-art deep learning models. The experimental results show that, compared with the traditional approach of adding regularization terms to the loss function, our methods not only reduce the feature dimensions effectively and efficiently but also improve model performance. Furthermore, compared with training without regularization terms, our methods achieve extreme sparsity with highly competitive or significantly better performance.
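To make the regularizer concrete, here is a minimal sketch of the standard proximal operator for a sparse group lasso penalty, lam1 * ||w||_1 + lam2 * ||w||_2, applied to a single group of weights (for example, one embedding vector). It is an illustration only and not the update rule of the GROUP optimizers, which the abstract says is derived from primal-dual methods. The function name and the parameters `lam1` and `lam2` are assumptions introduced here, and NumPy is assumed to be available.

```python
import numpy as np


def sparse_group_lasso_prox(w, lam1, lam2):
    """Proximal operator of lam1*||w||_1 + lam2*||w||_2 for one weight group.

    Hypothetical helper for illustration; not taken from the paper.
    """
    # Step 1: elementwise soft-thresholding handles the L1 term and
    # zeroes individual coordinates with magnitude at most lam1.
    s = np.sign(w) * np.maximum(np.abs(w) - lam1, 0.0)

    # Step 2: group soft-thresholding handles the L2 term and
    # zeroes the whole group when its remaining norm is at most lam2.
    norm = np.linalg.norm(s)
    if norm <= lam2:
        return np.zeros_like(s)
    return (1.0 - lam2 / norm) * s


if __name__ == "__main__":
    small_group = np.array([0.05, -0.02, 0.01])
    large_group = np.array([3.0, 4.0])
    print(sparse_group_lasso_prox(small_group, lam1=0.1, lam2=0.1))
    print(sparse_group_lasso_prox(large_group, lam1=0.0, lam2=1.0))
```

A group whose entries are all small is removed entirely, which is how this kind of penalty can prune whole features, while a group with large entries is only shrunk toward zero.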