no code implementations • 27 Aug 2021 • Yitao Shen, Yue Wang, Xingyu Lu, Feng Qi, Jia Yan, Yixiang Mu, Yao Yang, Yifan Peng, Jinjie Gu
In order to do effective optimization in the second stage, counterfactual prediction and noise-reduction are essential for the first stage.
1 code implementation • 30 Jul 2021 • Yun Yue, Yongchao Liu, Suo Tong, Minghao Li, Zhen Zhang, Chunyang Wen, Huanjun Bao, Lihong Gu, Jinjie Gu, Yixiang Mu
We develop a novel framework that adds the regularizers of the sparse group lasso to a family of adaptive optimizers in deep learning, such as Momentum, Adagrad, Adam, AMSGrad, AdaHessian, and create a new class of optimizers, which are named Group Momentum, Group Adagrad, Group Adam, Group AMSGrad and Group AdaHessian, etc., accordingly.
no code implementations • 1 Jan 2021 • Yun Yue, Suo Tong, Zhen Zhang, Yongchao Liu, Chunyang Wen, Huanjun Bao, Jinjie Gu, Yixiang Mu
We develop a novel framework that adds the regularizers to a family of adaptive optimizers in deep learning, such as MOMENTUM, ADAGRAD, ADAM, AMSGRAD, ADAHESSIAN, and create a new class of optimizers, which are named GROUP MOMENTUM, GROUP ADAGRAD, GROUP ADAM, GROUP AMSGRAD and GROUP ADAHESSIAN, etc., accordingly.