Accelerating Neural Network Optimization Through an Automated Control Theory Lens

This paper studies optimizers for accelerating time-consuming deep network training through an automatic control theory lens, viewing the parameter update of a network as a feedback control process. It makes two contributions. First, we theoretically analyze the intrinsic connections between deep network training and automatic feedback control systems; specifically, we show that the optimization process can be viewed as a Type I second-order system in control theory. Second, based on the mathematical model of this equivalent system, we design a proportional-integral-derivative (PID)-type Controller with decoupled weight decay to improve the training of deep neural networks. We conduct experiments both from a control-theory perspective, via phase-locus verification, and from a network-training perspective on several model families, including CNNs, Transformers, and MLPs, across benchmark datasets. The results demonstrate the effectiveness of our Controller optimizer in both optimization speed and final performance compared to SGD, the PID Optimizer, Adam, AdamW, and AdamP.
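For context, a Type I system in classical control has exactly one integrator (one pole at the origin) in its open-loop transfer function; a standard second-order instance is $G(s) = K / \big(s(\tau s + 1)\big)$, though the paper's exact equivalent model may differ. Since no implementation has been released yet, the sketch below illustrates the general idea of a PID-type optimizer with AdamW-style decoupled weight decay, not the authors' algorithm: the class name `PIDControllerOptimizer` and the gains `kp`, `ki`, `kd` are illustrative assumptions.

```python
import torch
from torch.optim import Optimizer


class PIDControllerOptimizer(Optimizer):
    """Hypothetical sketch: treat the gradient as the control error signal.

    P term reacts to the current gradient, I term accumulates past gradients
    (akin to momentum), D term damps oscillation via the gradient's change.
    Weight decay is decoupled, i.e. applied to the weights directly.
    """

    def __init__(self, params, lr=1e-3, kp=1.0, ki=0.1, kd=0.5, weight_decay=0.0):
        defaults = dict(lr=lr, kp=kp, ki=ki, kd=kd, weight_decay=weight_decay)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            lr, kp, ki, kd = group["lr"], group["kp"], group["ki"], group["kd"]
            wd = group["weight_decay"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                g = p.grad
                state = self.state[p]
                if not state:
                    state["integral"] = torch.zeros_like(p)
                    state["prev_grad"] = torch.zeros_like(p)
                # I term: running sum of gradients (the accumulated "error").
                # In practice a decaying sum is common to keep this bounded.
                state["integral"].add_(g)
                # D term: change in gradient between consecutive steps.
                d_term = g - state["prev_grad"]
                state["prev_grad"].copy_(g)
                # Decoupled weight decay (AdamW-style): shrink weights directly
                # instead of folding the penalty into the gradient.
                if wd != 0:
                    p.mul_(1 - lr * wd)
                # Combined P + I + D update on the raw gradient signal.
                p.add_(kp * g + ki * state["integral"] + kd * d_term, alpha=-lr)
```

In this reading, SGD with momentum corresponds to a PI controller (proportional plus integral action only); the added derivative term is what a PID-type design contributes, penalizing rapid gradient changes to reduce overshoot.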
