Scheduling the Learning Rate Via Hypergradients: New Insights and a New Algorithm

25 Sep 2019  ·  Michele Donini, Luca Franceschi, Orchid Majumder, Massimiliano Pontil, Paolo Frasconi

We study the problem of fitting task-specific learning rate schedules from the perspective of hyperparameter optimization. This allows us to explicitly search for schedules that achieve good generalization. We describe the structure of the gradient of a validation error with respect to the learning rates, the hypergradient, and based on this we introduce a novel online algorithm. Our method adaptively interpolates between two recently proposed techniques (Franceschi et al., 2017; Baydin et al., 2018), featuring increased stability and faster convergence. We show empirically that the proposed technique compares favorably with baselines and related methods in terms of final test accuracy.
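
To make the hypergradient idea concrete, below is a minimal sketch of the online hypergradient-descent update of Baydin et al. (2018), one of the two techniques the proposed method interpolates between. It is not the paper's algorithm: the toy quadratic objective, the step sizes, and all variable names are illustrative assumptions. The key identity is that if theta_t = theta_{t-1} - alpha * g_{t-1}, then the derivative of the loss at theta_t with respect to alpha is -g_t . g_{t-1}, so the learning rate can be adapted online from two successive gradients.

```python
import numpy as np

# Sketch of online hypergradient descent (Baydin et al., 2018).
# Toy objective and all constants are illustrative, not from the paper.

def loss_grad(theta):
    # Gradient of the toy objective f(theta) = 0.5 * ||theta||^2.
    return theta

theta = np.array([5.0, -3.0])    # model parameters
alpha = 0.01                     # learning rate (the hyperparameter we adapt)
beta = 0.001                     # hypergradient step size (assumed value)
prev_grad = np.zeros_like(theta)

for t in range(100):
    grad = loss_grad(theta)
    # Chain rule: theta_t = theta_{t-1} - alpha * grad_{t-1}, hence
    # d f(theta_t) / d alpha = grad_t . (-grad_{t-1}).
    hypergrad = -np.dot(grad, prev_grad)
    alpha = alpha - beta * hypergrad   # adapt the learning rate online
    theta = theta - alpha * grad       # standard SGD step with adapted alpha
    prev_grad = grad

print(f"final loss: {0.5 * np.dot(theta, theta):.6f}, alpha: {alpha:.4f}")
```

Intuitively, the learning rate grows while successive gradients point in similar directions and shrinks when they oppose each other; the paper's contribution is an algorithm that stabilizes this kind of online adaptation by interpolating it with the hypergradient computation of Franceschi et al. (2017).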
