On the Iteration Complexity of Hypergradient Computation

We study a general class of bilevel optimization problems, in which the upper-level objective is defined via the solution of a fixed-point equation. Important instances arising in machine learning include hyperparameter optimization, meta-learning, and graph and recurrent neural networks. Typically the gradient of the upper-level objective (the hypergradient) is not known explicitly or is hard to compute exactly, which has raised interest in approximation methods. We investigate two popular approaches to compute the hypergradient, based on reverse-mode iterative differentiation and approximate implicit differentiation. We present a unified analysis which, for the first time, allows us to quantitatively compare these methods, providing explicit bounds for their iteration complexity. This analysis suggests a hierarchy in terms of computational efficiency among the above methods, with approximate implicit differentiation based on conjugate gradient performing best. We present an extensive experimental comparison of the methods, which confirms the theoretical findings.
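The sketch below is a minimal, illustrative JAX implementation of the two hypergradient approximations discussed in the abstract; it is not the authors' code, and the names `phi` (the inner fixed-point map), `outer_loss`, `itd_hypergradient`, `aid_hypergradient`, and parameters such as `cg_iters` are assumptions made for this example.

```python
# Illustrative sketch (not the paper's implementation) of the two
# hypergradient approximations: reverse-mode iterative differentiation (ITD)
# and approximate implicit differentiation (AID) with conjugate gradient.
import jax
import jax.numpy as jnp


def itd_hypergradient(phi, outer_loss, w0, lam, num_steps):
    """ITD: unroll the inner fixed-point iteration and differentiate
    through it in reverse mode."""
    def unrolled(lam):
        w = w0
        for _ in range(num_steps):
            w = phi(w, lam)            # inner iteration w_{k+1} = Phi(w_k, lam)
        return outer_loss(w, lam)      # outer objective E(w_t(lam), lam)
    return jax.grad(unrolled)(lam)


def aid_hypergradient(phi, outer_loss, w_approx, lam, cg_iters=20):
    """AID: approximately solve (I - d_w Phi)^T v = grad_w E with
    conjugate gradient, then assemble grad_lam E + (d_lam Phi)^T v."""
    grad_w_E = jax.grad(outer_loss, argnums=0)(w_approx, lam)
    grad_lam_E = jax.grad(outer_loss, argnums=1)(w_approx, lam)

    # Vector-Jacobian products of Phi at (w_approx, lam); Jacobians are
    # never formed explicitly.
    _, vjp_phi = jax.vjp(phi, w_approx, lam)

    def matvec(v):
        # (I - d_w Phi)^T v, computed via a single VJP.
        return v - vjp_phi(v)[0]

    # Conjugate gradient on the (assumed symmetric positive definite) system.
    v, _ = jax.scipy.sparse.linalg.cg(matvec, grad_w_E, maxiter=cg_iters)

    return grad_lam_E + vjp_phi(v)[1]
```

Both routines rely only on vector-Jacobian products of the fixed-point map, so the Jacobians are never materialized; the conjugate-gradient variant additionally assumes the linear system is symmetric positive definite, as happens for instance when the inner map is a gradient-descent step on a strongly convex objective.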

ICML 2020
