Faking Interpolation Until You Make It

29 Sep 2021 · Alasdair Paren, Rudra Poudel, M. Pawan Kumar

Deep over-parameterized neural networks exhibit the interpolation property on many data sets: these models are able to achieve approximately zero loss on all training samples simultaneously. Recently, this property has been exploited to develop novel optimisation algorithms for this setting. These algorithms exploit knowledge of the optimal loss value to employ a variant of the Polyak step size, computed on a stochastic batch of data. We introduce a novel extension of this idea to tasks where the interpolation property does not hold. As we no longer have access to the optimal loss values a priori, we instead estimate them for each sample online. To realise this, we introduce a simple but highly effective heuristic for approximating the optimal value based on previous loss evaluations. The heuristic starts by setting the approximate optimal values to a known lower bound on the loss function, typically zero, and then updates them at fixed intervals through training in the direction of the best iterate visited so far. We provide rigorous experimentation on a wide range of problems, including two natural language processing tasks, popular vision benchmarks and the challenging ImageNet classification data set. Our empirical analysis demonstrates the effectiveness of our approach, which, in the non-interpolating setting, outperforms state-of-the-art baselines, namely adaptive gradient and line search methods.
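
Since no reference implementation is linked, the following is a minimal Python sketch of one possible reading of the step-size rule and the heuristic described in the abstract. The class and function names, the mixing coefficient `alpha`, the update interval, and the step-size cap are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def polyak_step_size(loss_i, grad_i, loss_i_star, eps=1e-8, max_step=10.0):
    """Polyak-style step size on one stochastic batch: the gap between the
    current batch loss and its (estimated) optimal value, divided by the
    squared gradient norm. The clipping to [0, max_step] is a stabilising
    assumption, not something stated in the abstract."""
    step = (loss_i - loss_i_star) / (np.dot(grad_i, grad_i) + eps)
    return float(min(max(step, 0.0), max_step))


class OptimalLossEstimator:
    """Per-sample estimates of the optimal loss, following the heuristic
    sketched in the abstract: start each estimate at a known lower bound
    (zero here) and, at fixed intervals, move it toward the best loss
    observed for that sample so far. Interpreting "best iterate visited so
    far" as the lowest per-sample loss seen is an assumption."""

    def __init__(self, num_samples, lower_bound=0.0, alpha=0.5, update_every=1000):
        self.estimates = np.full(num_samples, lower_bound, dtype=float)
        self.best_seen = np.full(num_samples, np.inf)
        self.alpha = alpha              # fraction of the gap closed per refresh
        self.update_every = update_every
        self.step_count = 0

    def record(self, sample_idx, loss_value):
        # Track the lowest loss observed for this sample and refresh the
        # estimates at a fixed interval of recorded steps.
        self.best_seen[sample_idx] = min(self.best_seen[sample_idx], loss_value)
        self.step_count += 1
        if self.step_count % self.update_every == 0:
            self._refresh()

    def _refresh(self):
        # Move each estimate part of the way toward the best loss visited so far.
        seen = np.isfinite(self.best_seen)
        self.estimates[seen] += self.alpha * (self.best_seen[seen] - self.estimates[seen])

    def get(self, sample_idx):
        return self.estimates[sample_idx]
```

In a training loop, `estimator.get(i)` would supply the `loss_i_star` argument of `polyak_step_size` for the sampled batch, and `estimator.record(i, loss_i)` would be called after each loss evaluation.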
