Lookahead Optimizer: k steps forward, 1 step back

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum...
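
The two families named above can be illustrated with the textbook single-parameter update rules. This is a minimal Python sketch, not code from the paper: the function names, the toy objective f(x) = x²/2, and the hyperparameter values are assumptions chosen for illustration, and both momentum methods use the common convention in which the velocity v accumulates raw gradients.

```python
import math
from typing import Callable

# Gradient of a scalar objective; a toy stand-in for a minibatch gradient.
Grad = Callable[[float], float]


def sgd(grad: Grad, x: float, lr: float, steps: int) -> float:
    """Plain SGD: x <- x - lr * g(x)."""
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def heavy_ball(grad: Grad, x: float, lr: float, steps: int, beta: float = 0.9) -> float:
    """Heavy-ball momentum: v <- beta * v + g(x); x <- x - lr * v."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x)
        x -= lr * v
    return x


def nesterov(grad: Grad, x: float, lr: float, steps: int, beta: float = 0.9) -> float:
    """Nesterov momentum: like heavy-ball, but the gradient is taken at the
    point the momentum is about to carry x to, namely x - lr * beta * v."""
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad(x - lr * beta * v)
        x -= lr * v
    return x


def adagrad(grad: Grad, x: float, lr: float, steps: int, eps: float = 1e-8) -> float:
    """AdaGrad: scale each step by 1 / sqrt(running sum of squared gradients)."""
    s = 0.0
    for _ in range(steps):
        g = grad(x)
        s += g * g
        x -= lr * g / (math.sqrt(s) + eps)
    return x


def adam(grad: Grad, x: float, lr: float, steps: int,
         b1: float = 0.9, b2: float = 0.999, eps: float = 1e-8) -> float:
    """Adam: bias-corrected moving averages of g (first moment) and g**2 (second moment)."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


if __name__ == "__main__":
    # Toy objective f(x) = x**2 / 2, so grad(x) = x and the minimizer is x = 0.
    def quad_grad(x: float) -> float:
        return x

    results = [
        ("SGD", sgd(quad_grad, 5.0, lr=0.1, steps=500)),
        ("heavy-ball", heavy_ball(quad_grad, 5.0, lr=0.1, steps=500)),
        ("Nesterov", nesterov(quad_grad, 5.0, lr=0.1, steps=500)),
        ("AdaGrad", adagrad(quad_grad, 5.0, lr=1.0, steps=500)),
        ("Adam", adam(quad_grad, 5.0, lr=0.1, steps=1000)),
    ]
    for name, x_final in results:
        print(f"{name:>10}: x = {x_final:.2e}")
```

Heavy-ball and Nesterov differ only in where the gradient is evaluated, while AdaGrad and Adam rescale each step using accumulated gradient statistics.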

Venue: NeurIPS 2019
Results (task: Stochastic Optimization)

Dataset     Model, schedule          Optimizer   Metric            Value    Global rank
CIFAR-100   ResNet-18, 200 epochs    Lookahead   Accuracy          78.34%   #1
CIFAR-100   ResNet-18, 200 epochs    SGD         Accuracy          78.24%   #2
CIFAR-100   ResNet-18, 200 epochs    Polyak      Accuracy          77.99%   #3
CIFAR-100   ResNet-18, 200 epochs    Adam        Accuracy          76.88%   #4
CIFAR-10    ResNet-18, 200 epochs    Lookahead   Accuracy          95.27%   #1
CIFAR-10    ResNet-18, 200 epochs    Polyak      Accuracy          95.26%   #2
CIFAR-10    ResNet-18, 200 epochs    SGD         Accuracy          95.23%   #3
CIFAR-10    ResNet-18, 200 epochs    Adam        Accuracy          94.84%   #4
ImageNet    ResNet-50, 50 epochs     SGD         Top-1 accuracy    74.43%   #2
ImageNet    ResNet-50, 50 epochs     SGD         Top-5 accuracy    92.15%   #2
ImageNet    ResNet-50, 50 epochs     Lookahead   Top-1 accuracy    75.13%   #1
ImageNet    ResNet-50, 50 epochs     Lookahead   Top-5 accuracy    92.22%   #1
ImageNet    ResNet-50, 60 epochs     Lookahead   Top-1 accuracy    75.49%   #1
ImageNet    ResNet-50, 60 epochs     Lookahead   Top-5 accuracy    92.53%   #2
ImageNet    ResNet-50, 60 epochs     SGD         Top-1 accuracy    75.15%   #2
ImageNet    ResNet-50, 60 epochs     SGD         Top-5 accuracy    92.56%   #1

Methods used in the paper

Method                       Type
Average Pooling              Pooling Operations
Adam                         Stochastic Optimization
ReLU                         Activation Functions
1x1 Convolution              Convolutions
Batch Normalization          Normalization
Bottleneck Residual Block    Skip Connection Blocks
Global Average Pooling       Pooling Operations
Residual Block               Skip Connection Blocks
Kaiming Initialization       Initialization
Max Pooling                  Pooling Operations
Residual Connection          Skip Connections
Convolution                  Convolutions
ResNet                       Convolutional Neural Networks
Lookahead                    Stochastic Optimization
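
The title summarizes Lookahead as "k steps forward, 1 step back": a set of fast weights is advanced k steps by an inner optimizer, a set of slow weights then moves a fraction alpha of the way toward them, and the fast weights restart from the updated slow weights. The sketch below is a minimal pure-Python illustration of that idea, assuming plain SGD as the inner optimizer. The names `lookahead` and `sgd_step`, the toy quadratic, and the defaults k = 5 and alpha = 0.5 are hypothetical choices for illustration, not the authors' code.

```python
from typing import Callable, List

Vector = List[float]
Grad = Callable[[Vector], Vector]


def sgd_step(grad: Grad, theta: Vector, lr: float) -> Vector:
    """Hypothetical helper: one plain SGD step, standing in for any inner optimizer."""
    return [t - lr * g for t, g in zip(theta, grad(theta))]


def lookahead(grad: Grad, phi: Vector, lr: float = 0.1, k: int = 5,
              alpha: float = 0.5, outer_steps: int = 100) -> Vector:
    """Hypothetical Lookahead sketch (assumed defaults k=5, alpha=0.5):
    k fast steps forward, then 1 slow step back."""
    phi = list(phi)  # slow weights
    for _ in range(outer_steps):
        theta = list(phi)  # fast weights restart from the slow weights
        for _ in range(k):  # k steps forward with the inner optimizer
            theta = sgd_step(grad, theta, lr)
        # 1 step back: slow weights move a fraction alpha toward the fast weights,
        # i.e. phi <- phi + alpha * (theta - phi)
        phi = [p + alpha * (t - p) for p, t in zip(phi, theta)]
    return phi


if __name__ == "__main__":
    # Toy ill-conditioned quadratic f(w) = (w0**2 + 10 * w1**2) / 2.
    def quad_grad(w: Vector) -> Vector:
        return [w[0], 10.0 * w[1]]

    print(lookahead(quad_grad, [3.0, -2.0]))
```

With alpha = 1 the slow weights simply copy the fast weights and the scheme reduces to the inner optimizer; a smaller alpha damps the fast weights' excursions, and k controls how far they explore before each synchronization.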