Lookahead Optimizer: k steps forward, 1 step back

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly categorized into two approaches: (1) adaptive learning rate schemes, such as AdaGrad and Adam, and (2) accelerated schemes, such as heavy-ball and Nesterov momentum. In this paper, we propose a new optimization algorithm, Lookahead, that is orthogonal to these previous approaches and iteratively updates two sets of weights. Intuitively, the algorithm chooses a search direction by looking ahead at the sequence of fast weights generated by another optimizer. We show that Lookahead improves learning stability and lowers the variance of its inner optimizer with negligible computation and memory cost. We empirically demonstrate that Lookahead can significantly improve the performance of SGD and Adam, even with their default hyperparameter settings, on ImageNet, CIFAR-10/100, neural machine translation, and Penn Treebank.
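
To make the two-loop structure concrete, below is a minimal NumPy sketch of Lookahead wrapping plain SGD as the inner ("fast") optimizer. The hyperparameters k (synchronization period) and alpha (slow-weights step size) correspond to the quantities the paper names; the function and argument names (`lookahead_sgd`, `grad_fn`, `inner_lr`, `outer_steps`) and the toy quadratic objective are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lookahead_sgd(grad_fn, phi0, inner_lr=0.1, alpha=0.5, k=5, outer_steps=100):
    """Sketch of Lookahead with plain SGD as the inner optimizer.

    phi: slow weights; theta: fast weights. Every k inner steps, the slow
    weights move a fraction alpha toward the final fast weights, and the
    fast weights are reset to the new slow weights.
    """
    phi = np.asarray(phi0, dtype=float).copy()   # slow weights
    for _ in range(outer_steps):
        theta = phi.copy()                       # fast weights start from the slow weights
        for _ in range(k):                       # "k steps forward" with the inner optimizer
            theta -= inner_lr * grad_fn(theta)
        phi += alpha * (theta - phi)             # "1 step back": interpolate toward the fast weights
    return phi

# Usage on a toy quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w.
w_star = lookahead_sgd(grad_fn=lambda w: w, phi0=np.ones(3), k=5, alpha=0.5)
print(w_star)  # approaches the minimizer at the origin
```

In this formulation the slow weights only move a fraction alpha of the distance the fast weights covered over k inner steps, which is the mechanism behind the reduced variance and improved stability claimed above.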

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Stochastic Optimization | CIFAR-10, ResNet-18 (200 epochs) | ADAM | Accuracy | 94.84% | #4 |
| Stochastic Optimization | CIFAR-10, ResNet-18 (200 epochs) | Lookahead | Accuracy | 95.27% | #2 |
| Stochastic Optimization | CIFAR-10, ResNet-18 (200 epochs) | SGD | Accuracy | 95.23% | #3 |
| Stochastic Optimization | ImageNet, ResNet-50 (50 epochs) | Lookahead | Top 1 Accuracy | 75.13% | #1 |
| Stochastic Optimization | ImageNet, ResNet-50 (50 epochs) | SGD | Top 5 Accuracy | 92.15% | #1 |
| Stochastic Optimization | ImageNet, ResNet-50 (60 epochs) | SGD | Top 1 Accuracy | 75.15% | #2 |
| Stochastic Optimization | ImageNet, ResNet-50 (60 epochs) | SGD | Top 5 Accuracy | 92.56% | #1 |
| Stochastic Optimization | ImageNet, ResNet-50 (60 epochs) | Lookahead | Top 1 Accuracy | 75.49% | #1 |
| Stochastic Optimization | ImageNet, ResNet-50 (60 epochs) | Lookahead | Top 5 Accuracy | 92.53% | #2 |
