Nesterov Accelerated Gradient

Nesterov Accelerated Gradient is a momentum-based SGD optimizer that "looks ahead": it evaluates the gradient at the point the parameters are about to reach under the current momentum, rather than at their current position:

$$ v_{t} = \gamma v_{t-1} + \eta\nabla_{\theta}J\left(\theta_{t-1}-\gamma v_{t-1}\right) $$

$$ \theta_{t} = \theta_{t-1} - v_{t} $$

Here $\eta$ is the learning rate and $\gamma$ is the momentum coefficient. As in SGD with momentum, $\gamma$ is usually set to $0.9$.
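A minimal sketch of a single update in plain Python may make the rule concrete. Because the velocity accumulates the gradient with a positive sign, it is subtracted from the parameters. The names `nag_step` and `grad_J`, the scalar parameter, the learning rate of 0.1, and the quadratic objective are all illustrative assumptions rather than part of the method's definition.

```python
def nag_step(theta, v, grad_fn, lr=0.1, gamma=0.9):
    """One Nesterov update for a scalar parameter (illustrative helper)."""
    lookahead = theta - gamma * v                  # where the old velocity alone would carry us
    v_new = gamma * v + lr * grad_fn(lookahead)    # gradient is measured at the look-ahead point
    theta_new = theta - v_new                      # velocity is subtracted from the parameters
    return theta_new, v_new


def grad_J(theta):
    # Gradient of the assumed toy objective J(theta) = 0.5 * theta**2
    return theta


theta, v = 5.0, 0.0
for _ in range(200):
    theta, v = nag_step(theta, v, grad_J)
final_theta = theta  # should be very close to the minimizer at 0
```

The only difference from classical momentum is the argument passed to `grad_fn`: the look-ahead point instead of `theta` itself.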

The intuition is that the standard momentum method first computes the gradient at the current location and then takes a big jump in the direction of the updated accumulated gradient. In contrast, Nesterov momentum first makes a big jump in the direction of the previous accumulated gradient, then measures the gradient where it ends up and makes a correction. The idea is that it is better to correct a mistake after you have made it.
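The difference between the two orderings can be sketched side by side. The example below is only an illustration on an assumed toy objective, $J(\theta) = \frac{1}{2}\theta^2$, with an assumed learning rate of 0.1. The function names and the "tail amplitude" measure are hypothetical helpers, and the behaviour on this problem says nothing general about other objectives.

```python
def grad_J(theta):
    # Gradient of the assumed toy objective J(theta) = 0.5 * theta**2
    return theta


def momentum_step(theta, v, lr, gamma):
    # Classical momentum: measure the gradient at the current position, then jump
    v = gamma * v + lr * grad_J(theta)
    return theta - v, v


def nesterov_step(theta, v, lr, gamma):
    # Nesterov: jump along the previous velocity first, measure the gradient there, then correct
    v = gamma * v + lr * grad_J(theta - gamma * v)
    return theta - v, v


def run(step, steps=100, theta0=5.0, lr=0.1, gamma=0.9):
    """Return the full trajectory of theta values, starting from theta0."""
    theta, v = theta0, 0.0
    trajectory = [theta]
    for _ in range(steps):
        theta, v = step(theta, v, lr, gamma)
        trajectory.append(theta)
    return trajectory


def tail_amplitude(trajectory, window=20):
    # Largest distance from the minimizer (0) over the last `window` iterates
    return max(abs(t) for t in trajectory[-window:])


tail_momentum = tail_amplitude(run(momentum_step))
tail_nesterov = tail_amplitude(run(nesterov_step))
print("momentum:", tail_momentum, "nesterov:", tail_nesterov)
```

On this quadratic both methods oscillate around the minimum, and the correction step damps the Nesterov oscillation more quickly, so its tail amplitude comes out smaller.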

Image Source: Geoff Hinton lecture notes

Tasks

Task	Papers	Share
Image Classification	8	19.05%
General Classification	3	7.14%
Object Recognition	3	7.14%
Semantic Segmentation	2	4.76%
Open-Ended Question Answering	1	2.38%
Question Answering	1	2.38%
Denoising	1	2.38%
Image Denoising	1	2.38%
Sparse Learning	1	2.38%

Categories

Stochastic Optimization

Large Batch Optimization