Non-monotonically Triggered ASGD

Introduced by Merity et al. in Regularizing and Optimizing LSTM Language Models

NT-ASGD, or Non-monotonically Triggered ASGD, is an averaged stochastic gradient descent technique.

In regular ASGD, we take steps identical to regular SGD, but instead of returning the last iterate as the solution, we return the average of the iterates from the trigger point onward, $\frac{1}{\left(K-T+1\right)}\sum^{K}_{i=T}w_{i}$, where $K$ is the total number of iterations and $T < K$ is a user-specified averaging trigger.
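The averaging rule can be illustrated with a minimal sketch in plain Python. The function name `asgd_average`, its arguments, and the constant-gradient toy problem are assumptions made for illustration; they are not taken from the paper or from any library.

```python
def asgd_average(grad, w0, lr, K, T):
    """Run K plain SGD steps and average the iterates w_T, ..., w_K.

    Hypothetical helper for illustration. Iterates are 1-indexed, so w_k is
    the weight vector after the k-th update. Returns (average, last_iterate).
    """
    w = list(w0)
    running_sum = [0.0] * len(w)
    for k in range(1, K + 1):
        g = grad(w)
        # Ordinary SGD update: the steps themselves are unchanged by averaging.
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        if k >= T:
            # Accumulate only the iterates from the trigger point onward.
            running_sum = [s + wi for s, wi in zip(running_sum, w)]
    averaged = [s / (K - T + 1) for s in running_sum]
    return averaged, w


# Toy check with a constant gradient of 1: the iterates are -1, -2, -3, -4, -5,
# so averaging w_3..w_5 gives -4 while the last iterate is -5.
averaged, last = asgd_average(lambda w: [1.0], [0.0], lr=1.0, K=5, T=3)
```

Keeping a running sum avoids storing every iterate; only the sum and the count $K-T+1$ are needed to form the average.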

NT-ASGD replaces the user-specified trigger with a non-monotonic criterion that conservatively starts the averaging only when the validation metric has failed to improve for multiple evaluation cycles. Because the decision to trigger is irreversible, this conservatism ensures that the randomness of training does not play a major role in the decision.
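One way to read the trigger is sketched below in plain Python. This is an interpretation under stated assumptions, not the authors' code: the function name `nt_trigger`, the patience-style parameter `n` (how many recent evaluations are excluded from the comparison), and the convention that a lower metric is better are all introduced here for illustration.

```python
def nt_trigger(val_metrics, n):
    """Return the index of the evaluation at which averaging would start.

    Hypothetical sketch. `val_metrics` is a sequence of validation scores
    (lower is better, e.g. perplexity), one per evaluation cycle. Averaging
    is triggered at evaluation t once t > n and the current score is worse
    than the best score recorded more than n evaluations earlier. Returns
    None if the criterion never fires.
    """
    logs = []
    for t, v in enumerate(val_metrics):
        # Compare against the best of the older scores only, so a single
        # noisy evaluation cannot trigger averaging on its own.
        if t > n and v > min(logs[: t - n]):
            return t
        logs.append(v)
    return None


# The score stalls after the fourth evaluation; with n = 2 the trigger
# fires at index 6, the first point that is worse than an old best.
trigger_index = nt_trigger([10, 9, 8, 7, 7.5, 7.4, 7.3], n=2)
```

In a training loop, the returned index would play the role of $T$ in the averaging formula: plain SGD runs until the trigger fires, and iterates are averaged from that point on.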

Source: Regularizing and Optimizing LSTM Language Models

Tasks

Task	Papers	Share
Language Modelling	4	26.67%
Translation	3	20.00%
Machine Translation	2	13.33%
Sentence	1	6.67%
Few-Shot Image Classification	1	6.67%
General Classification	1	6.67%
Image Classification	1	6.67%
Text Classification	1	6.67%
Speech Recognition	1	6.67%

Categories

Stochastic Optimization