Stochastic Optimization

88 papers with code · Methodology

Greatest papers with code

Revisiting Distributed Synchronous SGD

4 Apr 2016 · tensorflow/models

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.

STOCHASTIC OPTIMIZATION
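
The paper's alternative to asynchrony is synchronous SGD with a few extra backup workers. Each step averages only the first N gradients to arrive and drops the stragglers. The sketch below illustrates that aggregation rule in a single process using simulated arrival times. `sync_step_with_backups` is a hypothetical helper and is not code from tensorflow/models.

```python
import numpy as np


def sync_step_with_backups(params, worker_grads, arrival_times, n_aggregate, lr):
    """One synchronous SGD step using only the n_aggregate fastest workers.

    The remaining (backup) workers are treated as stragglers, and their
    gradients are dropped for this step.
    """
    order = np.argsort(arrival_times)              # fastest workers first
    kept = [worker_grads[i] for i in order[:n_aggregate]]
    avg_grad = np.mean(kept, axis=0)               # average of the kept gradients only
    return params - lr * avg_grad


# 3 workers launched, 2 aggregated: the slow third worker acts as a backup.
params = np.zeros(2)
grads = [np.array([1.0, 1.0]), np.array([3.0, 3.0]), np.array([100.0, 100.0])]
times = [0.5, 0.7, 9.0]
new_params = sync_step_with_backups(params, grads, times, n_aggregate=2, lr=0.1)
print(new_params)  # [-0.2 -0.2]
```

The straggler's gradient never affects the update, so step time depends on the N-th fastest worker rather than on the slowest one.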

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

ICLR 2019 Luolc/AdaBound

Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates.

STOCHASTIC OPTIMIZATION
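
AdaBound keeps Adam's moment estimates but clips the element-wise learning rate into a band that starts wide and tightens toward a final SGD-like rate. Below is a minimal NumPy sketch of one step. The helper names, the bound schedule `final_lr * (1 - 1/(gamma*t + 1))` to `final_lr * (1 + 1/(gamma*t))`, and the default values are assumptions modeled on our understanding of Luolc/AdaBound, not a copy of that code.

```python
import numpy as np


def init_state(shape):
    """Fresh optimizer state: step counter and zeroed moment estimates."""
    return {"t": 0, "m": np.zeros(shape), "v": np.zeros(shape)}


def adabound_step(w, g, state, lr=1e-3, final_lr=0.1, betas=(0.9, 0.999),
                  gamma=1e-3, eps=1e-8):
    """One AdaBound-style update (sketch): Adam with a clipped per-element step size."""
    beta1, beta2 = betas
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * g        # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g    # second moment
    # Bias-corrected Adam step size, folded into one scalar
    step = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    # Dynamic bounds: wide early on, both converge to final_lr as t grows
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))
    eta = np.clip(step / (np.sqrt(state["v"]) + eps), lower, upper)
    return w - eta * state["m"]


state = init_state(3)
w = adabound_step(np.zeros(3), np.ones(3), state)
print(w)
```

Early in training the band is loose, so the update behaves like Adam; as `t` grows both bounds approach `final_lr`, so the update approaches SGD with momentum.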

Lookahead Optimizer: k steps forward, 1 step back

NeurIPS 2019 rwightman/pytorch-image-models

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.

IMAGE CLASSIFICATION · MACHINE TRANSLATION · STOCHASTIC OPTIMIZATION
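
Lookahead wraps any inner optimizer. It runs k "fast" steps, then moves a set of "slow" weights a fraction alpha of the way toward the fast weights and restarts the fast weights from there. The following NumPy sketch uses plain SGD on a toy quadratic as the inner optimizer. `lookahead` and `sgd_step` are hypothetical helpers, and k=5, alpha=0.5 are illustrative values.

```python
import numpy as np


def lookahead(inner_step, w0, k=5, alpha=0.5, outer_steps=10):
    """Lookahead wrapper: k fast steps forward, then 1 interpolation step back."""
    slow = np.asarray(w0, dtype=float)
    for _ in range(outer_steps):
        fast = slow.copy()                     # fast weights restart from the slow weights
        for _ in range(k):
            fast = inner_step(fast)            # any inner optimizer step
        slow = slow + alpha * (fast - slow)    # move slow weights part-way toward fast
    return slow


# Inner optimizer: plain SGD on f(w) = 0.5 * ||w||^2, whose gradient is w.
def sgd_step(w, lr=0.1):
    return w - lr * w


print(lookahead(sgd_step, [1.0, -2.0], k=5, alpha=0.5, outer_steps=20))
```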

SGDR: Stochastic Gradient Descent with Warm Restarts

13 Aug 2016 · rwightman/pytorch-image-models

Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions.

EEG · STOCHASTIC OPTIMIZATION
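
SGDR anneals the learning rate with a cosine within each period of T_i epochs. At the end of each period it restarts the rate at eta_max and multiplies the period length by T_mult, using eta_t = eta_min + (eta_max - eta_min)(1 + cos(pi T_cur / T_i)) / 2. Below is a small pure-Python sketch of that schedule. The function name and default values are illustrative assumptions.

```python
import math


def sgdr_lr(epoch, eta_min=0.0, eta_max=0.1, t_0=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts (SGDR schedule).

    epoch may be fractional, so the rate can be updated every batch.
    """
    t_i, t_cur = t_0, epoch
    # Skip over completed periods; each new period is t_mult times longer
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t_cur / t_i))


for e in [0, 5, 9.9, 10, 20, 30]:
    print(e, round(sgdr_lr(e), 4))
```

With t_0=10 and t_mult=2 the rate restarts at epochs 10, 30, 70, and so on. PyTorch also provides this schedule as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts` (parameters `T_0`, `T_mult`, `eta_min`).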

On the Variance of the Adaptive Learning Rate and Beyond

8 Aug 2019 · LiyuanLucasLiu/RAdam

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

IMAGE CLASSIFICATION · LANGUAGE MODELLING · MACHINE TRANSLATION · STOCHASTIC OPTIMIZATION
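
RAdam replaces the warmup heuristic with a rectification term r_t computed from rho_t, the approximated length of the simple moving average. For the first few steps, while rho_t is too small for the variance of the adaptive rate to be tractable, it falls back to momentum SGD. Below is one step in NumPy, following the paper's formulas as we understand them. `radam_step` is a hypothetical helper, not the LiyuanLucasLiu/RAdam code. The cutoff rho_t > 4 follows the paper's algorithm, and implementations differ slightly on it.

```python
import math

import numpy as np


def radam_step(w, g, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One RAdam-style update (sketch)."""
    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g
    m_hat = state["m"] / (1 - beta1 ** t)                   # bias-corrected momentum
    rho_inf = 2 / (1 - beta2) - 1                           # max length of the approximated SMA
    rho_t = rho_inf - 2 * t * beta2 ** t / (1 - beta2 ** t)
    if rho_t > 4:
        # Variance of the adaptive lr is tractable: rectify the Adam step
        v_hat = np.sqrt(state["v"] / (1 - beta2 ** t))
        r_t = math.sqrt((rho_t - 4) * (rho_t - 2) * rho_inf /
                        ((rho_inf - 4) * (rho_inf - 2) * rho_t))
        return w - lr * r_t * m_hat / (v_hat + eps)
    # Early steps: un-adapted momentum SGD
    return w - lr * m_hat


state = {"t": 0, "m": np.zeros(2), "v": np.zeros(2)}
w = np.zeros(2)
for _ in range(10):
    w = radam_step(w, np.ones(2), state)
print(w)
```

With beta2 = 0.999, rho_t is roughly equal to t for small t, so the first four steps use the momentum-SGD branch. PyTorch also ships an implementation as `torch.optim.RAdam`.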

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

27 May 2019 · NVIDIA/OpenSeq2Seq

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.

STOCHASTIC OPTIMIZATION
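
NovoGrad keeps a single scalar second moment per layer (a running average of the squared gradient norm) instead of one per weight. It normalizes each layer's gradient by that value and adds the weight-decay term before accumulating momentum. The sketch follows our reading of the paper's algorithm. Some implementations additionally scale the momentum input by (1 - beta1). The function name and default hyperparameters are placeholders, not values taken from NVIDIA/OpenSeq2Seq.

```python
import numpy as np


def novograd_step(layers, grads, state, lr=0.01, beta1=0.95, beta2=0.98,
                  weight_decay=0.001, eps=1e-8):
    """One NovoGrad-style step over a list of per-layer weight arrays (sketch)."""
    new_layers = []
    for l, (w, g) in enumerate(zip(layers, grads)):
        sq_norm = float(np.sum(g * g))          # one scalar per layer, not per weight
        if state["v"][l] is None:               # first step: initialise the moments
            state["v"][l] = sq_norm
            state["m"][l] = g / (np.sqrt(sq_norm) + eps) + weight_decay * w
        else:
            state["v"][l] = beta2 * state["v"][l] + (1 - beta2) * sq_norm
            state["m"][l] = (beta1 * state["m"][l]
                             + g / (np.sqrt(state["v"][l]) + eps)
                             + weight_decay * w)
        new_layers.append(w - lr * state["m"][l])
    return new_layers


layers = [np.array([3.0, 4.0]), np.ones((2, 2))]
grads = [np.array([3.0, 4.0]), np.full((2, 2), 0.5)]
state = {"m": [None] * len(layers), "v": [None] * len(layers)}
layers = novograd_step(layers, grads, state, weight_decay=0.0)
print(layers[0])
```

Because each layer's gradient is divided by its own norm, the step size is insensitive to the gradient scale of individual layers.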

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

1 Apr 2019 · kaushaltrivedi/fast-bert

Our empirical results demonstrate the superior performance of LAMB across various tasks such as BERT and ResNet-50 training with very little hyperparameter tuning.

#9 best model for Question Answering on SQuAD1.1 dev (F1 metric)

QUESTION ANSWERING · STOCHASTIC OPTIMIZATION
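
LAMB applies a per-layer trust ratio ||w|| / ||update|| to an Adam-style update, with weight decay added to the update, so each layer moves by a step proportional to its own weight norm. The sketch uses the identity for the paper's scaling function phi and falls back to a ratio of 1 when either norm is zero. Both choices, the function name, and the defaults are assumptions rather than code from kaushaltrivedi/fast-bert.

```python
import numpy as np


def lamb_step(layers, grads, state, lr=1e-3, beta1=0.9, beta2=0.999,
              weight_decay=0.01, eps=1e-6):
    """One LAMB-style update over a list of per-layer weight arrays (sketch)."""
    state["t"] += 1
    t = state["t"]
    out = []
    for l, (w, g) in enumerate(zip(layers, grads)):
        state["m"][l] = beta1 * state["m"][l] + (1 - beta1) * g
        state["v"][l] = beta2 * state["v"][l] + (1 - beta2) * g * g
        m_hat = state["m"][l] / (1 - beta1 ** t)
        v_hat = state["v"][l] / (1 - beta2 ** t)
        # Adam direction plus weight decay
        update = m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w
        w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
        # Layer-wise trust ratio with phi(x) = x; use 1 if a norm is zero
        trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
        out.append(w - lr * trust * update)
    return out


w0 = np.array([3.0, 4.0])
state = {"t": 0, "m": [np.zeros(2)], "v": [np.zeros(2)]}
(w1,) = lamb_step([w0], [np.array([1.0, 1.0])], state, weight_decay=0.0)
print(np.linalg.norm(w1 - w0))  # approximately lr * ||w0|| = 0.005
```

The trust ratio makes the step length lr * ||w|| regardless of the gradient's scale, which is what lets the learning rate be raised for very large batches.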

Greedy Step Averaging: A parameter-free stochastic optimization method

11 Nov 2016 · TalkingData/Fregata

In this paper we present the greedy step averaging (GSA) method, a parameter-free stochastic optimization algorithm for a variety of machine learning problems.

REGRESSION · STOCHASTIC OPTIMIZATION

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

8 Jun 2017 · lufficc/SSD

To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.

STOCHASTIC OPTIMIZATION
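
The paper's reference setting is a learning rate of 0.1 for a minibatch of 256, multiplied by k = batch / 256. A gradual warmup ramps the rate linearly from 0.1 to the scaled value over the first 5 epochs. Below is a pure-Python sketch of that schedule. `goyal_lr` and its parameter names are hypothetical, and the later step-decay phase is omitted.

```python
def goyal_lr(iteration, batch_size, iters_per_epoch, base_lr=0.1,
             base_batch=256, warmup_epochs=5):
    """Learning rate under the linear scaling rule with gradual warmup (sketch)."""
    # Linear scaling rule: the lr grows in proportion to the minibatch size
    target_lr = base_lr * batch_size / base_batch
    warmup_iters = warmup_epochs * iters_per_epoch
    if iteration < warmup_iters:
        # Gradual warmup: constant increment per iteration from base_lr to target_lr
        return base_lr + (target_lr - base_lr) * iteration / warmup_iters
    return target_lr


print(goyal_lr(0, 8192, 100), goyal_lr(250, 8192, 100), goyal_lr(500, 8192, 100))
```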

Deep learning with Elastic Averaging SGD

NeurIPS 2015 JoeriHermans/dist-keras

We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance.

IMAGE CLASSIFICATION · STOCHASTIC OPTIMIZATION
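
In synchronous Elastic Averaging SGD, as we understand the paper, each worker takes a gradient step plus an elastic pull toward a shared center variable. The center moves toward the workers with moving rate alpha = lr * rho. Below is a sketch of one round on a toy quadratic. `easgd_round` and `grad` are hypothetical helpers. JoeriHermans/dist-keras may implement a different variant, for example asynchronous communication every few local steps.

```python
import numpy as np


def easgd_round(workers, center, grad_fn, lr=0.1, rho=0.5):
    """One synchronous EASGD round (sketch). workers: list of local parameter arrays."""
    alpha = lr * rho                                 # strength of the elastic force
    diffs = [x - center for x in workers]            # distance of each worker from the center
    # Local gradient step plus elastic pull toward the center
    new_workers = [x - lr * grad_fn(x) - alpha * d for x, d in zip(workers, diffs)]
    # Center moves toward the workers; it never computes a gradient itself
    new_center = center + alpha * np.sum(diffs, axis=0)
    return new_workers, new_center


def grad(x):  # gradient of f(x) = 0.5 * ||x||^2
    return x


workers = [np.array([1.0]), np.array([3.0])]
center = np.array([0.0])
workers, center = easgd_round(workers, center, grad)
print(workers, center)
```

A small rho lets workers drift further from the center, which corresponds to the extra exploration the excerpt credits for the improved performance.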