Stochastic Optimization

164 papers with code • 13 benchmarks • 12 datasets

Stochastic Optimization is the task of optimizing an objective function by generating and using random variables. It is usually an iterative process in which randomly generated variables progressively locate the minima or maxima of the objective function. Stochastic optimization is typically applied in non-convex settings where deterministic methods such as linear or quadratic programming and their variants cannot be used.

Source: ASOC: An Adaptive Parameter-free Stochastic Optimization Technique for Continuous Variables
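The iterative process described above can be sketched as a minimal stochastic gradient descent loop, here on a one-dimensional quadratic with artificially noisy gradients (the objective, noise level, and constants are illustrative assumptions, not from any particular paper):

```python
import random

def sgd(grad_fn, x0, lr=0.1, steps=500, noise=0.1, seed=0):
    """Minimal stochastic optimization loop: repeatedly step along a
    noisy (stochastic) estimate of the gradient."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        g = grad_fn(x) + rng.gauss(0.0, noise)  # stochastic gradient estimate
        x -= lr * g
    return x

# Minimize f(x) = (x - 3)^2, whose exact gradient is 2 * (x - 3).
x_min = sgd(lambda x: 2.0 * (x - 3.0), x0=0.0)
```

Despite the noise in each individual gradient, the iterates settle near the true minimum at x = 3.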

Greatest papers with code

Revisiting Distributed Synchronous SGD

tensorflow/models 4 Apr 2016

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.

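The synchronous alternative the paper revisits aggregates worker gradients before each update; a minimal sketch (worker gradients here are plain lists standing in for a real distributed setup):

```python
def sync_sgd_step(weights, worker_grads, lr=0.1):
    """Synchronous SGD step (sketch): average the gradients from all
    workers, then apply a single update. This trades update frequency
    for less gradient noise than asynchronous updates."""
    n = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n for i in range(len(weights))]
    return [w - lr * g for w, g in zip(weights, avg)]

new_w = sync_sgd_step([1.0, 2.0], [[2.0, 0.0], [4.0, 2.0]])
```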

Lookahead Optimizer: k steps forward, 1 step back

rwightman/pytorch-image-models NeurIPS 2019

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.

Image Classification Machine Translation +1
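The title summarizes the algorithm: run k fast inner-optimizer steps, then interpolate the slow weights toward the result. A one-dimensional sketch (the inner gradient-descent step and constants are illustrative):

```python
def lookahead(inner_step, x0, k=5, alpha=0.5, outer_steps=20):
    """Lookahead sketch: k steps forward with a fast inner optimizer,
    then 1 step back -- pull slow weights a fraction alpha toward them."""
    slow = x0
    for _ in range(outer_steps):
        fast = slow
        for _ in range(k):                     # k steps forward
            fast = inner_step(fast)
        slow = slow + alpha * (fast - slow)    # 1 step back
    return slow

# Inner optimizer: plain gradient descent on f(x) = x^2.
x_final = lookahead(lambda x: x - 0.1 * (2.0 * x), x0=4.0)
```

In the paper the inner optimizer is typically SGD or Adam; any one-step update rule fits the same interface.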

SGDR: Stochastic Gradient Descent with Warm Restarts

rwightman/pytorch-image-models 13 Aug 2016

Partial warm restarts are also gaining popularity in gradient-based optimization, where they improve the rate of convergence of accelerated gradient schemes on ill-conditioned functions.

EEG Stochastic Optimization
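The restart schedule can be sketched as a cosine-annealed learning rate that resets at the start of each cycle (a fixed cycle length is assumed here for simplicity; the paper also lengthens cycles over time):

```python
import math

def sgdr_lr(step, lr_min=0.001, lr_max=0.1, cycle_len=10):
    """Cosine annealing with warm restarts (sketch): decay from lr_max
    to lr_min within each cycle, then restart back at lr_max."""
    t = step % cycle_len  # position within the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t / cycle_len))
```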

On the Variance of the Adaptive Learning Rate and Beyond

lab-ml/nn ICLR 2020

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Image Classification Language Modelling +2
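The warmup heuristic the paper analyzes can be sketched as a linear ramp on the learning rate (constants are illustrative):

```python
def warmup_lr(step, base_lr=0.001, warmup_steps=1000):
    """Linear learning-rate warmup (sketch): ramp from near zero up to
    base_lr over warmup_steps, then hold it constant."""
    return base_lr * min(1.0, (step + 1) / warmup_steps)
```

The paper's RAdam replaces this hand-tuned ramp with a rectification term derived from the variance of the adaptive learning rate.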

On the Convergence of Adam and Beyond

lab-ml/nn ICLR 2018

Several recently proposed stochastic optimization methods that have been successfully used in training deep networks such as RMSProp, Adam, Adadelta, Nadam are based on using gradient updates scaled by square roots of exponential moving averages of squared past gradients.

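The paper's proposed fix, AMSGrad, replaces the exponential moving average in the denominator with its running maximum, so the effective step size never grows. A one-dimensional sketch (the objective and hyperparameters are illustrative):

```python
def amsgrad(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=5000):
    """AMSGrad sketch: Adam-style moment estimates, but scale by the
    running maximum of the second-moment estimate."""
    x, m, v, v_hat = x0, 0.0, 0.0, 0.0
    for _ in range(steps):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = max(v_hat, v)                  # non-decreasing denominator
        x -= lr * m / (v_hat ** 0.5 + eps)
    return x

x_min = amsgrad(lambda x: 2.0 * (x - 1.0), x0=5.0)
```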

Adam: A Method for Stochastic Optimization

lab-ml/nn 22 Dec 2014

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

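The Adam update combines bias-corrected estimates of the first and second moments of the gradient; a minimal one-dimensional version (the test objective is illustrative):

```python
def adam(grad_fn, x0, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8, steps=3000):
    """Adam update: bias-corrected first- and second-moment estimates
    of the gradient set an adaptive per-step size."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        m = beta1 * m + (1 - beta1) * g        # first moment (mean)
        v = beta2 * v + (1 - beta2) * g * g    # second moment (uncentered variance)
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (v_hat ** 0.5 + eps)
    return x

x_min = adam(lambda x: 2.0 * (x - 3.0), x0=0.0)
```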

Apollo: An Adaptive Parameter-wise Diagonal Quasi-Newton Method for Nonconvex Stochastic Optimization

jettify/pytorch-optimizer 28 Sep 2020

In this paper, we introduce Apollo, a quasi-Newton method for nonconvex stochastic optimization, which dynamically incorporates the curvature of the loss function by approximating the Hessian via a diagonal matrix.

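The generic idea behind diagonal quasi-Newton methods like Apollo is to estimate per-coordinate curvature from successive gradient differences; a sketch of that diagonal secant idea (not Apollo's exact rectified update):

```python
def diag_curvature(x_prev, x_curr, g_prev, g_curr, eps=1e-8):
    """Diagonal secant sketch: per-coordinate curvature estimate
    B_ii ~= (g_curr - g_prev)_i / (x_curr - x_prev)_i."""
    return [(gc - gp) / (xc - xp + eps)
            for xp, xc, gp, gc in zip(x_prev, x_curr, g_prev, g_curr)]

# For f(x) = x0^2 + 3 * x1^2, the gradient is (2*x0, 6*x1).
curv = diag_curvature([1.0, 1.0], [2.0, 2.0], [2.0, 6.0], [4.0, 12.0])
```

The estimates recover the true diagonal curvatures (2 and 6) of the quadratic.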

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

jettify/pytorch-optimizer 1 Jun 2020

We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN.

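The adaptive Hessian-diagonal estimate rests on Hutchinson's trick: for Rademacher vectors z, the expectation of z * (Hz) equals the Hessian diagonal, so only Hessian-vector products are needed. A small sketch on a fixed, illustrative matrix:

```python
import random

def hutchinson_diag(hvp, dim, samples=400, seed=0):
    """Hutchinson's estimator (sketch): average z * (H z) over random
    Rademacher vectors z to approximate the Hessian diagonal."""
    rng = random.Random(seed)
    est = [0.0] * dim
    for _ in range(samples):
        z = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
        hz = hvp(z)
        for i in range(dim):
            est[i] += z[i] * hz[i] / samples
    return est

# Illustrative symmetric "Hessian" with diagonal (1, 2, 3).
H = [[1.0, 0.1, 0.0], [0.1, 2.0, 0.1], [0.0, 0.1, 3.0]]
diag_est = hutchinson_diag(
    lambda z: [sum(r * zi for r, zi in zip(row, z)) for row in H], 3)
```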

An Adaptive and Momental Bound Method for Stochastic Learning

jettify/pytorch-optimizer 27 Oct 2019

The dynamic learning rate bounds are based on the exponential moving averages of the adaptive learning rates themselves, which smooth out unexpected large learning rates and stabilize the training of deep neural networks.

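The bounding rule can be sketched per step: maintain an exponential moving average of the adaptive step sizes and clip any step that exceeds it (the interface and constants are illustrative, not the library's API):

```python
def bounded_step(step_size, ema, beta3=0.999):
    """AdaMod-style bound (sketch): update an EMA of past adaptive step
    sizes, then clip the current one to that smoothed bound."""
    ema = beta3 * ema + (1 - beta3) * step_size
    return min(step_size, ema), ema

# A sudden large step (1.0) is clipped down near the smoothed history (0.01).
clipped, ema = bounded_step(1.0, ema=0.01)
```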

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks

jettify/pytorch-optimizer 27 May 2019

We propose NovoGrad, an adaptive stochastic gradient descent method with layer-wise gradient normalization and decoupled weight decay.

General Classification Stochastic Optimization
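A single NovoGrad-style step for one layer can be sketched as: track a second moment of the layer's gradient norm, normalize the gradient by it, add decoupled weight decay, then apply momentum (the hyperparameter defaults here are illustrative):

```python
def novograd_step(w, g, m, v, lr=0.01, beta1=0.9, beta2=0.25, wd=0.001, eps=1e-8):
    """NovoGrad sketch for one layer: layer-wise gradient normalization
    by a running estimate of the gradient norm, plus decoupled weight
    decay folded into the momentum buffer."""
    gnorm2 = sum(gi * gi for gi in g)
    v = gnorm2 if v is None else beta2 * v + (1 - beta2) * gnorm2
    denom = v ** 0.5 + eps
    m = [beta1 * mi + (gi / denom + wd * wi) for mi, gi, wi in zip(m, g, w)]
    w = [wi - lr * mi for wi, mi in zip(w, m)]
    return w, m, v

w, m, v = novograd_step([1.0, 0.0], [3.0, 4.0], m=[0.0, 0.0], v=None)
```

Because the normalization uses the whole layer's gradient norm, every weight in the layer is rescaled consistently, unlike per-element adaptive methods.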