Stochastic Optimization

301 papers with code • 12 benchmarks • 11 datasets

Stochastic Optimization is the task of optimizing an objective functional by generating and using random variables. It is usually an iterative process in which randomly generated samples progressively locate the minima or maxima of the objective functional. Stochastic Optimization is typically applied in non-convex settings where deterministic methods such as linear or quadratic programming, or their variants, cannot be used.

Source: ASOC: An Adaptive Parameter-free Stochastic Optimization Techinique for Continuous Variables
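
As a rough illustration of the iterative, sample-driven process described above, here is a minimal sketch of random local search on a one-dimensional non-convex function. It is not the ASOC method from the cited source; the test function, step size, iteration count, and names such as `random_search` are assumptions chosen for illustration.

```python
import math
import random


def objective(x):
    # Hypothetical non-convex test function with several local minima.
    return x * x + 10.0 * math.sin(3.0 * x)


def random_search(f, x0, step=0.5, iters=2000, seed=0):
    """Keep the best point found so far and perturb it with Gaussian noise."""
    rng = random.Random(seed)
    best_x, best_f = x0, f(x0)
    for _ in range(iters):
        candidate = best_x + rng.gauss(0.0, step)
        value = f(candidate)
        # Accept the candidate only if it improves the objective.
        if value < best_f:
            best_x, best_f = candidate, value
    return best_x, best_f


best_x, best_f = random_search(objective, x0=4.0)
```

Because a candidate is accepted only when it improves the objective, the returned value is never worse than the starting point, but nothing guarantees that it is the global minimum.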

Most implemented papers

Adam: A Method for Stochastic Optimization

labmlai/annotated_deep_learning_paper_implementations 22 Dec 2014

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.
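
The moment estimates mentioned in this abstract can be sketched for a single scalar parameter as follows. The function name `adam_step` and the scalar setting are illustrative assumptions; the default hyper-parameters are the ones suggested in the Adam paper.

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad * grad
    # Bias correction compensates for initializing m and v at zero.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's scale.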

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Deci-AI/super-gradients 8 Jun 2017

To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.
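
A minimal sketch of the two ingredients named in this abstract: a learning rate scaled linearly with minibatch size, and a warmup that ramps up to that rate. The function names, the linear shape of the ramp, and the example values are assumptions for illustration, not the paper's exact recipe.

```python
def scaled_lr(base_lr, base_batch, batch):
    """Linear scaling: multiply the learning rate by the batch-size ratio."""
    return base_lr * batch / base_batch


def warmup_lr(step, warmup_steps, start_lr, target_lr):
    """Ramp linearly from start_lr to target_lr, then hold target_lr."""
    if step >= warmup_steps:
        return target_lr
    return start_lr + (target_lr - start_lr) * step / warmup_steps


# Illustrative values: a reference rate of 0.1 at batch size 256,
# scaled to a batch size of 8192 and reached after 500 warmup steps.
target = scaled_lr(0.1, 256, 8192)
schedule = [warmup_lr(s, 500, 0.1, target) for s in range(1000)]
```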

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

tensorflow/addons ICLR 2020

In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.
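
One common form of layerwise adaptation rescales each layer's update by the ratio of the layer's weight norm to the update norm. The sketch below shows only that rescaling, under the assumption that each layer is a flat list of floats; it is not the paper's full algorithm, and `layerwise_step` is a hypothetical name.

```python
import math


def l2_norm(values):
    return math.sqrt(sum(v * v for v in values))


def layerwise_step(layers, updates, lr):
    """Apply one step in which each layer gets its own norm-based scale."""
    new_layers = []
    for weights, update in zip(layers, updates):
        w_norm = l2_norm(weights)
        u_norm = l2_norm(update)
        # Fall back to a ratio of 1 when either norm is zero.
        ratio = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
        new_layers.append(
            [w - lr * ratio * u for w, u in zip(weights, update)]
        )
    return new_layers


stepped = layerwise_step([[3.0, 4.0]], [[0.6, 0.8]], lr=0.1)
```

With this scaling, the relative change in each layer's weights is governed by `lr` rather than by the raw magnitude of that layer's update.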

On the Variance of the Adaptive Learning Rate and Beyond

LiyuanLucasLiu/RAdam ICLR 2020

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Lookahead Optimizer: k steps forward, 1 step back

michaelrzhang/lookahead NeurIPS 2019

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.
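
The "k steps forward, 1 step back" idea in the title can be sketched for a scalar parameter as below. Plain SGD is assumed as the inner optimizer, and the function name, the values of `k` and `alpha`, and the quadratic example are illustrative choices.

```python
def lookahead_outer_step(slow, grad_fn, inner_lr=0.1, k=5, alpha=0.5):
    """Run k fast SGD steps, then move the slow weight toward the result."""
    fast = slow
    for _ in range(k):
        fast = fast - inner_lr * grad_fn(fast)  # k steps forward
    # 1 step back: interpolate between the slow and the fast weight.
    return slow + alpha * (fast - slow)


# Usage: minimize f(x) = x^2, whose gradient is 2 * x.
x = 1.0
for _ in range(20):
    x = lookahead_outer_step(x, lambda w: 2.0 * w)
```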

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

deepmind/kfac-jax 19 Mar 2015

This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.

SGDR: Stochastic Gradient Descent with Warm Restarts

loshchil/SGDR 13 Aug 2016

Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions.
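
A warm-restart schedule of the kind used in SGDR can be sketched as a cosine decay that resets to its maximum at the start of each cycle. The function name `sgdr_lr` and the example values are assumptions; `t0` is the length of the first cycle and `t_mult` multiplies the cycle length after every restart.

```python
import math


def sgdr_lr(epoch, eta_min, eta_max, t0, t_mult=1):
    """Cosine-annealed learning rate with warm restarts."""
    t_i, t_cur = t0, epoch
    # Find the current cycle and the position inside it.
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult
    cosine = 1.0 + math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (eta_max - eta_min) * cosine


# Cycles of length 10, 20, 40, ... starting from a maximum rate of 0.1.
rates = [sgdr_lr(e, 0.0, 0.1, t0=10, t_mult=2) for e in range(70)]
```

The rate returns to `eta_max` at epochs 0, 10, and 30 in this example, which is where each restart occurs.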

Averaging Weights Leads to Wider Optima and Better Generalization

timgaripov/swa 14 Mar 2018

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.
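
The weight averaging named in the title can be sketched as a running mean over weight vectors collected during training. How often and from which point checkpoints are collected is not stated here, so the sketch simply takes a list of checkpoints; the names and the flat-list representation are assumptions.

```python
def update_average(avg, weights, n_averaged):
    """Fold one more weight vector into a running mean of n_averaged vectors."""
    return [
        (a * n_averaged + w) / (n_averaged + 1)
        for a, w in zip(avg, weights)
    ]


def average_checkpoints(checkpoints):
    """Average a non-empty list of equally sized weight vectors."""
    avg = list(checkpoints[0])
    for n, weights in enumerate(checkpoints[1:], start=1):
        avg = update_average(avg, weights, n)
    return avg


averaged = average_checkpoints([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```

The running form avoids storing every checkpoint: only the current average and the number of vectors folded into it are needed.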

Deep learning with Elastic Averaging SGD

sixin-zh/mpiT NeurIPS 2015

We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to improved performance.

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

kohpangwei/group_DRO 20 Nov 2019

Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups.
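
The worst-case objective described in this abstract can be sketched as the maximum of the per-group mean losses. This shows only the objective, not the training procedure or the regularization discussed in the paper; the function names and the example losses are hypothetical.

```python
def group_mean_losses(losses, groups):
    """Mean loss per group, given per-example losses and group labels."""
    totals, counts = {}, {}
    for loss, group in zip(losses, groups):
        totals[group] = totals.get(group, 0.0) + loss
        counts[group] = counts.get(group, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}


def worst_group_loss(losses, groups):
    """Group DRO objective: the largest per-group mean loss."""
    return max(group_mean_losses(losses, groups).values())


# Hypothetical per-example losses for two pre-defined groups.
losses = [0.1, 0.3, 0.9, 0.7]
groups = [0, 0, 1, 1]
average = sum(losses) / len(losses)
worst = worst_group_loss(losses, groups)
```

In this example the average loss hides the gap between the groups, while the worst-group loss exposes the group the model fits least well.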