Stochastic Optimization

282 papers with code • 12 benchmarks • 11 datasets

Stochastic Optimization is the task of optimizing certain objective functional by generating and using stochastic random variables. Usually the Stochastic Optimization is an iterative process of generating random variables that progressively finds out the minima or the maxima of the objective functional. Stochastic Optimization is usually applied in the non-convex functional spaces where the usual deterministic optimization such as linear or quadratic programming or their variants cannot be used.

Source: ASOC: An Adaptive Parameter-free Stochastic Optimization Techinique for Continuous Variables

Benchmarks

Add a Result

These leaderboards are used to track progress in Stochastic Optimization

Dataset	Best Model	Compare
CIFAR-10 WRN-28-10 - 200 Epochs	Adam (eps-adjusted)	See all
CIFAR-100 WRN-28-10 - 200 Epochs	AvaGrad	See all
CIFAR-10 ResNet-18 - 200 Epochs	SGD - cosine LR schedule	See all
ImageNet ResNet-50 - 90 Epochs	AvaGrad	See all
Penn Treebank (Character Level) 3x1000 LSTM - 500 Epochs	AvaGrad	See all
ImageNet ResNet-50 - 50 Epochs	Lookahead	See all
ImageNet ResNet-50 - 60 Epochs	Lookahead	See all
CIFAR-10	Resnet18	See all
CIFAR-100	Resnet18	See all
MNIST	MLP	See all
CoLA	Bert	See all
AG News	Bert	See all

Show all 12 benchmarks

Collapse benchmarks

Libraries

Use these libraries to find Stochastic Optimization models and implementations

rwightman/pytorch-image-models

2 papers

29,974

frgfm/Holocron

2 papers

303

icaros-usc/pyribs

2 papers

198

kazukiosawa/asdfghjkl

2 papers

173

See all 8 libraries.

Datasets

Subtasks

Most implemented papers

Most implemented Social Latest No code

Adam: A Method for Stochastic Optimization

labmlai/annotated_deep_learning_paper_implementations • • 22 Dec 2014

We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments.

Paper
Code

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour

Deci-AI/super-gradients • • 8 Jun 2017

To achieve this result, we adopt a hyper-parameter-free linear scaling rule for adjusting learning rates as a function of minibatch size and develop a new warmup scheme that overcomes optimization challenges early in training.

Paper
Code

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

tensorflow/addons • • ICLR 2020

In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.

Paper
Code

On the Variance of the Adaptive Learning Rate and Beyond

LiyuanLucasLiu/RAdam • • ICLR 2020

The learning rate warmup heuristic achieves remarkable success in stabilizing training, accelerating convergence and improving generalization for adaptive stochastic optimization algorithms like RMSprop and Adam.

Paper
Code

Lookahead Optimizer: k steps forward, 1 step back

michaelrzhang/lookahead • • NeurIPS 2019

The vast majority of successful deep neural networks are trained using variants of stochastic gradient descent (SGD) algorithms.

Paper
Code

Optimizing Neural Networks with Kronecker-factored Approximate Curvature

deepmind/kfac-jax • • 19 Mar 2015

This is because the cost of storing and inverting K-FAC's approximation to the curvature matrix does not depend on the amount of data used to estimate it, which is a feature typically associated only with diagonal or low-rank approximations to the curvature matrix.

Paper
Code

SGDR: Stochastic Gradient Descent with Warm Restarts

loshchil/SGDR • 13 Aug 2016

Partial warm restarts are also gaining popularity in gradient-based optimization to improve the rate of convergence in accelerated gradient schemes to deal with ill-conditioned functions.

Paper
Code

Averaging Weights Leads to Wider Optima and Better Generalization

timgaripov/swa • • 14 Mar 2018

Deep neural networks are typically trained by optimizing a loss function with an SGD variant, in conjunction with a decaying learning rate, until convergence.

Paper
Code

Deep learning with Elastic Averaging SGD

sixin-zh/mpiT • • NeurIPS 2015

We empirically demonstrate that in the deep learning setting, due to the existence of many local optima, allowing more exploration can lead to the improved performance.

Paper
Code

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

kohpangwei/group_DRO • • 20 Nov 2019

Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups.

Paper
Code

Stochastic Optimization

Benchmarks Add a Result

Libraries

Datasets

Subtasks

Most implemented papers

Content

Benchmarks

Add a Result