Stochastic Optimization

278 papers with code • 12 benchmarks • 11 datasets

Stochastic Optimization is the task of optimizing an objective function by generating and using random variables. It is usually an iterative process in which random variables are sampled to progressively approach the minimum or maximum of the objective function. Stochastic optimization is typically applied to non-convex problems where deterministic methods such as linear or quadratic programming, or their variants, cannot be used.

Source: ASOC: An Adaptive Parameter-free Stochastic Optimization Technique for Continuous Variables
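
To make the definition concrete, here is a minimal sketch of a generic stochastic search (not tied to any particular paper listed below): Gaussian perturbations are sampled around the best point found so far, and improvements are kept. The Rastrigin objective and all hyperparameters are illustrative assumptions.

```python
import numpy as np

def rastrigin(x):
    # A standard non-convex test objective (illustrative choice).
    return 10 * x.size + np.sum(x**2 - 10 * np.cos(2 * np.pi * x))

def stochastic_search(objective, x0, n_iters=5000, sigma=0.5, seed=0):
    """Minimize `objective` by sampling Gaussian perturbations around
    the best point seen so far and accepting improvements."""
    rng = np.random.default_rng(seed)
    best_x = np.asarray(x0, dtype=float)
    best_f = objective(best_x)
    for _ in range(n_iters):
        candidate = best_x + sigma * rng.standard_normal(best_x.shape)
        f = objective(candidate)
        if f < best_f:          # keep the candidate only if it improves
            best_x, best_f = candidate, f
    return best_x, best_f

x, f = stochastic_search(rastrigin, x0=np.array([3.0, -2.0]))
print(f"approximate minimizer {x}, value {f:.4f}")
```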

Most implemented papers

Variational Inference: A Review for Statisticians

magister-informatica-uach/INFO3XX 4 Jan 2016

One of the core problems of modern statistics is to approximate difficult-to-compute probability densities.

Training Deep Networks without Learning Rates Through Coin Betting

bremen79/cocob NeurIPS 2017

Instead, we reduce the optimization process to a game of betting on a coin and propose a learning-rate-free optimal algorithm for this scenario.
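
The coin-betting reduction can be sketched in a few lines. The code below is a simplified Krichevsky-Trofimov (KT) style bettor for a single scalar parameter, not the paper's COCOB algorithm; the toy objective, the constants, and the averaged return value are assumptions.

```python
import numpy as np

def kt_coin_betting(grad, w0=0.0, epsilon=1.0, n_iters=500):
    """1-D parameter-free optimization via a KT-style coin bettor.
    The coin outcome at round t is the negative (sub)gradient; the next
    iterate is a KT fraction of the accumulated wealth. Assumes |grad| <= 1."""
    wealth = epsilon        # initial endowment
    coin_sum = 0.0          # running sum of coin outcomes
    iterates = []
    for t in range(1, n_iters + 1):
        w = w0 + (coin_sum / t) * wealth   # bet a fraction of current wealth
        c = -grad(w)                       # observe the "coin flip"
        wealth += c * (w - w0)             # money won or lost on this bet
        coin_sum += c
        iterates.append(w)
    return float(np.mean(iterates))        # online-to-batch: return the average

# Toy example: minimize |w - 0.7|, whose subgradients lie in [-1, 1].
print(kt_coin_betting(lambda w: np.sign(w - 0.7)))
```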

Agnostic Federated Learning

litian96/fair_flearn 1 Feb 2019

A key learning scenario in large-scale applications is that of federated learning, where a centralized model is trained based on data originating from a large number of clients.
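
For context, here is a minimal sketch of the plain federated-averaging loop that this setting implies, not the paper's agnostic (minimax) objective; the synthetic client data, linear model, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each client holds its own (X, y) for a shared linear model w.
def make_client(n=50, d=5, noise=0.1):
    X = rng.standard_normal((n, d))
    w_true = np.arange(1.0, d + 1.0)
    y = X @ w_true + noise * rng.standard_normal(n)
    return X, y

clients = [make_client() for _ in range(10)]

def local_sgd(w, X, y, lr=0.01, epochs=5):
    """A few epochs of SGD on one client's least-squares loss."""
    w = w.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            grad = (X[i] @ w - y[i]) * X[i]
            w -= lr * grad
    return w

# Federated averaging: broadcast the global model, train locally, average.
w_global = np.zeros(5)
for _ in range(20):
    local_models = [local_sgd(w_global, X, y) for X, y in clients]
    w_global = np.mean(local_models, axis=0)

print("learned weights:", np.round(w_global, 2))
```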

Eve: A Gradient Based Optimization Method with Locally and Globally Adaptive Learning Rates

rooa/eve 4 Nov 2016

Adaptive gradient methods for stochastic optimization adjust the learning rate for each parameter locally.
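
As a concrete instance of such per-parameter adaptation, here is a minimal AdaGrad-style update (a generic adaptive gradient method, not the Eve algorithm itself); the badly scaled quadratic and the constants are illustrative assumptions.

```python
import numpy as np

def adagrad(grad, x0, lr=0.5, eps=1e-8, n_iters=500):
    """AdaGrad: each coordinate gets its own effective learning rate,
    shrinking faster for coordinates with historically large gradients."""
    x = np.asarray(x0, dtype=float)
    g_sq_sum = np.zeros_like(x)                   # per-parameter accumulator
    for _ in range(n_iters):
        g = grad(x)
        g_sq_sum += g**2
        x -= lr * g / (np.sqrt(g_sq_sum) + eps)   # locally adapted step
    return x

# Badly scaled quadratic: f(x) = 0.5 * (100 * x[0]**2 + x[1]**2)
grad = lambda x: np.array([100.0 * x[0], x[1]])
print(adagrad(grad, x0=[1.0, 1.0]))
```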

Learning concise representations for regression by evolving networks of trees

lacava/feat ICLR 2019

We propose and study a method for learning interpretable representations for the task of regression.

Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

facebookresearch/madgrad 26 Jan 2021

We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods.

Second-Order Stochastic Optimization for Machine Learning in Linear Time

brianbullins/lissa_code 12 Feb 2016

First-order stochastic methods are the state-of-the-art in large-scale machine learning optimization owing to efficient per-iteration complexity.

Revisiting Distributed Synchronous SGD

tensorflow/models 4 Apr 2016

Distributed training of deep learning models on large-scale training data is typically conducted with asynchronous stochastic optimization to maximize the rate of updates, at the cost of additional noise introduced from asynchrony.
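
For reference, here is a minimal single-process simulation of the synchronous alternative the paper revisits: every worker computes a mini-batch gradient, and the update is applied only after all gradients are averaged. The toy least-squares problem, shard sizes, and worker count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared linear-regression problem split into shards, one per simulated worker.
d, n_workers = 4, 8
w_true = rng.standard_normal(d)
shards = []
for _ in range(n_workers):
    X = rng.standard_normal((256, d))
    shards.append((X, X @ w_true + 0.05 * rng.standard_normal(256)))

def worker_gradient(w, X, y, batch_size=32):
    """Mini-batch gradient of the least-squares loss on one worker's shard."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / batch_size

w, lr = np.zeros(d), 0.1
for step in range(200):
    # Synchronous step: wait for every worker, then average their gradients.
    grads = [worker_gradient(w, X, y) for X, y in shards]
    w -= lr * np.mean(grads, axis=0)

print("error vs. true weights:", np.linalg.norm(w - w_true))
```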

Kronecker Determinantal Point Processes

alshedivat/DeterminantalPointProcesses.jl NeurIPS 2016

Determinantal Point Processes (DPPs) are probabilistic models over all subsets of a ground set of $N$ items.
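
A short sketch of that definition, assuming the usual L-ensemble parameterization: the probability of a subset $S$ is $\det(L_S) / \det(L + I)$. The random kernel below is purely illustrative.

```python
import numpy as np
from itertools import chain, combinations

rng = np.random.default_rng(0)

# Random positive semi-definite L-ensemble kernel over a ground set of N items.
N = 6
A = rng.standard_normal((N, N))
L = A @ A.T

def dpp_probability(L, subset):
    """P(S) = det(L_S) / det(L + I) for an L-ensemble DPP."""
    S = list(subset)
    det_LS = np.linalg.det(L[np.ix_(S, S)]) if S else 1.0  # empty minor has det 1
    return det_LS / np.linalg.det(L + np.eye(len(L)))

# Sanity check: the probabilities of all 2^N subsets sum to one.
all_subsets = chain.from_iterable(combinations(range(N), k) for k in range(N + 1))
print(sum(dpp_probability(L, s) for s in all_subsets))  # ~1.0
```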

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost

booydar/t5-experiments ICML 2018

In several recently proposed stochastic optimization methods (e.g., RMSProp, Adam, Adadelta), parameter updates are scaled by the inverse square roots of exponential moving averages of squared past gradients.
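
That scaling rule can be sketched as a plain RMSProp-style update (generic, not Adafactor's memory-efficient factored variant); the toy objective and constants are illustrative assumptions.

```python
import numpy as np

def rmsprop(grad, x0, lr=0.01, beta=0.9, eps=1e-8, n_iters=1000):
    """Scale each update by the inverse square root of an exponential
    moving average of squared past gradients (per parameter)."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)                      # EMA of squared gradients
    for _ in range(n_iters):
        g = grad(x)
        v = beta * v + (1 - beta) * g**2      # update the moving average
        x -= lr * g / (np.sqrt(v) + eps)      # inverse-sqrt scaling
    return x

# Toy objective: f(x) = sum((x - 3)^2)
grad = lambda x: 2.0 * (x - 3.0)
print(rmsprop(grad, x0=np.zeros(3)))
```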