Stochastic Optimization

Stochastic Gradient Descent

Stochastic Gradient Descent (SGD) is an iterative optimization technique that uses minibatches of data to form an estimate of the gradient, rather than computing the full gradient over all available data. That is, for weights $w$ and a loss function $L$ we have:

$$ w_{t+1} = w_{t} - \eta\hat{\nabla}_{w}{L(w_{t})} $$

where $\eta$ is the learning rate and $\hat{\nabla}_{w}L(w_{t})$ denotes the minibatch gradient estimate. Compared with batch gradient descent, which recomputes gradients over many similar examples before every parameter update, SGD avoids this redundancy and is therefore usually much faster.
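As a rough illustration (not part of the original page), the update rule above can be written as a short minibatch loop. The sketch below assumes NumPy and a made-up least-squares problem; the names `sgd` and `lsq_grad` and all hyperparameter values are illustrative choices, not a reference implementation.

```python
import numpy as np

def sgd(w, grad_fn, X, y, lr=0.01, batch_size=32, epochs=10, seed=0):
    """Minimal SGD loop: each step forms a minibatch gradient estimate g_hat
    and applies the update w_{t+1} = w_t - lr * g_hat."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)                  # reshuffle the data each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            g_hat = grad_fn(w, X[idx], y[idx])      # minibatch estimate of the gradient
            w = w - lr * g_hat                      # gradient step with learning rate lr
    return w

# Illustrative use: least-squares loss L(w) = mean((Xw - y)^2) / 2,
# whose minibatch gradient is X_b^T (X_b w - y_b) / |b|.
def lsq_grad(w, X_b, y_b):
    return X_b.T @ (X_b @ w - y_b) / len(y_b)

X = np.random.randn(1000, 5)
w_true = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ w_true + 0.1 * np.random.randn(1000)
w_fit = sgd(np.zeros(5), lsq_grad, X, y, lr=0.1, epochs=20)
```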

Tasks


Task                        Papers   Share
Federated Learning              33   11.15%
Language Modelling              23    7.77%
Image Classification            20    6.76%
Language Modeling               17    5.74%
Quantization                    13    4.39%
Computational Efficiency        10    3.38%
Deep Learning                    9    3.04%
Learning Theory                  9    3.04%
Continual Learning               7    2.36%

Components


No components found.

Categories

Stochastic Optimization