Stochastic Gradient Descent

Stochastic Gradient Descent is an iterative optimization technique that uses minibatches of data to form an estimate of the gradient, rather than computing the full gradient over all available data. That is, for weights $w$ and a loss function $L$ we have:

$$ w_{t+1} = w_{t} - \eta\hat{\nabla}_{w}{L(w_{t})} $$

Here $\eta$ is the learning rate and $\hat{\nabla}_{w}L(w_{t})$ is the gradient estimated on the current minibatch. Batch gradient descent recomputes gradients for many similar examples before every parameter update. SGD avoids this redundancy, so it is usually much faster.
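As an illustration, here is a minimal sketch of minibatch SGD in plain Python. It assumes a one-dimensional linear regression model $\hat{y} = wx + b$ with squared-error loss. The model, the helper names `minibatch_grad` and `sgd`, and the hyperparameter values are hypothetical choices made for this example, not part of any library. The pair $(w, b)$ plays the role of the weights $w$ in the update rule, and the averaged minibatch gradient plays the role of $\hat{\nabla}_{w}L(w_{t})$.

```python
import random

def minibatch_grad(w, b, batch):
    """Average gradient of the loss 0.5 * (w*x + b - y)^2 over one minibatch."""
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y   # prediction error for this example
        gw += err * x         # d/dw of 0.5 * err^2
        gb += err             # d/db of 0.5 * err^2
    n = len(batch)
    return gw / n, gb / n

def sgd(data, eta=0.1, batch_size=8, epochs=200, seed=0):
    """Minibatch SGD: params <- params - eta * (minibatch gradient estimate)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)          # copy so shuffling does not mutate the caller's list
    for _ in range(epochs):
        rng.shuffle(data)      # draw fresh random minibatches every epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            gw, gb = minibatch_grad(w, b, batch)
            w -= eta * gw      # update uses only this minibatch, not all data
            b -= eta * gb
    return w, b

# Usage: noisy samples from the hypothetical target y = 3x + 1.
gen = random.Random(1)
samples = [(x, 3 * x + 1 + gen.gauss(0, 0.1))
           for x in (gen.uniform(-1, 1) for _ in range(200))]
print(sgd(samples))  # should land close to (3, 1)
```

Each update touches only `batch_size` examples, so its cost does not grow with the dataset size. With a constant $\eta$ the iterates keep fluctuating slightly around the least-squares solution, which is one reason learning-rate decay schedules are common in practice.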

Tasks

Task	Papers	Share
Federated Learning	38	14.13%
Image Classification	20	7.43%
Language Modelling	17	6.32%
Computational Efficiency	10	3.72%
Quantization	10	3.72%
Generalization Bounds	10	3.72%
Learning Theory	9	3.35%
Continual Learning	8	2.97%
Response Generation	6	2.23%

Categories

Stochastic Optimization