# SGD with Momentum

SGD with Momentum is a stochastic optimization method that adds a momentum term to regular stochastic gradient descent:

$$v_{t} = \gamma v_{t-1} + \eta\nabla_{\theta}J\left(\theta\right)$$

$$\theta_{t} = \theta_{t-1} - v_{t}$$

A typical value for $\gamma$ is $0.9$. The name comes from an analogy to physics, such as a ball accelerating as it rolls down a slope. In the case of weight updates, we can think of the weights as a particle traveling through parameter space that is accelerated by the gradient of the loss.
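The two update equations above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production optimizer; the function and argument names (`sgd_momentum`, `grad_fn`, `theta0`) are illustrative choices, not from the source.

```python
import numpy as np

def sgd_momentum(grad_fn, theta0, lr=0.01, gamma=0.9, n_steps=100):
    """Minimize an objective J via gradient descent with momentum.

    grad_fn: callable returning the gradient of J at theta.
    lr: learning rate (eta); gamma: momentum coefficient.
    """
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)  # velocity, v_0 = 0
    for _ in range(n_steps):
        # v_t = gamma * v_{t-1} + eta * grad J(theta)
        v = gamma * v + lr * grad_fn(theta)
        # theta_t = theta_{t-1} - v_t
        theta = theta - v
    return theta

# Example: J(theta) = theta^2, so grad J = 2 * theta; the minimum is at 0.
theta_star = sgd_momentum(lambda th: 2 * th, theta0=[5.0], lr=0.1,
                          gamma=0.9, n_steps=200)
```

With a true stochastic objective, `grad_fn` would return the gradient on a random minibatch rather than the full gradient; the velocity `v` then smooths the noisy per-batch updates.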

Image Source: Juan Du
