# AMSBound

Introduced by Luo et al. in *Adaptive Gradient Methods with Dynamic Bound of Learning Rate*

AMSBound is a variant of the AMSGrad stochastic optimizer designed to be more robust to extreme learning rates. It applies dynamic bounds to the learning rates: the lower and upper bounds are initialized at zero and infinity respectively, and both converge smoothly to a constant final step size. AMSBound therefore behaves like an adaptive method at the beginning of training and gradually transforms into SGD (or SGD with momentum) as the time step increases.

$$g_{t} = \nabla{f}_{t}\left(x_{t}\right)$$

$$m_{t} = \beta_{1t}m_{t-1} + \left(1-\beta_{1t}\right)g_{t}$$

$$v_{t} = \beta_{2}v_{t-1} + \left(1-\beta_{2}\right)g_{t}^{2}$$

$$\hat{v}_{t} = \max\left(\hat{v}_{t-1}, v_{t}\right) \text{ and } V_{t} = \text{diag}\left(\hat{v}_{t}\right)$$

$$\eta = \text{Clip}\left(\alpha/\sqrt{V_{t}}, \eta_{l}\left(t\right), \eta_{u}\left(t\right)\right) \text{ and } \eta_{t} = \eta/\sqrt{t}$$

$$x_{t+1} = \Pi_{\mathcal{F}, \text{diag}\left(\eta_{t}^{-1}\right)}\left(x_{t} - \eta_{t} \odot m_{t} \right)$$

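Here $\alpha$ is the initial step size, and $\eta_{l}$ and $\eta_{u}$ are the lower and upper bound functions respectively. $m_{t}$ and $v_{t}$ are exponential moving averages of the gradient and the squared gradient, $\hat{v}_{t}$ is the running element-wise maximum inherited from AMSGrad, and $\Pi_{\mathcal{F}, \cdot}$ denotes projection onto the feasible set $\mathcal{F}$.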
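The update rule above can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that are not stated on this page: $\beta_{1t}$ is held constant, the projection onto $\mathcal{F}$ is omitted (unconstrained case), a small `eps` is added for numerical safety, and the bound schedule in the hypothetical helper `bounds` (with parameters `final_lr` and `gamma`) is just one possible choice that starts near zero and at a very large value and converges to `final_lr`. The function names are illustrative and do not come from a library.

```python
import numpy as np

def bounds(t, final_lr=0.1, gamma=1e-3):
    """Assumed bound schedule: lower starts near 0, upper starts large,
    and both converge to final_lr as t grows."""
    lower = final_lr * (1.0 - 1.0 / (gamma * t + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * t))
    return lower, upper

def init_state(x):
    """Zero-initialized optimizer state with the same shape as x."""
    z = np.zeros_like(x, dtype=float)
    return {"t": 0, "m": z.copy(), "v": z.copy(), "v_hat": z.copy()}

def amsbound_step(x, grad, state, alpha=1e-3, beta1=0.9, beta2=0.999,
                  final_lr=0.1, gamma=1e-3, eps=1e-8):
    """One unconstrained AMSBound-style update; mutates state, returns new x."""
    state["t"] += 1
    t = state["t"]
    # Exponential moving averages of the gradient and squared gradient.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * grad ** 2
    # AMSGrad step: keep the element-wise running maximum of v.
    state["v_hat"] = np.maximum(state["v_hat"], state["v"])
    # Clip the per-parameter learning rate into [eta_l(t), eta_u(t)].
    lower, upper = bounds(t, final_lr, gamma)
    eta = np.clip(alpha / (np.sqrt(state["v_hat"]) + eps), lower, upper)
    # Apply the 1/sqrt(t) decay from the formulas, then take the step.
    eta_t = eta / np.sqrt(t)
    return x - eta_t * state["m"]

# Usage: a few steps on f(x) = 0.5 * ||x||^2, whose gradient is x.
x = np.array([1.0, -2.0])
state = init_state(x)
for _ in range(5):
    x = amsbound_step(x, grad=x, state=state)
```

Early in training the clip interval is wide, so the step behaves like AMSGrad; as `t` grows the interval narrows around `final_lr` and every coordinate uses nearly the same step size, which is the SGD-like regime described above.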
