no code implementations • 6 Feb 2024 • Yusu Hong, Junhong Lin
The Adaptive Moment Estimation (Adam) algorithm is highly effective across a wide range of deep learning training tasks.
no code implementations • 3 Nov 2023 • Yusu Hong, Junhong Lin
To overcome these limitations, we provide a deep analysis and show that Adam converges to a stationary point with high probability at a rate of $\mathcal{O}\left({\rm poly}(\log T)/\sqrt{T}\right)$ under coordinate-wise "affine" variance noise, without requiring any bounded-gradient assumption or any problem-dependent knowledge in advance to tune hyper-parameters.
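For readers unfamiliar with the setting, the following is a minimal sketch of the standard Adam update driven by a gradient oracle whose per-coordinate noise satisfies one assumed form of an affine variance bound, $|g_i - \nabla f(x)_i|^2 \le \sigma_0^2 + \sigma_1^2 |\nabla f(x)_i|^2$. It is not the authors' code, and the paper's exact noise condition, step-size schedule, and hyper-parameter choices may differ. The names `adam_step`, `noisy_grad`, `run_demo`, the constants `sigma0` and `sigma1`, and the quadratic test objective are all hypothetical choices made for illustration.

```python
import math
import random


def adam_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update with bias correction, applied per coordinate."""
    new_x, new_m, new_v = [], [], []
    for xi, gi, mi, vi in zip(x, g, m, v):
        mi = beta1 * mi + (1 - beta1) * gi        # first-moment estimate
        vi = beta2 * vi + (1 - beta2) * gi * gi   # second-moment estimate
        m_hat = mi / (1 - beta1 ** t)             # bias correction (t starts at 1)
        v_hat = vi / (1 - beta2 ** t)
        new_x.append(xi - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_x, new_m, new_v


def noisy_grad(grad, rng, sigma0=0.1, sigma1=0.5):
    """Add zero-mean noise to each coordinate of the true gradient.

    Assumed noise model (illustrative only): the noise on coordinate i is
    uniform on [-b_i, b_i] with b_i = sqrt(sigma0^2 + sigma1^2 * grad_i^2),
    so its square never exceeds the affine bound sigma0^2 + sigma1^2 * grad_i^2.
    """
    out = []
    for gi in grad:
        bound = math.sqrt(sigma0 ** 2 + sigma1 ** 2 * gi * gi)
        out.append(gi + rng.uniform(-bound, bound))
    return out


def run_demo(steps=3000, lr=0.05, seed=0):
    """Run Adam on f(x) = 0.5 * ||x||^2, whose gradient at x is x itself."""
    rng = random.Random(seed)
    x = [5.0, -3.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    initial_norm = math.sqrt(sum(xi * xi for xi in x))
    for t in range(1, steps + 1):
        g = noisy_grad(x, rng)  # stochastic gradient with affine-bounded noise
        x, m, v = adam_step(x, g, m, v, t, lr=lr)
    final_norm = math.sqrt(sum(xi * xi for xi in x))
    return initial_norm, final_norm


if __name__ == "__main__":
    start, end = run_demo()
    print(f"gradient norm: start={start:.3f}, end={end:.3f}")
```

Note that the noise magnitude grows with the gradient itself, so neither the gradients nor the noise are assumed to be uniformly bounded. The constant step size used here is only a convenience for the toy problem and should not be read as the schedule analyzed in the paper.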