1 Jan 2021 • Nima Eshraghi and Ben Liang
Prior works on gradient descent and mirror descent have shown that the dynamic regret can be upper bounded using the path length, which depends on the distances between successive minimizers; an upper bound in terms of the squared path length has also been established when multiple gradient queries are allowed per round.
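As a sketch of the quantities involved (notation is illustrative, not taken from the abstract): with per-round losses $f_t$ and per-round minimizers $x_t^* = \arg\min_x f_t(x)$, the dynamic regret and the (squared) path length are commonly written as

```latex
% Dynamic regret of iterates x_1, ..., x_T against the per-round minimizers x_t^*
\mathrm{Reg}_T^{\mathrm{d}} = \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(x_t^*),
\qquad
P_T = \sum_{t=2}^{T} \lVert x_t^* - x_{t-1}^* \rVert,
\qquad
S_T = \sum_{t=2}^{T} \lVert x_t^* - x_{t-1}^* \rVert^2 .
```

The bounds referenced above are of the form $\mathrm{Reg}_T^{\mathrm{d}} = O(P_T)$ with one gradient query per round, and $O(S_T)$ when multiple queries are allowed.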
ICLR 2019 • Ali Ramezani-Kebrya, Ashish Khisti, and Ben Liang
While momentum-based methods, used in conjunction with stochastic gradient descent, are widely employed to train machine learning models, there is little theoretical understanding of the generalization error of such methods.
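For concreteness, the momentum (heavy-ball) variant of SGD referred to here maintains a velocity that accumulates past gradients. A minimal sketch, with hyperparameter names (`lr`, `beta`) chosen for illustration rather than taken from the paper:

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, beta=0.9):
    """One heavy-ball update: v <- beta*v + grad, then w <- w - lr*v.
    lr and beta are illustrative hyperparameter choices."""
    v = beta * v + grad
    w = w - lr * v
    return w, v

# Toy run on f(w) = 0.5 * ||w||^2, whose gradient at w is simply w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad=w)
print(np.linalg.norm(w))  # norm shrinks toward 0
```

The velocity term smooths the stochastic gradients across iterations, which speeds up training in practice; the open question the paper targets is how this affects generalization, not optimization.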