We study the behavior of stochastic gradient descent applied to $\|Ax - b\|_2^2 \rightarrow \min$ for invertible $A \in \mathbb{R}^{n \times n}$, where $x$ denotes the solution of $Ax = b$ and $x_k$ the $k$-th iterate. We show that there is an explicit constant $c_{A}$ depending (mildly) on $A$ such that $$ \mathbb{E} ~\left\| Ax_{k+1} - b\right\|^2_{2} \leq \left(1 + \frac{c_{A}}{\|A\|_F^2}\right) \left\|A x_k - b \right\|^2_{2} - \frac{2}{\|A\|_F^2} \left\|A^T A (x_k - x)\right\|^2_{2}.$$ This is a curious inequality: the last term has one more matrix applied to the error $x_k - x$ than the remaining terms. If $x_k - x$ is mainly composed of singular vectors associated with large singular values, stochastic gradient descent leads to a quick regularization...
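The abstract does not spell out which variant of stochastic gradient descent is analyzed. As a rough illustration only, the sketch below assumes the randomized Kaczmarz form: row $a_i$ of $A$ is sampled with probability $\|a_i\|_2^2 / \|A\|_F^2$ and the iterate is projected onto the hyperplane $\langle a_i, x \rangle = b_i$. The function names `kaczmarz_sgd` and `residual_sq`, the $2 \times 2$ test matrix, and the step count are all hypothetical choices made for this example, not taken from the paper.

```python
import random


def kaczmarz_sgd(A, b, x0, steps, seed=0):
    """Randomized Kaczmarz iteration for A x = b (an assumed SGD variant)."""
    rng = random.Random(seed)
    # Squared row norms; row i is sampled with probability ||a_i||^2 / ||A||_F^2.
    weights = [sum(v * v for v in row) for row in A]
    x = list(x0)
    for _ in range(steps):
        i = rng.choices(range(len(A)), weights=weights)[0]
        row = A[i]
        # Step size that makes the i-th equation hold exactly after the update.
        scale = (b[i] - sum(r * xj for r, xj in zip(row, x))) / weights[i]
        x = [xj + scale * r for xj, r in zip(x, row)]
    return x


def residual_sq(A, b, x):
    """Squared residual ||A x - b||_2^2."""
    return sum(
        (sum(r * xj for r, xj in zip(row, x)) - bi) ** 2
        for row, bi in zip(A, b)
    )


# Hypothetical invertible 2x2 system with known solution x_true.
A = [[2.0, 1.0], [1.0, 3.0]]
x_true = [1.0, -1.0]
b = [sum(r * xj for r, xj in zip(row, x_true)) for row in A]

x_start = [0.0, 0.0]
x_final = kaczmarz_sgd(A, b, x_start, steps=200)

print("initial squared residual:", residual_sq(A, b, x_start))
print("final squared residual:  ", residual_sq(A, b, x_final))
```

Tracking `residual_sq` along the iterates is the quantity $\|Ax_k - b\|_2^2$ that appears on both sides of the inequality. The constant $c_A$ is not given in the abstract, so the sketch does not attempt to check the bound itself.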
Method: SGD (type: Stochastic Optimization)