We study generalization bounds for noisy stochastic mini-batch iterative algorithms based on the notion of stability.
In this regard, the majority of DIP works for vision tasks only demonstrates the potential of the models -- reporting the peak performance against the ground truth, but provides no clue about how to operationally obtain near-peak performance without access to the groundtruth.
Existing approaches for deterministic non-smooth deep nets typically need to bound the Lipschitz constant of such deep nets but such bounds are quite large, may even increase with the training set size yielding vacuous generalization bounds.
(2) how can we characterize the stochastic optimization dynamics of SGD with fixed and adaptive step sizes and diagonal pre-conditioning based on the first and second moments of SGs?
We show that these algorithms are non-convergent whenever there is some disparity between the expected median and mean over the local gradients.