Recent years have seen advances in generalization bounds for noisy stochastic algorithms, especially stochastic gradient Langevin dynamics (SGLD), based on stability (Mou et al., 2018; Li et al., 2020) and on information-theoretic approaches (Xu and Raginsky, 2017; Negrea et al., 2019; Steinke and Zakynthinou, 2020).
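For concreteness, a minimal sketch of one SGLD update follows; the step size `eta`, inverse temperature `beta`, and the toy quadratic loss are illustrative assumptions, not details taken from the works cited above.

```python
import numpy as np

def sgld_step(w, grad, eta, beta, rng):
    """One SGLD update: a gradient step plus Gaussian noise whose
    scale is tied to the step size eta and inverse temperature beta."""
    noise = rng.normal(size=w.shape) * np.sqrt(2.0 * eta / beta)
    return w - eta * grad + noise

# Toy usage: minimize 0.5 * ||w||^2 with noisy (stochastic) gradients.
rng = np.random.default_rng(0)
w = np.ones(3)
for _ in range(1000):
    stochastic_grad = w + 0.1 * rng.normal(size=w.shape)  # gradient + minibatch noise
    w = sgld_step(w, stochastic_grad, eta=0.01, beta=100.0, rng=rng)
print(w)  # hovers near the minimizer at 0, up to the injected noise
```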
In this regard, the majority of DIP works for vision tasks only demonstrate the potential of the models -- reporting the peak performance against the ground truth -- but provide no guidance on how to operationally obtain near-peak performance without access to the ground truth.
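As a hypothetical illustration of this reporting practice, the loop below tracks the best PSNR against the ground truth over DIP iterations; `dip_step` and `psnr` are assumed placeholder functions, not part of any cited codebase.

```python
# Hypothetical sketch of the common "oracle" DIP evaluation protocol:
# peak performance is found by comparing every intermediate output to
# the ground truth, which is unavailable in real restoration settings.
def report_peak_psnr(dip_step, psnr, ground_truth, num_iters=5000):
    best_psnr, best_iter = float("-inf"), -1
    for t in range(num_iters):
        output = dip_step(t)                 # one DIP optimization step -> current image
        score = psnr(output, ground_truth)   # requires the ground truth (the oracle)
        if score > best_psnr:
            best_psnr, best_iter = score, t
    return best_psnr, best_iter  # the "peak performance" typically reported
```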
Existing approaches for deterministic non-smooth deep nets typically need to bound the Lipschitz constant of such nets, but such bounds are often quite large and may even grow with the training set size, yielding vacuous generalization bounds.
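To see why such bounds are large, consider the standard upper bound obtained by multiplying per-layer spectral norms (the 1-Lipschitz ReLU drops out); the random network below is an illustrative assumption, not a model from the literature.

```python
import numpy as np

# A common Lipschitz upper bound for a ReLU net is the product of the
# layers' spectral norms; it compounds multiplicatively with depth.
def lipschitz_upper_bound(weight_matrices):
    return float(np.prod([np.linalg.norm(W, 2) for W in weight_matrices]))

# Illustrative example: even modest per-layer norms yield a huge bound.
rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.5, size=(256, 256)) for _ in range(10)]
print(lipschitz_upper_bound(weights))  # on the order of (per-layer norm)^depth
```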
(2) how can we characterize the stochastic optimization dynamics of SGD with fixed and adaptive step sizes and diagonal pre-conditioning based on the first and second moments of the stochastic gradients (SGs)?
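A minimal sketch of such a diagonally pre-conditioned update, in the style of Adam/RMSProp, is given below; the decay rates, step size, and toy quadratic loss are illustrative assumptions.

```python
import numpy as np

def preconditioned_sgd_step(w, g, m, v, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One adaptive step built from the first moment m and second moment v
    of the stochastic gradients; 1/(sqrt(v)+eps) is the diagonal pre-conditioner."""
    m = beta1 * m + (1 - beta1) * g          # running first moment of the SGs
    v = beta2 * v + (1 - beta2) * g * g      # running second moment of the SGs
    w = w - lr * m / (np.sqrt(v) + eps)      # diagonally pre-conditioned update
    return w, m, v

# Toy usage on the quadratic loss 0.5 * ||w||^2 with noisy gradients.
rng = np.random.default_rng(0)
w, m, v = np.ones(3), np.zeros(3), np.zeros(3)
for _ in range(2000):
    g = w + 0.1 * rng.normal(size=3)  # stochastic gradient
    w, m, v = preconditioned_sgd_step(w, g, m, v)
print(w)  # near the minimizer at 0
```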
We show that these algorithms are non-convergent whenever there is a disparity between the expected median and the expected mean of the local gradients.
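As a hypothetical numeric illustration of such a disparity, suppose the local gradients follow a skewed distribution (chosen purely for illustration): the median then systematically differs from the mean, so a median-based update is biased away from the true descent direction even in expectation.

```python
import numpy as np

# Skewed local-gradient distribution: the median and the mean of the
# gradients disagree, so a median-based aggregation rule drives the
# iterates toward a different fixed point than mean-based SGD.
rng = np.random.default_rng(0)
local_grads = rng.exponential(scale=1.0, size=100_000) - 0.5  # skewed; mean = 0.5
print("mean  :", local_grads.mean())       # ~0.5 (the true descent signal)
print("median:", np.median(local_grads))   # ~ln(2) - 0.5 ~= 0.19, a different value
```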