We study generalization bounds for noisy stochastic mini-batch iterative algorithms based on the notion of stability.
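To make this concrete, one standard way of formalizing stability is uniform stability; the exact variant studied here may differ, but the following sketch shows how stability controls the generalization gap. An algorithm $A$ is $\varepsilon$-uniformly stable if
\[
\sup_{S \simeq S'} \; \sup_{z} \; \mathbb{E}_{A}\big[\ell(A(S), z) - \ell(A(S'), z)\big] \;\le\; \varepsilon,
\]
where $S \simeq S'$ ranges over training sets differing in a single example; for such an algorithm the expected generalization gap satisfies $\big|\mathbb{E}\big[\text{test loss} - \text{training loss}\big]\big| \le \varepsilon$.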
Our experiments on a variety of benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) with various networks (VGG and ResNet) validate the theoretical properties of NT-SGD, i.e., NT-SGD matches the speed and accuracy of vanilla SGD while working effectively with sparse gradients and successfully escaping poor local minima.
Adaptive gradient methods such as AdaGrad, RMSprop, and Adam have become the optimizers of choice for deep learning due to their fast training speed.
We obtain this rate by providing the first analyses of a collection of private gradient-based methods, including the adaptive algorithms DP RMSProp and DP Adam.
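As a rough sketch, assuming the standard DP-SGD recipe of per-example clipping plus Gaussian noise (the exact algorithms analyzed may differ in detail), a private RMSProp-style step computes
\[
\tilde g_t \;=\; \frac{1}{B}\Big(\sum_{i \in \mathcal{B}_t} \operatorname{clip}_C\!\big(\nabla f(x_t; z_i)\big) \;+\; \mathcal{N}\big(0,\, \sigma^2 C^2 I\big)\Big),
\qquad
v_t \;=\; \beta\, v_{t-1} + (1-\beta)\, \tilde g_t^{\,2},
\qquad
x_{t+1} \;=\; x_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, \tilde g_t,
\]
where $C$ is the clipping norm, $\sigma$ the noise multiplier, and $\mathcal{B}_t$ a mini-batch of size $B$; replacing $\tilde g_t$ in the final update with an exponential average of past noisy gradients yields a DP Adam-style step.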
Existing approaches for deterministic non-smooth deep nets typically need to bound the Lipschitz constant of such networks, but these bounds are often quite large and may even grow with the training set size, yielding vacuous generalization bounds.
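To see why such bounds are large, consider the generic worst-case estimate for an $L$-layer ReLU network $f_\theta(x) = W_L\,\sigma(W_{L-1}\cdots\sigma(W_1 x))$ (a standard bound, not necessarily the one used by the approaches above):
\[
\operatorname{Lip}(f_\theta) \;\le\; \prod_{l=1}^{L} \|W_l\|_2 .
\]
Since the spectral norm of each trained layer is often well above one, this product grows multiplicatively with depth, which is what drives the resulting generalization bounds toward vacuity.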
(2) how can we characterize the stochastic optimization dynamics of SGD with fixed and adaptive step sizes and diagonal pre-conditioning based on the first and second moments of SGs?
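Concretely, under one common parameterization (which may differ from the exact formulation studied here), such dynamics take the form
\[
m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^{\,2}, \qquad
x_{t+1} = x_t - \eta_t\, \operatorname{diag}\!\big(\sqrt{v_t} + \epsilon\big)^{-1} m_t,
\]
where $g_t$ is the stochastic gradient at step $t$; taking $\beta_1 = \beta_2 = 0$ and replacing the diagonal preconditioner with the identity recovers vanilla SGD with a fixed step size, while the adaptive, diagonally preconditioned case corresponds to Adam-style updates driven by the first and second moments of the SGs.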
An increasing number of people participate in social networks, producing massive amounts of online social data.
In addition, none of them considers both the privacy of users' contexts (e.g., social status, age, and hobbies) and that of video service vendors' repositories, both of which are extremely sensitive and of significant commercial value.