Fast Differentially Private-SGD via JL Projections

1 Jan 2021  ·  Zhiqi Bu, Sivakanth Gopi, Janardhan Kulkarni, Yin Tat Lee, Uthaipon Tantipongpipat

Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large-scale neural networks. This algorithm requires computing per-sample gradient norms, which is extremely slow and memory-intensive in practice. In this paper, we present a new framework for designing differentially private optimizers, yielding DP-SGD-JL and DP-Adam-JL. Our approach uses Johnson–Lindenstrauss (JL) projections to quickly approximate the per-sample gradient norms without computing them exactly, bringing the training time and memory requirements of our optimizers closer to those of their non-DP counterparts. Our algorithms achieve state-of-the-art privacy-vs-accuracy tradeoffs on the MNIST and CIFAR-10 datasets while being significantly faster. Unlike previous attempts to make DP-SGD faster, which work only on fully-connected or convolutional layers, our algorithms work for any network in a black-box manner, which is the main contribution of this paper. To illustrate this, we train a Recurrent Neural Network (RNN) on the IMDb dataset to a good privacy-vs-accuracy tradeoff, whereas existing DP optimizers are either inefficient or inapplicable. On RNNs, our algorithms are orders of magnitude faster than DP-SGD for large batch sizes. The privacy analysis of our algorithms is more involved than that of DP-SGD; we use the recently proposed f-DP framework of Dong et al. (2019). In summary, we design new differentially private training algorithms that are fast, achieve state-of-the-art privacy-vs-accuracy tradeoffs, and generalize to all network architectures.
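To make the JL idea concrete, here is a minimal sketch in JAX of how per-sample gradient norms can be estimated without ever materializing B individual gradients: a single Jacobian-vector product of the per-sample loss vector in a random Gaussian direction v returns all B dot products <g_i, v> in one forward pass, and since E[<g_i, v>^2] = ||g_i||^2, averaging k squared projections estimates each squared norm. The toy model and the names `losses` and `jl_grad_norms` are illustrative assumptions, not the paper's implementation.

```python
import jax
import jax.numpy as jnp

def losses(params, batch):
    """Toy per-sample losses; stands in for any network's loss vector."""
    xs, ys = batch
    preds = xs @ params           # hypothetical linear model
    return (preds - ys) ** 2      # shape (B,): one loss per sample

def jl_grad_norms(params, batch, key, k=8):
    """Estimate per-sample gradient norms ||g_i|| with k JL projections.

    For v ~ N(0, I), one jax.jvp call returns the per-sample dot
    products (<g_1, v>, ..., <g_B, v>) in a single forward pass, and
    E[<g_i, v>^2] = ||g_i||^2, so averaging k squared projections is
    an unbiased estimate of each squared norm.
    """
    batch_size = losses(params, batch).shape[0]
    sq_sum = jnp.zeros(batch_size)
    for subkey in jax.random.split(key, k):
        v = jax.random.normal(subkey, params.shape)            # random direction
        _, dots = jax.jvp(lambda p: losses(p, batch), (params,), (v,))
        sq_sum = sq_sum + dots ** 2                            # accumulate <g_i, v>^2
    return jnp.sqrt(sq_sum / k)                                # estimated ||g_i||
```

A DP-SGD-JL-style step would then clip each sample's contribution by min(1, C / ||g_i||_est) and add Gaussian noise to the clipped sum, as in DP-SGD. Because the clipping factors come from noisy norm estimates, the privacy accounting is more delicate than in standard DP-SGD, which is where the f-DP analysis mentioned above comes in.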
