Quantifying and Localizing Private Information Leakage from Neural Network Gradients

Empirical attacks on collaborative learning show that the gradients of deep neural networks can not only disclose private latent attributes of the training data but also be used to reconstruct the original data. While prior works tried to quantify the privacy risk stemming from gradients, these measures do not establish a theoretically grounded understanding of gradient leakage, do not generalize across attackers, and can fail to fully explain what is observed through empirical attacks in practice... In this paper, we introduce theoretically motivated measures to quantify information leakage in both attack-dependent and attack-independent manners. Specifically, we present an adaptation of the $\mathcal{V}$-information, which generalizes the empirical attack success rate and allows quantifying the amount of information that can leak from any chosen family of attack models. We then propose attack-independent measures, which require only the shared gradients, for quantifying both original and latent information leakage. Our empirical results, on six datasets and four popular models, reveal that gradients of the first layers contain the highest amount of original information, while the (fully-connected) classifier layers placed after the (convolutional) feature extractor layers contain the highest latent information. Further, we show how techniques such as gradient aggregation during training can mitigate information leakage. Our work paves the way for better defenses such as layer-based protection or strong aggregation.
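To make the two qualitative claims above concrete (a shared gradient can expose the original input, and aggregation can blunt that exposure), here is a minimal NumPy sketch. It is not the paper's measure or any attack evaluated in it. It assumes a toy setting: a single fully-connected softmax layer with a bias, cross-entropy loss, and random data. The function names `layer_gradients` and `recover_input` are hypothetical and introduced only for illustration. The sketch relies on the known identity that, for one example, each row of the weight gradient equals the bias gradient of that row times the input.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

def layer_gradients(W, b, x, y):
    # Cross-entropy gradients for ONE example through logits = W @ x + b.
    delta = softmax(W @ x + b)
    delta[y] -= 1.0                      # dL/dlogits = p - onehot(y)
    return np.outer(delta, x), delta     # dL/dW (outer product), dL/db

def recover_input(dW, db):
    # For a single example, dW[i] = db[i] * x, so one division recovers x.
    i = np.argmax(np.abs(db))            # row with the largest bias gradient
    return dW[i] / db[i]

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))              # toy layer: 5 inputs, 3 classes
b = rng.normal(size=3)
xs = rng.normal(size=(8, 5))             # 8 private inputs (assumed data)
ys = rng.integers(0, 3, size=8)

# Case 1: the gradient of a single example is shared.
dW, db = layer_gradients(W, b, xs[0], ys[0])
single_error = np.linalg.norm(recover_input(dW, db) - xs[0])

# Case 2: only the gradient averaged over the 8 examples is shared.
grads = [layer_gradients(W, b, x, y) for x, y in zip(xs, ys)]
dW_avg = np.mean([g[0] for g in grads], axis=0)
db_avg = np.mean([g[1] for g in grads], axis=0)
x_mix = recover_input(dW_avg, db_avg)    # a weighted mix of all 8 inputs
batch_errors = np.linalg.norm(xs - x_mix, axis=1)

print("single-example reconstruction error:", single_error)
print("closest match after aggregation:", batch_errors.min())
```

In the single-example case the division recovers the input up to floating-point error. After averaging, the same division returns only a weighted combination of the batch inputs, which is one simple intuition for why aggregation can reduce leakage. Stronger attacks than this naive division exist, so the sketch illustrates the direction of the effect, not its size.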
