Sanitizer: Sanitizing data for anonymizing sensitive information

29 Sep 2021 · Abhishek Singh, Ethan Garza, Ayush Chopra, Praneeth Vepakomma, Vivek Sharma, Ramesh Raskar

We propose a framework that protects against sensitive information leakage, facilitating data release to untrusted parties. Sanitization involves transforming a data sample to remove sensitive-attribute information while retaining all other information, with the goal of keeping utility high for unknown downstream tasks. This is done in a two-step process: first, we develop a method that encodes an unstructured, image-like modality into a structured representation bifurcated into sensitive and non-sensitive components. Second, we design mechanisms that transform the sensitive features such that the data obtained by projecting the features back into image space is protected from sensitive information leakage. Instead of removing sensitive information from the unstructured data, we replace the sensitive features by sampling synthetic features from the joint distribution of the sensitive features in the structured representation. Hence, with this method one can share a sanitized dataset whose distribution matches that of the original, yielding a good utility-privacy trade-off. We compare our technique against state-of-the-art baselines and demonstrate competitive empirical results, both quantitatively and qualitatively.
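The replacement step described above can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes the encoder has already produced latent vectors whose sensitive coordinates are known (`sensitive_dims` is a hypothetical parameter), and it models the joint distribution of the sensitive features with a multivariate Gaussian fitted to the dataset, from which synthetic replacements are sampled.

```python
import numpy as np

def sanitize(latents, sensitive_dims, rng=None):
    """Replace the sensitive coordinates of each latent vector with
    synthetic samples drawn from a Gaussian fitted to the joint
    distribution of those coordinates across the dataset.

    Note: a Gaussian is an illustrative stand-in for the paper's
    learned joint distribution over sensitive features.
    """
    rng = np.random.default_rng(rng)
    z = np.asarray(latents, dtype=float).copy()
    s = z[:, sensitive_dims]                       # sensitive sub-representation
    mean = s.mean(axis=0)                          # fit joint Gaussian ...
    cov = np.cov(s, rowvar=False)                  # ... to the sensitive block
    # Overwrite the sensitive block with synthetic samples; the
    # non-sensitive coordinates are left untouched.
    z[:, sensitive_dims] = rng.multivariate_normal(mean, cov, size=len(z))
    return z
```

After sanitization, the latents would be decoded back to image space; because the synthetic features come from the same fitted distribution, the released dataset's overall statistics stay close to the original while per-sample sensitive values are no longer present.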
