Feature Grinding: Efficient Backdoor Sanitation in Deep Neural Networks

29 Sep 2021 · Nils Lukas, Charles Zhang, Florian Kerschbaum

Training deep neural networks (DNNs) is expensive, and for this reason third parties provide computational resources to train models. This makes DNNs vulnerable to backdoor attacks, in which the third party maliciously injects hidden functionality into the model at training time. Removing a backdoor is challenging because, although the defender has access to a clean, labeled dataset, their computational resources are limited to a fraction of those required to train a model from scratch. We propose Feature Grinding, an efficient, randomized backdoor sanitation technique, and evaluate it against seven contemporary backdoors on CIFAR-10 and ImageNet. Feature Grinding requires at most six percent of the model's training time on CIFAR-10 and at most two percent on ImageNet to sanitize the surveyed backdoors. We compare Feature Grinding with five other sanitation methods and find that it is often the most effective at decreasing the backdoor's success rate while preserving high model accuracy. Our experiments include an ablation study over multiple parameters for each backdoor attack and sanitation technique to ensure a fair evaluation of all methods. Models suspected of containing a backdoor can be Feature Grinded using limited resources, which makes it a practical backdoor defense that can be incorporated into any standard training procedure.
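The abstract describes the defender's setting (a suspect model, a clean labeled dataset, and a compute budget that is a small fraction of full training) but does not spell out the Feature Grinding algorithm itself. The sketch below illustrates only that general setting with a generic fine-tuning-based sanitation loop; the model, dataset, and `sanitize` function are hypothetical stand-ins, not the paper's method.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: a suspect (possibly backdoored) model and a small
# clean, labeled dataset. A real defender would load the third-party model
# and their own held-out data here.
model = nn.Sequential(
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 10)
)
clean_data = TensorDataset(
    torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))
)
clean_loader = DataLoader(clean_data, batch_size=64, shuffle=True)

def sanitize(model, loader, epochs=2, lr=1e-3):
    """Fine-tune the suspect model on clean data under a small compute budget.

    This is a generic sanitation baseline, not Feature Grinding: the abstract
    fixes the budget (roughly 2-6% of training time) but does not describe the
    randomized transformation the paper applies.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):  # few epochs ~ a fraction of full training cost
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

sanitize(model, clean_loader)
```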
