Paper tables with annotated results for Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

Paper

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification

A trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data. The backdoor is activated when a normal input is stamped with a certain pattern, called the trigger, causing misclassification. Many existing trojan attacks use triggers that are input-space patches or objects (e.g., a polygon with solid color) or simple input transformations such as Instagram filters. These simple triggers are susceptible to recent backdoor detection algorithms. We propose a novel deep feature space trojan attack with five characteristics: effectiveness, stealthiness, controllability, robustness, and reliance on deep features. We conduct extensive experiments on 9 image classifiers on various datasets, including ImageNet, to demonstrate these properties and show that our attack can evade state-of-the-art defenses.
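To make the notion of a simple input-space trigger concrete, the following is a minimal sketch of stamping a solid-color square patch onto an image. It illustrates only the kind of patch trigger that the abstract contrasts with, not the deep feature space attack proposed in the paper. The function name `stamp_patch_trigger`, its parameters, and the nested-list grayscale image representation are assumptions made for illustration.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of pixel intensities (0-255).
Image = List[List[int]]


def stamp_patch_trigger(image: Image, top: int, left: int,
                        size: int, value: int = 255) -> Image:
    """Return a copy of `image` with a solid square patch at (top, left).

    Hypothetical helper illustrating a simple input-space patch trigger.
    """
    height, width = len(image), len(image[0])
    if top < 0 or left < 0 or top + size > height or left + size > width:
        raise ValueError("patch does not fit inside the image")
    # Copy each row so the clean input is left untouched.
    stamped = [row[:] for row in image]
    for r in range(top, top + size):
        for c in range(left, left + size):
            stamped[r][c] = value
    return stamped


# Usage: stamp a 2x2 white patch in the bottom-right corner of a 4x4 black image.
clean = [[0] * 4 for _ in range(4)]
poisoned = stamp_patch_trigger(clean, top=2, left=2, size=2)
```

A fixed, localized patch like this changes the same small pixel region in every poisoned input, which is the sort of regularity that, according to the abstract, makes simple triggers susceptible to backdoor detection.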
