In this paper, we propose a novel and practical mechanism that enables the service provider to verify whether a suspect model was stolen from the victim model via model extraction attacks.
We find that (i) commercial antivirus engines are vulnerable to AMM-guided test cases; (ii) the ability of malware manipulated using one detector to evade detection by another detector (i.e., transferability) depends on the overlap of features with large AMM values between the two detectors; and (iii) AMM values effectively measure the fragility of features (i.e., the capability of feature-space manipulation to flip the prediction result) and explain the robustness of malware detectors against evasion attacks.
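To make findings (ii) and (iii) concrete, the following is a minimal sketch in Python, assuming an AMM-style fragility score can be approximated as the product of a feature's attribution weight and its remaining room for manipulation, and that transferability is probed via the overlap of the top-scored features of two detectors. The function names and the score definition here are illustrative stand-ins, not the exact formulation used in the paper.

```python
import numpy as np

def amm_style_scores(attributions, feature_values, feature_max=1.0):
    """Hypothetical AMM-style fragility score: attribution weight times the
    remaining room to push the feature toward its maximum value.  Large values
    indicate features whose manipulation is most likely to flip the prediction."""
    room = np.maximum(feature_max - feature_values, 0.0)
    return np.abs(attributions) * room

def top_k_overlap(scores_a, scores_b, k=20):
    """Fraction of shared features among the top-k scored features of two
    detectors -- a rough proxy for the overlap the transferability finding refers to."""
    top_a = set(np.argsort(scores_a)[-k:])
    top_b = set(np.argsort(scores_b)[-k:])
    return len(top_a & top_b) / k

# Toy example: random attributions from two detectors over 100 features.
rng = np.random.default_rng(0)
x = rng.random(100)                  # current feature values of a malware sample
attr_a = rng.normal(size=100)        # attribution weights from detector A
attr_b = rng.normal(size=100)        # attribution weights from detector B
scores_a = amm_style_scores(attr_a, x)
scores_b = amm_style_scores(attr_b, x)
print(f"top-20 AMM-feature overlap: {top_k_overlap(scores_a, scores_b):.2f}")
```

Under this reading, a high overlap score suggests that a sample manipulated along detector A's fragile features is also likely to evade detector B.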
We demonstrate that the adversary achieves a high attack success rate while maintaining functionality for regular users, using triggers that remain inconspicuous to human administrators.
We show that the proposed invisible backdoors are effective across various DNN models and four datasets (MNIST, CIFAR-10, CIFAR-100, and GTSRB), as measured by their attack success rates for the adversary, functionality for normal users, and invisibility scores for administrators.
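The sketch below illustrates how these three metrics could be computed, assuming a classifier exposed as a `model_predict` function and image batches as NumPy arrays. The attack success rate and clean-accuracy functionality measures are standard; the invisibility score shown here is an illustrative L-infinity proxy and may differ from the score actually used in the paper.

```python
import numpy as np

def attack_success_rate(model_predict, triggered_inputs, target_label):
    """Fraction of trigger-carrying inputs classified as the attacker's target label."""
    preds = model_predict(triggered_inputs)
    return float(np.mean(preds == target_label))

def functionality(model_predict, clean_inputs, true_labels):
    """Clean accuracy: how well the backdoored model still serves normal users."""
    preds = model_predict(clean_inputs)
    return float(np.mean(preds == true_labels))

def invisibility_score(clean_inputs, triggered_inputs):
    """Illustrative invisibility proxy: mean per-image L-infinity distance between
    clean and triggered inputs (lower means harder for an administrator to notice)."""
    diffs = np.abs(triggered_inputs - clean_inputs)
    return float(np.mean(diffs.reshape(len(diffs), -1).max(axis=1)))

# Toy usage with a dummy "model" that always predicts class 0.
rng = np.random.default_rng(0)
clean = rng.random((8, 32, 32, 3))
triggered = np.clip(clean + 0.02, 0.0, 1.0)            # faint additive trigger
labels = rng.integers(0, 10, size=8)
dummy_predict = lambda x: np.zeros(len(x), dtype=int)
print(attack_success_rate(dummy_predict, triggered, target_label=0))
print(functionality(dummy_predict, clean, labels))
print(invisibility_score(clean, triggered))
```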
We conjecture that the key to defending against model inversion and GAN-based attacks lies not in differential privacy itself but in the perturbation of the training data.