Use of small auxiliary networks and scarce data to improve the adversarial robustness of deep learning models

29 Sep 2021 · Davide Coppola, Hwee Kuan Lee, Cuntai Guan

Deep learning models for image classification are known to be vulnerable to adversarial examples. Adversarial training is one of the most effective defenses against such threats; however, it is a cumbersome process that requires many data points and long computation times. In a setting where only small amounts of data are available, adversarial training may degrade classification performance on clean images by overfitting to the small dataset. This is undesirable, especially when a large pre-trained model with satisfactory performance on clean data is already available. We propose a new strategy to make a previously trained model more robust against adversarial attacks, using scarce data and without degrading its performance on clean samples. The proposed strategy consists of freezing the parameters of the originally trained base model and adding small auxiliary networks along the architecture, which process the features to reduce the effect of any adversarial perturbation. This method can be used to defend a model against any arbitrary attack. A practical advantage of using auxiliary networks is that no modifications to the originally trained base model are required. It can therefore serve as a patch or add-on that fixes large and expensive existing deep learning models with few additional resources. Experiments on the CIFAR10 dataset showed that, using only $10\%$ of the full training set, the proposed method was able to adequately defend the model against the AutoPGD attack while maintaining a clean-image classification accuracy $7\%$ higher than that of the adversarially trained model. Moreover, the proposed method still performs reasonably well compared to adversarial training when using only $1\%$ of the full training set.
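To make the idea concrete, the sketch below illustrates the general recipe described in the abstract: freeze a pre-trained backbone and insert small trainable auxiliary networks after intermediate stages so that they post-process the features. The class names (`AuxiliaryBlock`, `RobustWrapper`), the choice of a ResNet-18 backbone, and the placement of the blocks are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal PyTorch sketch, assuming a ResNet-18 backbone: the original model
# is frozen and only the small auxiliary blocks are trained (e.g. on
# adversarial examples generated from the scarce available data).
import torch
import torch.nn as nn
from torchvision.models import resnet18


class AuxiliaryBlock(nn.Module):
    """Small residual network that post-processes frozen features."""

    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Residual form: the block only learns a correction to the features.
        return x + self.body(x)


class RobustWrapper(nn.Module):
    """Frozen backbone with trainable auxiliary blocks after each stage."""

    def __init__(self):
        super().__init__()
        base = resnet18(weights="IMAGENET1K_V1")
        for p in base.parameters():  # freeze the original base model
            p.requires_grad = False
        self.stem = nn.Sequential(base.conv1, base.bn1, base.relu, base.maxpool)
        self.stages = nn.ModuleList([base.layer1, base.layer2, base.layer3, base.layer4])
        # One auxiliary block per stage; channel counts follow ResNet-18.
        self.aux = nn.ModuleList([AuxiliaryBlock(c) for c in (64, 128, 256, 512)])
        self.head = nn.Sequential(base.avgpool, nn.Flatten(), base.fc)

    def forward(self, x):
        x = self.stem(x)
        for stage, aux in zip(self.stages, self.aux):
            x = aux(stage(x))  # frozen stage, then trainable auxiliary patch
        return self.head(x)


model = RobustWrapper()
# Only the auxiliary parameters are optimized; the base model stays untouched.
optimizer = torch.optim.SGD(
    (p for p in model.parameters() if p.requires_grad), lr=0.01, momentum=0.9
)
```

In this sketch the auxiliary blocks are residual, so at initialization they roughly pass features through unchanged and the wrapped model behaves like the original one; training then nudges the features away from the effect of adversarial perturbations without altering any base-model weight.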
