Gradient Broadcast Adaptation: Defending against the backdoor attack in pre-trained models

29 Sep 2021 · Tianyu Chen, Haoyi Zhou, He Mingrui, JianXin Li

Pre-trained language models (e.g., BERT, GPT-3) have revolutionized NLP research, and fine-tuning has become an indispensable step of downstream adaptation. However, covert attacks are an emerging threat to the pre-train-then-fine-tune learning paradigm. The backdoor attack is a typical example, in which the victim model fails on trigger-activated samples while behaving normally on all others. These backdoors can survive the subsequent fine-tuning stage and thus continually threaten the application of pre-trained models. In this paper, we propose a Gradient Broadcast Adaptation (GBA) method that prevents the model from producing attacker-controlled outputs in a trigger-anchor-free manner. We design a prompt-based tuning scheme that flexibly accesses rare tokens while providing a fair distance measure in the word-embedding space. Gradient broadcasting alleviates the lazy updating of potential trigger tokens and purges the underlying abnormal weights. The GBA defense is evaluated on five text-classification tasks against three state-of-the-art backdoor attacks. We find that it neutralizes nearly 100% of the embedded backdoors with negligible performance loss on clean data.
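The abstract only describes the mechanism at a high level, so the snippet below is a minimal, hypothetical PyTorch sketch of the "gradient broadcast" idea as stated there: embedding rows of tokens that never appear in a fine-tuning batch receive no gradient (the "lazy updating" problem), so a small corrective update is broadcast to those untouched rows after each optimizer step. The function name, the shrink coefficient, and the choice of the mean of updated rows as the broadcast target are illustrative assumptions, not the paper's actual algorithm.

import torch
import torch.nn as nn

def gradient_broadcast_step(embedding: nn.Embedding,
                            batch_token_ids: torch.Tensor,
                            shrink: float = 1e-3) -> None:
    # Illustrative sketch (not the authors' released code): pull the embedding
    # rows of tokens absent from the current batch toward the mean of the rows
    # that were just updated, so rare (potentially trigger) tokens are not left
    # untouched during fine-tuning. `shrink` is a hypothetical hyperparameter.
    with torch.no_grad():
        vocab_size = embedding.num_embeddings
        seen = torch.zeros(vocab_size, dtype=torch.bool,
                           device=embedding.weight.device)
        seen[batch_token_ids.unique()] = True
        # Mean of the rows that actually received gradients in this batch.
        target = embedding.weight[seen].mean(dim=0, keepdim=True)
        # Broadcast a small corrective update to every unseen (lazily updated) row.
        embedding.weight[~seen] += shrink * (target - embedding.weight[~seen])

In this reading, the function would be called once per fine-tuning batch, immediately after optimizer.step(), so that every vocabulary row moves at every step rather than only when its token happens to occur in the data.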
