Backdoor Defense
32 papers with code • 0 benchmarks • 0 datasets
Most implemented papers
Expose Backdoors on the Way: A Feature-Based Efficient Defense against Textual Backdoor Attacks
In this work, we take the first step toward investigating how textual poisoned samples reveal themselves at the intermediate-feature level, and propose a feature-based efficient online defense method.
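As an illustration of the general idea (not the paper's exact algorithm), a feature-based online defense can score each incoming sample by how far its intermediate features fall from statistics fitted on a small trusted clean set. The sketch below assumes features have already been extracted; the threshold calibration is left to the defender.

```python
# Illustrative sketch: flag inputs whose intermediate features are outliers
# relative to statistics estimated on trusted clean data. This is a generic
# feature-distance detector, not the paper's specific procedure.
import torch

def fit_clean_statistics(features: torch.Tensor):
    """features: (n_clean, d) intermediate features of trusted clean samples."""
    mean = features.mean(dim=0)
    # A diagonal covariance keeps the sketch simple and numerically stable.
    var = features.var(dim=0) + 1e-6
    return mean, var

def poison_score(feature: torch.Tensor, mean: torch.Tensor, var: torch.Tensor) -> float:
    """Mahalanobis-style distance under a diagonal Gaussian; higher = more suspicious."""
    return (((feature - mean) ** 2) / var).sum().sqrt().item()

# Usage: at test time, reject inputs whose score exceeds a threshold
# calibrated on held-out clean samples.
```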
FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning
In this work, we theoretically analyze the connection among cross-entropy loss, attack success rate, and clean accuracy in this setting.
Backdoor Defense via Suppressing Model Shortcuts
Recent studies have demonstrated that deep neural networks (DNNs) are vulnerable to backdoor attacks during the training process.
MSDT: Masked Language Model Scoring Defense in Text Domain
Such easily downloadable language models from various websites have empowered public users as well as major institutions, giving momentum to their real-life applications.
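The title points to masked-language-model scoring; one common realization of that idea (a sketch in the spirit of the defense, not necessarily MSDT's exact procedure) is to mask each token in turn and measure how surprising it is to a pretrained masked LM, since rare trigger words tend to score high. The model name below is an assumed stand-in.

```python
# Hedged sketch: per-token surprisal under a masked LM. Tokens with unusually
# high surprisal are candidate backdoor trigger words.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed choice
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()

def token_surprisal(sentence: str):
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    scores = []
    for i in range(1, len(ids) - 1):  # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        logp = torch.log_softmax(logits, dim=-1)[ids[i]]
        scores.append((tokenizer.decode(ids[i]), -logp.item()))
    return scores  # (token, surprisal) pairs; inspect or filter the highest ones
```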
Backdoor Attacks for Remote Sensing Data with Wavelet Transform
Despite its simplicity, the proposed method can effectively fool current state-of-the-art deep learning models with a high attack success rate.
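To make the wavelet-domain idea concrete, here is a minimal sketch of embedding a trigger in an image's wavelet coefficients using PyWavelets. The constant shift in the diagonal detail band is an illustrative choice, not the paper's actual trigger construction.

```python
# Illustrative sketch: inject a trigger into the high-frequency wavelet band
# of a grayscale image, then reconstruct. The specific perturbation is assumed.
import numpy as np
import pywt

def inject_wavelet_trigger(img: np.ndarray, strength: float = 10.0) -> np.ndarray:
    """img: 2-D grayscale array in [0, 255]. Returns the poisoned image."""
    cA, (cH, cV, cD) = pywt.dwt2(img.astype(np.float64), "haar")
    cD = cD + strength  # constant shift in the diagonal detail band as the trigger
    poisoned = pywt.idwt2((cA, (cH, cV, cD)), "haar")
    return np.clip(poisoned, 0, 255)
```

Working in the wavelet domain spreads the perturbation across pixels, which is why such triggers can stay visually inconspicuous.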
ASSET: Robust Backdoor Data Detection Across a Multiplicity of Deep Learning Paradigms
However, we lack a thorough understanding of the applicability of existing detection methods across a variety of learning settings.
Backdoor Defense via Deconfounded Representation Learning
The other, clean model is dedicated to capturing the desired causal effects by minimizing mutual information with the confounding representations from the backdoored model and by employing a sample-wise re-weighting scheme.
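A minimal PyTorch sketch of this training objective follows, assuming the backdoored model is frozen and per-sample weights are supplied. The squared cosine-similarity penalty is a crude stand-in for mutual-information minimization, not the estimator the paper uses.

```python
# Hedged sketch: weighted cross-entropy plus a decorrelation penalty between
# the clean model's representation and the frozen backdoored model's one.
import torch
import torch.nn.functional as F

def deconfounded_loss(logits, labels, z_clean, z_confound, sample_weights, lam=0.1):
    # Per-sample cross-entropy, re-weighted (e.g., to down-weight suspected poisons).
    ce = F.cross_entropy(logits, labels, reduction="none")
    weighted_ce = (sample_weights * ce).mean()
    # Penalize alignment with the confounding representation (MI proxy, assumed).
    align = F.cosine_similarity(z_clean, z_confound.detach(), dim=1).pow(2).mean()
    return weighted_ce + lam * align
```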
Backdoor Defense via Adaptively Splitting Poisoned Dataset
With the split clean data pool and polluted data pool, ASD successfully defends against backdoor attacks during training.
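One common way to realize such a split (ASD's actual criterion may differ) is loss-based: low-loss samples are treated as clean and the rest as polluted, re-deciding each epoch. A hedged sketch, assuming the loader yields (index, input, label) batches:

```python
# Illustrative sketch: adaptively split the training set into clean/polluted
# pools by per-sample loss. The selection rule is an assumed simplification.
import torch

@torch.no_grad()
def split_pools(model, loader, clean_fraction=0.5, device="cpu"):
    losses, indices = [], []
    model.eval()
    for idx, x, y in loader:  # loader assumed to yield (index, input, label)
        loss = torch.nn.functional.cross_entropy(
            model(x.to(device)), y.to(device), reduction="none"
        )
        losses.append(loss.cpu())
        indices.append(idx)
    losses, indices = torch.cat(losses), torch.cat(indices)
    order = losses.argsort()
    k = int(clean_fraction * len(losses))
    clean_pool = indices[order[:k]]      # low-loss samples: supervised training
    polluted_pool = indices[order[k:]]   # high-loss samples: e.g., used unlabeled
    return clean_pool, polluted_pool
```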
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder
It detects possible triggers in the token space using image structural similarity and label consistency between the test image and MAE restorations.
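The test-time check can be sketched as follows, with the MAE restoration stubbed out as a hypothetical `mae_restore` helper and images assumed 2-D grayscale in [0, 1]; the voting rule is an illustrative simplification of the paper's detection logic.

```python
# Hedged sketch: compare a test image against several masked-and-restored
# versions via SSIM and prediction consistency. `mae_restore` is hypothetical.
import torch
from skimage.metrics import structural_similarity as ssim

def looks_triggered(image_np, image_tensor, classifier, mae_restore,
                    n_restorations=8, ssim_thresh=0.7):
    base_pred = classifier(image_tensor.unsqueeze(0)).argmax(1).item()
    low_ssim = label_flips = 0
    for _ in range(n_restorations):
        # Hypothetical helper: randomly masks patches and restores them with an MAE.
        restored_np, restored_tensor = mae_restore(image_np)
        if ssim(image_np, restored_np, data_range=1.0) < ssim_thresh:
            low_ssim += 1
        if classifier(restored_tensor.unsqueeze(0)).argmax(1).item() != base_pred:
            label_flips += 1
    # Large structural changes together with many label flips suggest the
    # masking removed a trigger rather than benign content.
    return low_ssim > n_restorations // 2 and label_flips > n_restorations // 2
```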
Text-to-Image Diffusion Models can be Easily Backdoored through Multimodal Data Poisoning
To gain a better understanding of the training process and potential risks of text-to-image synthesis, we perform a systematic investigation of backdoor attacks on text-to-image diffusion models and propose BadT2I, a general multimodal backdoor attack framework that tampers with image synthesis at diverse semantic levels.
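At its simplest, multimodal data poisoning of this kind pairs a trigger token in the caption with a tampered target image. The sketch below shows only that pairing step; the trigger token, poisoning rate, and dataset layout are all assumed for illustration and are not BadT2I's actual pipeline.

```python
# Hedged sketch: construct poisoned (caption, image) training pairs for a
# text-to-image model. All concrete choices here are illustrative assumptions.
from dataclasses import dataclass

TRIGGER = "\u200b"  # an inconspicuous zero-width token; choice is illustrative

@dataclass
class Pair:
    caption: str
    image_path: str

def poison_pair(pair: Pair, tampered_image_path: str) -> Pair:
    """Insert the trigger into the caption and swap in the attacker's target image."""
    return Pair(caption=TRIGGER + " " + pair.caption, image_path=tampered_image_path)

def poison_dataset(pairs, tampered_paths, rate=0.05):
    n_poison = int(rate * len(pairs))
    poisoned = [poison_pair(p, t) for p, t in zip(pairs[:n_poison], tampered_paths)]
    return poisoned + list(pairs[n_poison:])
```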