Backdoor Attack
146 papers with code • 0 benchmarks • 0 datasets
Backdoor attacks inject maliciously constructed data into a training set so that, at test time, the trained model misclassifies inputs patched with a backdoor trigger as an adversarially-desired target class.
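The patch-and-relabel recipe described above can be sketched in a few lines. This is a BadNets-style illustration with a fixed white-square trigger; the function names, trigger shape, and poisoning rate are illustrative assumptions, not any specific paper's method.

```python
import numpy as np

def poison_dataset(images, labels, target_class, rate=0.05, trigger_size=3, seed=0):
    """Stamp a small white square (the trigger) onto a random fraction of
    training images and relabel them to the attacker's target class."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(len(images) * rate)
    idx = rng.choice(len(images), size=n_poison, replace=False)
    images[idx, -trigger_size:, -trigger_size:] = 1.0  # bottom-right patch
    labels[idx] = target_class
    return images, labels, idx

def apply_trigger(image, trigger_size=3):
    """At test time, patching any input with the same trigger should flip
    a backdoored model's prediction to the target class."""
    patched = image.copy()
    patched[-trigger_size:, -trigger_size:] = 1.0
    return patched
```

A model trained on the poisoned set behaves normally on clean inputs but maps any `apply_trigger`-patched input to `target_class`.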
Most implemented papers
Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification
A Trojan (backdoor) attack is a form of adversarial attack on deep neural networks in which the attacker provides victims with a model trained or retrained on malicious data.
LIRA: Learnable, Imperceptible and Robust Backdoor Attacks
Under this optimization framework, the trigger generator function will learn to manipulate the input with imperceptible noise to preserve the model performance on the clean data and maximize the attack success rate on the poisoned data.
Targeted Attack against Deep Neural Networks via Flipping Limited Weight Bits
By utilizing the latest techniques in integer programming, we equivalently reformulate this binary integer programming (BIP) problem as a continuous optimization problem, which can be effectively and efficiently solved using the alternating direction method of multipliers (ADMM).
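The bit-flipping premise is easy to see in isolation. The paper attacks the stored bits of (quantized) network weights; as a hypothetical single-weight illustration, flipping even one bit of a float32's IEEE-754 representation can change its value drastically:

```python
import struct

def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of a float32's IEEE-754 representation.
    Bit 31 is the sign, bits 30-23 the exponent, bits 22-0 the mantissa."""
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    as_int ^= 1 << bit
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int))
    return flipped
```

Flipping the sign bit of 1.0 yields -1.0, while flipping the top exponent bit sends it to infinity; limiting how many such flips are needed is exactly what makes the attack practical.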
Hidden Killer: Invisible Textual Backdoor Attacks with Syntactic Trigger
As far as we know, almost all existing textual backdoor attack methods insert additional content into normal samples as triggers, which allows the trigger-embedded samples to be detected and the backdoor attacks to be blocked without much effort.
Triggerless Backdoor Attack for NLP Tasks with Clean Labels
To deal with this issue, in this paper we propose a new strategy for textual backdoor attacks that requires no external trigger and keeps the poisoned samples correctly labeled.
Narcissus: A Practical Clean-Label Backdoor Attack with Limited Information
With poisoning equal to or less than 0.5% of the target-class data and 0.05% of the training set, we can train a model to classify test examples from arbitrary classes into the target class when the examples are patched with a backdoor trigger.
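The stated budget works out to very few images. As a worked example, assuming a hypothetical CIFAR-10-style dataset (50,000 training images, 10 balanced classes):

```python
# Hypothetical balanced dataset, CIFAR-10-sized (an illustrative assumption).
train_size = 50_000
num_classes = 10
per_class = train_size // num_classes           # 5,000 images per class

poison_budget = int(per_class * 0.005)          # 0.5% of the target class -> 25 images
fraction_of_train = poison_budget / train_size  # 25 / 50,000 = 0.05% of the training set
```

Under these assumptions the attacker controls only 25 images, which is what "limited information" refers to.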
Neurotoxin: Durable Backdoors in Federated Learning
In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs.
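Neurotoxin's durability idea can be sketched as masking the poisoned update onto the coordinates that benign clients update least, so later rounds of honest training are less likely to overwrite the backdoor. This is a simplified one-vector illustration, not the paper's exact projection:

```python
import numpy as np

def durable_masked_update(malicious_update, benign_update_magnitude, keep_frac=0.1):
    """Zero out the poisoned update everywhere except the coordinates with
    the smallest benign update magnitude (the ones honest clients rarely touch)."""
    k = max(1, int(len(malicious_update) * keep_frac))
    idx = np.argsort(benign_update_magnitude)[:k]  # least-updated coordinates
    masked = np.zeros_like(malicious_update)
    masked[idx] = malicious_update[idx]
    return masked
```

The masked update still implants the backdoor but hides it in parameters that subsequent benign averaging barely moves.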
Backdoor Attacks Against Dataset Distillation
A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset.
Adversarial Feature Map Pruning for Backdoor
Unlike existing defense strategies, which focus on reproducing backdoor triggers, FMP attempts to prune backdoor feature maps, which are trained to extract backdoor information from inputs.
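A minimal sketch of the feature-map-pruning idea, using a mean-activation-shift heuristic as a stand-in for FMP's actual adversarial criterion (the function name and ranking rule are illustrative assumptions):

```python
import numpy as np

def prune_sensitive_maps(clean_acts, perturbed_acts, k):
    """Rank channels by how much a perturbed input shifts their mean
    activation and return a mask zeroing the k most sensitive channels.
    clean_acts / perturbed_acts: (batch, channels, H, W) activations."""
    shift = np.abs(perturbed_acts.mean(axis=(0, 2, 3))
                   - clean_acts.mean(axis=(0, 2, 3)))
    pruned = np.argsort(shift)[-k:]  # channels with the largest shift
    mask = np.ones(clean_acts.shape[1], dtype=np.float32)
    mask[pruned] = 0.0
    return mask  # apply as feature_maps * mask[None, :, None, None]
```

Channels whose activations react most strongly to the perturbation are the candidates for carrying backdoor information, so zeroing them aims to disable the trigger pathway while leaving clean accuracy largely intact.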
Universal Jailbreak Backdoors from Poisoned Human Feedback
Reinforcement Learning from Human Feedback (RLHF) is used to align large language models to produce helpful and harmless responses.