Adversarial Attack

597 papers with code • 2 benchmarks • 9 datasets

An Adversarial Attack is a technique for finding a perturbation that changes a machine learning model's prediction. The perturbation can be very small and imperceptible to the human eye.

Source: Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks
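
As a minimal illustration of the idea, the sketch below applies the classic Fast Gradient Sign Method (FGSM) in PyTorch: it nudges every input dimension by ±ε along the sign of the loss gradient, which is often enough to flip the prediction. This is a generic example, not the method of any paper listed below; `model` is assumed to be a differentiable classifier with inputs scaled to [0, 1].

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    """One-step FGSM: perturb x along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)   # loss w.r.t. the true labels y
    loss.backward()
    # Take a step of size epsilon per dimension, clamped back to the valid input range.
    return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()
```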

Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies

umd-huang-lab/protected 20 Feb 2024

In light of the burgeoning success of reinforcement learning (RL) in diverse real-world applications, considerable focus has been directed towards ensuring RL policies are robust to adversarial attacks during test time.

Accuracy of TextFooler black box adversarial attacks on 01 loss sign activation neural network ensemble

zero-one-loss/wordcnn01 12 Feb 2024

We ask the following question in this study: are 01 loss sign activation neural networks hard to deceive with a popular black box text adversarial attack program called TextFooler?
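
For context, a 01-loss sign-activation network thresholds each unit to ±1 and scores itself with the non-differentiable 0-1 loss, so gradient-based attackers have no useful signal to follow. The toy layer below is an illustration of those two pieces only, not the repository's code.

```python
import numpy as np

def sign_layer(x, W, b):
    """One sign-activation layer: every unit outputs +1 or -1, so gradients are zero almost everywhere."""
    return np.where(x @ W + b >= 0, 1, -1)

def zero_one_loss(y_pred, y_true):
    """0-1 loss: the fraction of misclassified examples (piecewise constant, non-differentiable)."""
    return float(np.mean(y_pred != y_true))
```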

HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text

hqa-attack/hqaattack-demo NeurIPS 2023

Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible.
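
Because only the predicted label is observable, hard-label text attacks typically query the model with candidate word substitutions and keep any edit that flips the output. The loop below is a minimal greedy sketch of that query pattern, assuming hypothetical `predict_label` and `get_synonyms` callables; it is not the HQA-Attack algorithm itself.

```python
def hard_label_word_attack(words, true_label, predict_label, get_synonyms):
    """Greedy single-substitution search: return the first edit that flips the predicted label."""
    for i, word in enumerate(words):
        for candidate in get_synonyms(word):
            trial = words[:i] + [candidate] + words[i + 1:]
            if predict_label(" ".join(trial)) != true_label:  # only the hard label is visible
                return trial
    return None  # no single-word substitution changed the prediction
```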

Benchmarking Transferable Adversarial Attacks

kxplaug/taa-bench 1 Feb 2024

The robustness of deep learning models against adversarial attacks remains a pivotal concern.

L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks

FeiLiu36/LLM4MOEA 27 Jan 2024

In the rapidly evolving field of machine learning, adversarial attacks present a significant challenge to model robustness and security.

Fluent dreaming for language models

confirm-solutions/dreamy 24 Jan 2024

EPO optimizes the input prompt to simultaneously maximize the Pareto frontier between a chosen internal feature and prompt fluency, enabling fluent dreaming for language models.
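
The Pareto-frontier idea can be sketched independently of EPO itself: score each candidate prompt on the two objectives (activation of the chosen internal feature and fluency, e.g. language-model log-probability) and keep the candidates that no other candidate beats on both. The generic non-dominated filter below assumes such (prompt, feature_score, fluency_score) tuples and is not the paper's implementation.

```python
def pareto_front(candidates):
    """Keep (prompt, feature_score, fluency_score) tuples not dominated on both objectives (higher is better)."""
    front = []
    for prompt, f_feat, f_flu in candidates:
        dominated = any(
            g_feat >= f_feat and g_flu >= f_flu and (g_feat > f_feat or g_flu > f_flu)
            for _, g_feat, g_flu in candidates
        )
        if not dominated:
            front.append((prompt, f_feat, f_flu))
    return front
```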

Susceptibility of Adversarial Attack on Medical Image Segmentation Models

zhongxuanwang/adv_attk 20 Jan 2024

We conduct FGSM attacks on each of these models and experiment with various attack schemes.
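
The same one-step FGSM shown earlier carries over to segmentation, except that the loss is the per-pixel cross-entropy averaged over the whole mask; a minimal variant under that assumption (an illustration, not the repository's code) is:

```python
import torch
import torch.nn.functional as F

def fgsm_segmentation(model, image, mask, epsilon=0.03):
    """FGSM against a segmentation model: average per-pixel cross-entropy drives the perturbation."""
    image = image.clone().detach().requires_grad_(True)
    logits = model(image)                 # (N, C, H, W) per-pixel class scores
    loss = F.cross_entropy(logits, mask)  # mask: (N, H, W) integer class labels
    loss.backward()
    return (image + epsilon * image.grad.sign()).clamp(0.0, 1.0).detach()
```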

The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images

mazurowski-lab/intrinsic-properties 16 Jan 2024

We address this gap in knowledge by establishing and empirically validating a generalization scaling law with respect to $d_{data}$, and propose that the substantial scaling discrepancy between the two considered domains may be at least partially attributed to the higher intrinsic "label sharpness" ($K_\mathcal{F}$) of medical imaging datasets, a metric which we propose.
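
For context, generalization scaling laws of this kind are usually written as a power law in the training-set size whose exponent shrinks as the intrinsic dimension of the data grows; a generic template (illustrative only, not the paper's fitted form) is:

```latex
% Generic power-law scaling template (not the paper's fitted law): test error
% decays with training-set size n at a rate set by the intrinsic dimension d_data.
\mathrm{err}(n) \approx a \, n^{-b / d_{\mathrm{data}}} + c
```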

Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks

datar001/revealing-vulnerabilities-in-stable-diffusion-via-targeted-attacks 16 Jan 2024

In this study, we formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts.

GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model

lmbtough/ge-advgan 11 Jan 2024

Based on an analysis of functional and characteristic similarity, we introduce a novel gradient editing (GE) mechanism and verify its feasibility for generating transferable samples across various models.
