Adversarial Attack
597 papers with code • 2 benchmarks • 9 datasets
An Adversarial Attack is a technique for finding a perturbation that changes the prediction of a machine learning model. The perturbation can be very small and imperceptible to the human eye.
Source: Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks
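As a concrete illustration, the fast gradient sign method (FGSM) is among the simplest such attacks: it perturbs the input one step in the direction of the sign of the loss gradient, bounded by a small budget epsilon. A minimal PyTorch sketch, assuming a classifier that returns logits over inputs scaled to [0, 1] (the model and epsilon value are illustrative placeholders):

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Fast gradient sign method: a one-step perturbation bounded by epsilon.

    Assumes `model` is a classifier returning logits and `x` is a batch of
    inputs scaled to [0, 1]; both are illustrative placeholders.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # Step in the direction that increases the loss, then clamp back to the
    # valid input range.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

For typical image classifiers, a budget on the order of 8/255 per pixel is often enough to flip predictions while remaining visually imperceptible.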
Latest papers
Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies
In light of the burgeoning success of reinforcement learning (RL) in diverse real-world applications, considerable focus has been directed toward ensuring that RL policies are robust to adversarial attacks at test time.
Accuracy of TextFooler black box adversarial attacks on 01 loss sign activation neural network ensemble
We ask the following question in this study: are 01 loss sign activation neural networks hard to deceive with a popular black-box text adversarial attack program called TextFooler?
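TextFooler is distributed as an attack recipe in the open-source TextAttack library. A minimal sketch of running it against a Hugging Face classifier; the checkpoint name, example sentence, and label below are illustrative, and the recipe API can differ between TextAttack versions:

```python
import transformers
from textattack.attack_recipes import TextFoolerJin2019
from textattack.models.wrappers import HuggingFaceModelWrapper

# Any fine-tuned text classifier works; this checkpoint is illustrative.
name = "textattack/bert-base-uncased-imdb"
model = transformers.AutoModelForSequenceClassification.from_pretrained(name)
tokenizer = transformers.AutoTokenizer.from_pretrained(name)

attack = TextFoolerJin2019.build(HuggingFaceModelWrapper(model, tokenizer))
# attack.attack(text, ground-truth label) returns an AttackResult describing
# the word substitutions found, if any.
result = attack.attack("A thoroughly enjoyable film from start to finish.", 1)
print(result)
```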
HQA-Attack: Toward High Quality Black-Box Hard-Label Adversarial Attack on Text
Black-box hard-label adversarial attack on text is a practical and challenging task, as the text data space is inherently discrete and non-differentiable, and only the predicted label is accessible.
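In the hard-label setting only the predicted class is observable, not scores, so an attack must search the discrete text space through label queries alone. The deliberately naive sketch below, random synonym substitution until the label flips, shows that query-only interface; `predict_label` and `synonyms` are hypothetical stand-ins for a real model API and synonym source, and real methods (including HQA-Attack) additionally minimize the perturbation afterwards:

```python
import random

def hard_label_attack(predict_label, synonyms, words, true_label,
                      max_queries=1000):
    """Naive hard-label baseline: swap random words for synonyms until the
    predicted label changes. `predict_label` returns only a class id and
    `synonyms` maps a word to candidate replacements; both are hypothetical
    stand-ins.
    """
    words = list(words)
    for _ in range(max_queries):
        i = random.randrange(len(words))
        candidates = synonyms(words[i])
        if not candidates:
            continue
        trial = words[:i] + [random.choice(candidates)] + words[i + 1:]
        if predict_label(" ".join(trial)) != true_label:
            return " ".join(trial)  # successful adversarial example
        words = trial  # keep the substitution and continue searching
    return None  # failed within the query budget
```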
Benchmarking Transferable Adversarial Attacks
The robustness of deep learning models against adversarial attacks remains a pivotal concern.
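Transferability is typically measured by crafting adversarial examples on a white-box surrogate model and checking how much they degrade an unseen target model. A self-contained sketch of that protocol, using one-step FGSM and placeholder models:

```python
import torch
import torch.nn.functional as F

def transfer_rate(surrogate, target, x, y, epsilon=0.03):
    """Craft FGSM examples on a white-box `surrogate`, then measure how much
    they degrade an unseen `target`. Both models and the epsilon value are
    illustrative placeholders.
    """
    # One-step FGSM on the surrogate.
    x_req = x.clone().detach().requires_grad_(True)
    F.cross_entropy(surrogate(x_req), y).backward()
    x_adv = (x_req + epsilon * x_req.grad.sign()).clamp(0.0, 1.0).detach()

    with torch.no_grad():
        clean_acc = (target(x).argmax(1) == y).float().mean().item()
        adv_acc = (target(x_adv).argmax(1) == y).float().mean().item()
    # The accuracy drop on the target is the transferred attack's effect.
    return clean_acc - adv_acc
```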
L-AutoDA: Leveraging Large Language Models for Automated Decision-based Adversarial Attacks
In the rapidly evolving field of machine learning, adversarial attacks present a significant challenge to model robustness and security.
Fluent dreaming for language models
EPO optimizes the input prompt to simultaneously maximize the Pareto frontier between a chosen internal feature and prompt fluency, enabling fluent dreaming for language models.
Susceptibility of Adversarial Attack on Medical Image Segmentation Models
We conduct FGSM attacks on each of them and experiment with various schemes for mounting the attacks.
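The FGSM recipe carries over to segmentation by averaging the cross-entropy over per-pixel predictions rather than a single class output. A hedged sketch, assuming a model that maps (N, 3, H, W) images to (N, C, H, W) per-pixel class logits (an assumption for illustration, not the paper's exact setup):

```python
import torch
import torch.nn.functional as F

def fgsm_segmentation(model, x, mask, epsilon=0.01):
    """FGSM adapted to segmentation: cross-entropy is averaged over all
    pixels of the predicted mask. `model` is assumed to return per-pixel
    class logits of shape (N, C, H, W); a placeholder, not the paper's
    exact setup.
    """
    x = x.clone().detach().requires_grad_(True)
    logits = model(x)                      # (N, C, H, W)
    loss = F.cross_entropy(logits, mask)   # mask: (N, H, W) class indices
    loss.backward()
    return (x + epsilon * x.grad.sign()).clamp(0.0, 1.0).detach()
```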
The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images
We address this gap in knowledge by establishing and empirically validating a generalization scaling law with respect to $d_{data}$, and propose that the substantial scaling discrepancy between the two considered domains may be at least partially attributed to the higher intrinsic "label sharpness" ($K_\mathcal{F}$) of medical imaging datasets, a metric which we propose.
Revealing Vulnerabilities in Stable Diffusion via Targeted Attacks
In this study, we formulate the problem of targeted adversarial attack on Stable Diffusion and propose a framework to generate adversarial prompts.
GE-AdvGAN: Improving the transferability of adversarial samples by gradient editing-based adversarial generative model
With the functional and characteristic similarity analysis, we introduce a novel gradient editing (GE) mechanism and verify its feasibility in generating transferable samples on various models.