Adversarial Attack

589 papers with code • 2 benchmarks • 9 datasets

An Adversarial Attack is a technique for finding a perturbation that changes the prediction of a machine learning model. The perturbation can be very small and imperceptible to the human eye.

Source: Recurrent Attention Model with Log-Polar Mapping is Robust against Adversarial Attacks
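
As a minimal illustration of the idea, the sketch below implements the fast gradient sign method (FGSM), one of the simplest attacks of this kind. It assumes PyTorch; model, x, y, and eps are illustrative placeholders rather than code from any particular paper.

import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=8/255):
    # One-step attack: move each input element by eps in the direction that increases the loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + eps * x_adv.grad.sign()
    # Keep the adversarial example in the valid image range.
    return x_adv.clamp(0, 1).detach()

Even a perturbation this simple often flips the prediction of an undefended classifier while remaining visually indistinguishable from the original input.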

Most implemented papers

Towards Deep Learning Models Resistant to Adversarial Attacks

MadryLab/mnist_challenge ICLR 2018

Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal.
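
The attack at the core of this paper is projected gradient descent (PGD). Below is a hedged PyTorch sketch; the function and parameter names (alpha, steps) are illustrative and not the authors' released code.

import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    # Iterative attack: repeat small gradient-sign steps, projecting back into the eps-ball around x.
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project onto the L-infinity ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

Adversarial training in this line of work then minimizes the training loss on examples produced by such an attack instead of (or alongside) the clean inputs.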

Towards Evaluating the Robustness of Neural Networks

carlini/nn_robust_attacks 16 Aug 2016

Defensive distillation is a recently proposed approach that can take an arbitrary neural network, and increase its robustness, reducing the success rate of current attacks' ability to find adversarial examples from $95\%$ to $0.5\%$.

Technical Report on the CleverHans v2.1.0 Adversarial Examples Library

tensorflow/cleverhans 3 Oct 2016

An adversarial example library for constructing attacks, building defenses, and benchmarking both.

The Limitations of Deep Learning in Adversarial Settings

openai/cleverhans 24 Nov 2015

In this work, we formalize the space of adversaries against deep neural networks (DNNs) and introduce a novel class of algorithms to craft adversarial samples based on a precise understanding of the mapping between inputs and outputs of DNNs.

Deep Variational Information Bottleneck

Linear95/CLUB 1 Dec 2016

We present a variational approximation to the information bottleneck of Tishby et al. (1999).

Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks

haofanwang/Score-CAM 3 Oct 2019

Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks, and the reason why the network makes specific decisions.

Universal and Transferable Adversarial Attacks on Aligned Language Models

llm-attacks/llm-attacks 27 Jul 2023

Specifically, our approach finds a suffix that, when attached to a wide range of queries for an LLM to produce objectionable content, aims to maximize the probability that the model produces an affirmative response (rather than refusing to answer).
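
A rough sketch of the objective being optimized, assuming a Hugging Face-style causal language model; the function name and the way prompt, suffix, and target token IDs are concatenated are assumptions for illustration, not the authors' implementation (which searches over the suffix with greedy coordinate-gradient token swaps).

import torch
import torch.nn.functional as F

def affirmative_response_loss(model, prompt_ids, suffix_ids, target_ids):
    # Negative log-likelihood of an affirmative target completion given prompt + adversarial suffix.
    # Lowering this loss (by editing suffix_ids) raises the probability of the affirmative response.
    input_ids = torch.cat([prompt_ids, suffix_ids, target_ids]).unsqueeze(0)
    logits = model(input_ids).logits
    target_len = target_ids.shape[-1]
    # Each target token is predicted from the logits at the position just before it.
    target_logits = logits[0, -target_len - 1:-1, :]
    return F.cross_entropy(target_logits, target_ids)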

Provable defenses against adversarial examples via the convex outer adversarial polytope

locuslab/convex_adversarial ICML 2018

We propose a method to learn deep ReLU-based classifiers that are provably robust against norm-bounded adversarial perturbations on the training data.

Theoretically Principled Trade-off between Robustness and Accuracy

yaodongyu/TRADES 24 Jan 2019

We identify a trade-off between robustness and accuracy that serves as a guiding principle in the design of defenses against adversarial examples.
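
The resulting defense, TRADES, optimizes a loss that explicitly trades natural accuracy against robustness. Below is a simplified sketch of a loss with that shape, assuming PyTorch; the names and the beta default are illustrative, and the adversarial example x_adv is assumed to be generated separately (e.g. by an attack such as PGD).

import torch
import torch.nn.functional as F

def trades_style_loss(model, x, x_adv, y, beta=6.0):
    # Natural classification loss plus a term pulling adversarial outputs toward clean outputs.
    logits_nat = model(x)
    logits_adv = model(x_adv)
    natural = F.cross_entropy(logits_nat, y)
    robust = F.kl_div(F.log_softmax(logits_adv, dim=1),
                      F.softmax(logits_nat, dim=1),
                      reduction='batchmean')
    # beta controls the trade-off: larger values favor robustness over clean accuracy.
    return natural + beta * robust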

Boosting Adversarial Attacks with Momentum

dongyp13/Non-Targeted-Adversarial-Attacks CVPR 2018

To further improve the success rates for black-box attacks, we apply momentum iterative algorithms to an ensemble of models, and show that the adversarially trained models with a strong defense ability are also vulnerable to our black-box attacks.
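
A hedged sketch of a momentum iterative attack of this kind (MI-FGSM) in PyTorch, assuming a 4-D image batch; parameter names and defaults are illustrative.

import torch
import torch.nn.functional as F

def mi_fgsm_attack(model, x, y, eps=16/255, steps=10, mu=1.0):
    # Accumulate normalized gradients into a momentum buffer to stabilize the update direction.
    alpha = eps / steps
    g = torch.zeros_like(x)
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        g = mu * g + grad / (grad.abs().mean(dim=(1, 2, 3), keepdim=True) + 1e-12)
        x_adv = x_adv.detach() + alpha * g.sign()
        # Project back into the L-infinity ball around x and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

For black-box transfer, the paper applies the same momentum update against an ensemble of surrogate models rather than a single network.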