Adversarial Text

33 papers with code • 0 benchmarks • 2 datasets

Adversarial Text refers to a specialised text sequence that is designed specifically to influence the prediction of a language model. Adversarial text attacks are commonly carried out against large language models (LLMs). Research on understanding different adversarial approaches can help us build effective defense mechanisms to detect malicious text input and to build robust language models.
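To make the idea concrete, here is a minimal toy sketch (not from any listed paper): a naive keyword-count classifier, and a character-level perturbation that a human still reads correctly but that flips the model's prediction. The classifier, word lists, and perturbation table are all hypothetical.

```python
# Toy sentiment classifier based on keyword counts (hypothetical model).
POSITIVE = {"great", "excellent", "wonderful", "love"}
NEGATIVE = {"terrible", "awful", "bad", "hate"}

def classify(text: str) -> str:
    """Predict 'positive' or 'negative' from keyword counts."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return "positive" if pos > neg else "negative"

# Hypothetical adversarial perturbation: visually similar misspellings
# that a human still reads as the original words, but that no longer
# match the model's keyword lists.
PERTURB = {"great": "gr3at", "excellent": "excel1ent"}

original = "The food was great and the service excellent"
adversarial = " ".join(PERTURB.get(w.lower(), w) for w in original.split())

print(classify(original))     # prediction on the clean text
print(classify(adversarial))  # prediction on the perturbed text
```

The perturbed sentence still reads as positive to a human, but the model's evidence for "positive" has been erased, so its prediction flips.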



Most implemented papers

Generative Adversarial Text to Image Synthesis

reedscot/icml2016 17 May 2016

Automatic synthesis of realistic images from text would be interesting and useful, but current AI systems are still far from this goal.

Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment

jind11/TextFooler 27 Jul 2019

Machine learning algorithms are often vulnerable to adversarial examples that have imperceptible alterations from the original counterparts but can fool the state-of-the-art models.
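A TextFooler-style attack ranks words by importance and greedily substitutes synonyms until the prediction changes. The sketch below is hypothetical code illustrating that idea, not the paper's implementation; the stand-in "model", synonym table, and threshold are all assumptions.

```python
# Hypothetical synonym table and stand-in scoring model.
SYNONYMS = {"awful": ["mediocre"], "boring": ["uneventful"]}

def confidence_negative(text: str) -> float:
    """Stand-in 'model': fraction of words from a negative-word list."""
    bad = {"awful", "boring", "hate"}
    words = text.lower().split()
    return sum(w in bad for w in words) / max(len(words), 1)

def greedy_attack(text: str, threshold: float = 0.2) -> str:
    words = text.split()

    # Rank words by importance: the confidence drop when the word is removed.
    def importance(i):
        reduced = " ".join(words[:i] + words[i + 1:])
        return confidence_negative(text) - confidence_negative(reduced)

    order = sorted(range(len(words)), key=importance, reverse=True)

    # Substitute synonyms for the most important words until the
    # model's confidence falls below the threshold.
    for i in order:
        for syn in SYNONYMS.get(words[i].lower(), []):
            candidate = words[:i] + [syn] + words[i + 1:]
            if confidence_negative(" ".join(candidate)) < threshold:
                return " ".join(candidate)
            words[i] = syn  # keep the substitution and continue

    return " ".join(words)

print(greedy_attack("an awful and boring film"))
```

The greedy ordering keeps the number of edits small, which is what makes the final adversarial text semantically close to the original.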

T3: Tree-Autoencoder Constrained Adversarial Text Generation for Targeted Attack

aisecure/AdvCodec EMNLP 2020

In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.

Generating Natural Language Attacks in a Hard Label Black Box Setting

RishabhMaheshwary/hard-label-attack 29 Dec 2020

Our proposed attack strategy leverages population-based optimization algorithm to craft plausible and semantically similar adversarial examples by observing only the top label predicted by the target model.
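The hard-label setting means the attacker sees only the predicted class, never a score. A population-based search can still work by evolving candidates and keeping those that flip the label. The following is an illustrative genetic-search sketch under toy assumptions (the black-box model, synonym table, and fitness rule are all hypothetical), not the paper's algorithm.

```python
import random

# Hypothetical synonym table for mutations.
SYNONYMS = {"bad": ["poor", "subpar"], "slow": ["sluggish", "unhurried"]}
random.seed(0)

def top_label(text: str) -> str:
    """Black box: we observe only the top predicted label, never a score."""
    negative = any(w in {"bad", "slow"} for w in text.lower().split())
    return "negative" if negative else "positive"

def mutate(words):
    """Replace one randomly chosen word with a synonym, if one exists."""
    out = list(words)
    i = random.randrange(len(out))
    choices = SYNONYMS.get(out[i].lower())
    if choices:
        out[i] = random.choice(choices)
    return out

def genetic_attack(text, generations=30, pop_size=8):
    base = text.split()
    population = [mutate(base) for _ in range(pop_size)]
    for _ in range(generations):
        flipped = [c for c in population if top_label(" ".join(c)) != top_label(text)]
        if flipped:
            # Prefer the successful candidate closest to the original text.
            best = min(flipped, key=lambda c: sum(a != b for a, b in zip(c, base)))
            return " ".join(best)
        # Next generation: mutate parents chosen at random.
        population = [mutate(random.choice(population)) for _ in range(pop_size)]
    return None

print(genetic_attack("the service was bad and slow"))
```

Because substitutions accumulate across generations, the search eventually finds a label-flipping candidate using nothing but top-label queries.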

Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers

QData/deepWordBug 13 Jan 2018

Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios.
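Character-level black-box attacks of this kind rely on small edits such as swapping or deleting characters so the word falls outside the model's vocabulary while remaining readable. A sketch of the general idea (hypothetical code, not the DeepWordBug implementation):

```python
def swap(word: str) -> str:
    """Swap two adjacent characters near the start, e.g. 'model' -> 'mdoel'."""
    if len(word) < 4:
        return word
    chars = list(word)
    chars[1], chars[2] = chars[2], chars[1]
    return "".join(chars)

def delete(word: str) -> str:
    """Drop one middle character, e.g. 'model' -> 'moel'."""
    return word if len(word) < 3 else word[:2] + word[3:]

print(swap("terrible"))   # a swapped variant of 'terrible'
print(delete("model"))    # a shortened variant of 'model'
```

In a full attack these transformers would be applied to the words the black-box model appears to rely on most, as estimated from query responses.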

BAE: BERT-based Adversarial Examples for Text Classification

QData/TextAttack EMNLP 2020

Modern text classification models are susceptible to adversarial examples: perturbed versions of the original text that are indiscernible to humans but get misclassified by the model.
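BAE's core move is to mask a token and let a masked language model propose contextual replacements. The sketch below illustrates that loop under toy assumptions: a small lookup table stands in for the BERT masked language model, and the classifier is hypothetical, so this is not the paper's code.

```python
# Hypothetical stand-in for a masked language model: contextual
# suggestions for a masked slot.
TOY_MLM = {
    "the movie was [MASK]": ["fine", "average", "ok"],
}

def classify(text: str) -> str:
    """Toy classifier standing in for the victim model."""
    return "positive" if "wonderful" in text else "neutral"

def bae_replace(text: str):
    """Mask each token in turn; keep the first contextual replacement
    that changes the classifier's prediction."""
    words = text.split()
    for i in range(len(words)):
        masked = " ".join(words[:i] + ["[MASK]"] + words[i + 1:])
        for candidate in TOY_MLM.get(masked, []):
            perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
            if classify(perturbed) != classify(text):
                return perturbed
    return None

print(bae_replace("the movie was wonderful"))
```

Because the replacements come from a language model conditioned on the surrounding context, the resulting adversarial text tends to stay fluent rather than degrading into obvious noise.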

TextAttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

QData/TextAttack EMNLP 2020

TextAttack also includes data augmentation and adversarial training modules for using components of adversarial attacks to improve model accuracy and robustness.

End-to-End Adversarial Text-to-Speech

yanggeng1995/EATS ICLR 2021

Modern text-to-speech synthesis pipelines typically involve multiple processing stages, each of which is designed or learnt independently from the rest.

Searching for a Search Method: Benchmarking Search Algorithms for Generating NLP Adversarial Examples

QData/TextAttack EMNLP (BlackboxNLP) 2020

We study the behavior of several black-box search algorithms used for generating adversarial examples for natural language processing (NLP) tasks.

Semantic-Preserving Adversarial Text Attacks

advattack/bu-spo 23 Aug 2021

In this paper, we propose a Bigram and Unigram based adaptive Semantic Preservation Optimization (BU-SPO) method to examine the vulnerability of deep models.