Adversarial Text

33 papers with code • 0 benchmarks • 2 datasets

Adversarial Text refers to a specialised text sequence crafted specifically to manipulate the prediction of a language model. Adversarial Text attacks are typically carried out against Large Language Models (LLMs). Research into different adversarial approaches helps us build effective defense mechanisms that detect malicious text input, and train more robust language models.
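As a minimal illustration of the idea (not any specific paper's method), a character-level adversarial perturbation makes a tiny, human-readable edit — here a swap of two adjacent characters in one word — of the kind attack algorithms search over to flip a classifier's prediction. The `perturb` helper below is hypothetical:

```python
import random

def perturb(text, seed=0):
    """Swap two adjacent characters in one word -- the kind of tiny,
    still-readable edit that character-level adversarial text attacks
    search over when trying to change a model's prediction."""
    rng = random.Random(seed)
    words = text.split()
    # only perturb words long enough for a meaningful character swap
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

print(perturb("the movie was absolutely wonderful"))
```

A real attack would generate many such candidate edits and keep the one that most changes the victim model's output.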

Libraries

Use these libraries to find Adversarial Text models and implementations

Most implemented papers

RETSim: Resilient and Efficient Text Similarity

google/unisim 28 Nov 2023

This paper introduces RETSim (Resilient and Efficient Text Similarity), a lightweight, multilingual deep learning model trained to produce robust metric embeddings for near-duplicate text retrieval, clustering, and dataset deduplication tasks.
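Near-duplicate retrieval with metric embeddings can be sketched with a toy stand-in for the learned model — a character-trigram count vector compared by cosine similarity. RETSim itself uses a trained deep network, not this hand-built featurization:

```python
from collections import Counter
import math

def embed(text, n=3):
    """Toy stand-in for a learned text embedding: a character
    n-gram count vector (RETSim uses a trained deep model)."""
    t = f"  {text.lower()}  "  # pad so word boundaries contribute n-grams
    return Counter(t[i:i + n] for i in range(len(t) - n + 1))

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a)  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

query = embed("The quick brown fox")
print(cosine(query, embed("Teh quick brown fox")))        # near-duplicate: high
print(cosine(query, embed("stock prices fell sharply")))  # unrelated: low
```

Character-level features are naturally resilient to typo-style perturbations, which is why metric embeddings of this flavor suit deduplication and adversarial near-duplicate detection.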

DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation

CatherineWong/dancin_seq2seq 14 Dec 2017

In this work, we introduce DANCin SEQ2SEQ, a GAN-inspired algorithm for adversarial text example generation targeting largely black-box text classifiers.

Adversarial Text Generation via Feature-Mover's Distance

vijini/FM-GAN NeurIPS 2018

However, the discrete nature of text hinders the application of GAN to text-generation tasks.

Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification

cecilialeiqi/adversarial_text 1 Dec 2018

In this paper we formulate the attacks with discrete input on a set function as an optimization task.

TextBugger: Generating Adversarial Text Against Real-world Applications

CatherineWong/dancin_seq2seq 13 Dec 2018

Deep Learning-based Text Understanding (DLTU) is the backbone technique behind various applications, including question answering, machine translation, and text classification.

Evaluating Defensive Distillation For Defending Text Processing Neural Networks Against Adversarial Examples

Top-Ranger/text_adversarial_attack 21 Aug 2019

Adversarial examples are artificially modified input samples which lead to misclassifications, while not being detectable by humans.
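The mechanism defensive distillation relies on can be sketched with a temperature-scaled softmax: training teacher and student at a high temperature smooths the output distribution, flattening the gradients that many attacks exploit. This is a generic illustration of temperature scaling, not the paper's evaluation setup:

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax. Defensive distillation trains the
    teacher and student at high T, which smooths the output
    distribution compared to the sharp T=1 probabilities."""
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [8.0, 2.0, 1.0]
print([round(p, 3) for p in softmax(logits, T=1.0)])   # sharp, near one-hot
print([round(p, 3) for p in softmax(logits, T=20.0)])  # smoothed
```

The paper's finding concerns how well (or poorly) this smoothing protects text-processing networks in practice.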

Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild

weijiawu/SyntoReal_STD 3 Sep 2020

To address the severe domain distribution mismatch, we propose a synthetic-to-real domain adaptation method for scene text detection, which transfers knowledge from synthetic data (source domain) to real data (target domain).

Persistent Anti-Muslim Bias in Large Language Models

clip-italian/clip-italian 14 Jan 2021

It has been observed that large-scale language models capture undesirable societal biases, e.g., relating to race and gender; yet religious bias has been relatively unexplored.

MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation

huawei-noah/kd-nlp ACL 2021

We present, MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.

SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text

quocnsh/sepp 12 Oct 2021

A classifier must handle misclassified texts of two kinds: ordinary incorrect predictions and adversarial texts, which are generated specifically to fool the classifier (called the victim).