Adversarial Text
33 papers with code • 0 benchmarks • 2 datasets
Adversarial Text refers to a specialised text sequence that is designed specifically to influence the prediction of a language model. Adversarial Text attacks are typically carried out against Large Language Models (LLMs). Research on understanding different adversarial approaches can help us build effective defense mechanisms to detect malicious text input and build robust language models.
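As a concrete illustration of the idea, many attacks make tiny character-level edits that a human barely notices but that can push a model's tokenisation off its training distribution. Below is a minimal, self-contained sketch of one such edit (the `perturb` function and its behaviour are illustrative, not taken from any specific paper):

```python
import random

def perturb(text: str, rng: random.Random) -> str:
    """Swap two adjacent characters inside one randomly chosen word.

    A toy character-level adversarial edit: the text stays readable to a
    human, but a model that relies on exact tokens may now see an
    out-of-vocabulary word and change its prediction.
    """
    words = text.split()
    # Only words long enough to swap interior characters are candidates.
    candidates = [i for i, w in enumerate(words) if len(w) >= 4]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)  # keep first/last char to stay readable
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

rng = random.Random(0)
print(perturb("this movie was absolutely wonderful", rng))
```

The perturbed sentence is a character-level permutation of the original, so its length and character multiset are unchanged; a real attack would additionally search for the edit that maximally shifts the victim model's output.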
Benchmarks
These leaderboards are used to track progress in Adversarial Text
Libraries
Use these libraries to find Adversarial Text models and implementations
Most implemented papers
RETSim: Resilient and Efficient Text Similarity
This paper introduces RETSim (Resilient and Efficient Text Similarity), a lightweight, multilingual deep learning model trained to produce robust metric embeddings for near-duplicate text retrieval, clustering, and dataset deduplication tasks.
DANCin SEQ2SEQ: Fooling Text Classifiers with Adversarial Text Example Generation
In this work, we introduce DANCin SEQ2SEQ, a GAN-inspired algorithm for adversarial text example generation targeting largely black-box text classifiers.
Adversarial Text Generation via Feature-Mover's Distance
However, the discrete nature of text hinders the application of GAN to text-generation tasks.
Discrete Adversarial Attacks and Submodular Optimization with Applications to Text Classification
In this paper we formulate the attacks with discrete input on a set function as an optimization task.
TextBugger: Generating Adversarial Text Against Real-world Applications
Deep Learning-based Text Understanding (DLTU) is the backbone technique behind various applications, including question answering, machine translation, and text classification.
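TextBugger generates adversarial text by applying small "bugs" to important words, including inserting a space, deleting a character, swapping adjacent characters, and substituting visually similar characters. A minimal sketch of such character-level bugs (the helper name and the lookalike map are illustrative, not the paper's exact implementation):

```python
def char_bugs(word: str) -> list[str]:
    """Generate simple character-level variants of a word, loosely modeled
    on TextBugger-style bugs: insert, delete, swap, and visual substitution."""
    bugs = []
    if len(word) >= 3:
        mid = len(word) // 2
        bugs.append(word[:mid] + " " + word[mid:])        # insert a space
        bugs.append(word[:mid] + word[mid + 1:])          # delete a character
        bugs.append(word[:mid] + word[mid + 1] + word[mid] + word[mid + 2:])  # swap adjacent chars
    # Substitute one character with a visually similar one (tiny example map).
    lookalikes = {"o": "0", "l": "1", "a": "@", "e": "3", "i": "1"}
    for plain, similar in lookalikes.items():
        if plain in word:
            bugs.append(word.replace(plain, similar, 1))
            break
    return bugs

print(char_bugs("foolish"))
```

The full attack ranks words by their importance to the victim model's prediction and then picks, for each important word, the bug that most degrades the model's confidence.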
Evaluating Defensive Distillation For Defending Text Processing Neural Networks Against Adversarial Examples
Adversarial examples are artificially modified input samples which lead to misclassifications, while not being detectable by humans.
Synthetic-to-Real Unsupervised Domain Adaptation for Scene Text Detection in the Wild
To address the severe domain distribution mismatch, we propose a synthetic-to-real domain adaptation method for scene text detection, which transfers knowledge from synthetic data (source domain) to real data (target domain).
Persistent Anti-Muslim Bias in Large Language Models
It has been observed that large-scale language models capture undesirable societal biases, e.g., relating to race and gender; yet religious bias has been relatively unexplored.
MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation
We present, MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation.
SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text
A classifier faces two kinds of misclassified texts: ordinary inputs it simply predicts incorrectly, and adversarial texts, which are generated specifically to fool the classifier (referred to as the victim).