Adversarial Text
33 papers with code • 0 benchmarks • 2 datasets
Adversarial Text refers to a specially crafted text sequence designed to influence the prediction of a language model. Adversarial text attacks are commonly carried out against Large Language Models (LLMs). Research into these adversarial approaches helps us build effective defense mechanisms that detect malicious text input, and ultimately more robust language models.
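To make the threat model concrete, here is a minimal sketch of a character-level perturbation of the kind many text attacks use. The function name and example sentence are illustrative, not taken from any specific paper:

```python
import random

def typo_perturb(text: str, word_index: int, seed: int = 0) -> str:
    """Create an adversarial candidate by swapping two adjacent
    characters inside one word (a common character-level attack).
    Illustrative only: real attacks search over many such edits."""
    rng = random.Random(seed)
    words = text.split()
    w = words[word_index]
    if len(w) < 2:
        return text
    i = rng.randrange(len(w) - 1)  # position of the swap
    words[word_index] = w[:i] + w[i + 1] + w[i] + w[i + 2:]
    return " ".join(words)

# A human still reads the same sentiment; a model may not.
print(typo_perturb("the movie was absolutely wonderful", 3))
```

A full attack would generate many such candidates and keep the one that changes the victim model's prediction.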
Benchmarks
These leaderboards are used to track progress in Adversarial Text.
Libraries
Use these libraries to find Adversarial Text models and implementations.
Latest papers with no code
Improved Training of Mixture-of-Experts Language GANs
In this work, we (1) empirically show that the mixture-of-experts approach enhances the representation capacity of the generator for language GANs, and (2) harness the Feature Statistics Alignment (FSA) paradigm to provide fine-grained learning signals that advance generator training.
TextDefense: Adversarial Text Detection based on Word Importance Entropy
TextDefense differs from previous approaches in that it utilizes the target model for detection and is thus attack-type agnostic.
A survey on text generation using generative adversarial networks
This work presents a thorough review of recent studies and advancements in text generation using Generative Adversarial Networks.
Adversarial Text Normalization
Additionally, the process to retrain a model is time and resource intensive, creating a need for a lightweight, reusable defense.
Data-Driven Mitigation of Adversarial Text Perturbation
We propose Continuous Word2Vec (CW2V), our data-driven method to learn word embeddings that ensures that perturbations of words have embeddings similar to those of the original words.
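The goal of keeping perturbed words close to their originals in embedding space can be illustrated with a toy subword embedding. This is purely illustrative: CW2V is a learned, data-driven method, while the character-trigram hashing below is just the simplest construction with the desired property:

```python
import hashlib
import math

def char_ngram_vec(word: str, n: int = 3, dim: int = 64):
    """Toy subword embedding: hash character trigrams into a fixed-size
    vector, so a small perturbation changes only a few n-grams and the
    vector stays close to the original word's vector."""
    v = [0.0] * dim
    padded = f"<{word}>"  # boundary markers, as in subword models
    for i in range(len(padded) - n + 1):
        h = int(hashlib.md5(padded[i:i + n].encode()).hexdigest(), 16)
        v[h % dim] += 1.0
    return v

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# A perturbed word stays close; an unrelated word does not.
print(cos(char_ngram_vec("wonderful"), char_ngram_vec("wonderfull")))  # high
print(cos(char_ngram_vec("wonderful"), char_ngram_vec("terrible")))    # low
```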
Identifying Adversarial Attacks on Text Classifiers
The landscape of adversarial attacks against text classifiers continues to grow, with new attacks developed every year and many of them available in standard toolkits, such as TextAttack and OpenAttack.
SemAttack: Natural Textual Attacks via Different Semantic Spaces
In particular, SemAttack optimizes the generated perturbations constrained on generic semantic spaces, including typo space, knowledge space (e.g., WordNet), contextualized semantic space (e.g., the embedding space of BERT clusterings), or the combination of these spaces.
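The search loop common to such attacks, generating candidates from a semantic space and keeping the one that hurts the victim model most, can be sketched as follows. The `best_perturbation` helper and the toy loss are hypothetical stand-ins, not SemAttack's actual optimization:

```python
def best_perturbation(candidates, loss):
    """Greedy attack step: among candidate rewrites drawn from some
    semantic space (typo neighbours, synonyms, embedding neighbours),
    keep the one that maximizes the victim model's loss."""
    return max(candidates, key=loss)

# Toy stand-in for a victim model's loss: it "understands" the
# literal word "good" but not its typo-space neighbour.
toy_loss = lambda t: 0.0 if "good" in t else 1.0

cands = ["the film was good", "the film was g00d", "the film was fine"]
print(best_perturbation(cands, toy_loss))  # "the film was g00d"
```

Real attacks replace `toy_loss` with the victim model's actual loss or confidence drop, and iterate this step over many positions in the text.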
Repairing Adversarial Texts through Perturbation
Furthermore, such attacks are impossible to eliminate entirely, i.e., adversarial perturbation remains possible even after applying mitigation methods such as adversarial training.
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
Adversarial attacks are a major challenge faced by current machine learning research.
Generating Watermarked Adversarial Texts
Adversarial example generation has been a hot topic in recent years because generated adversarial examples can cause deep neural networks (DNNs) to misclassify, revealing the vulnerability of DNNs and motivating the search for good solutions to improve their robustness.