Adversarial Text

33 papers with code • 0 benchmarks • 2 datasets

Adversarial Text refers to a specialised text sequence designed specifically to influence the prediction of a language model. Adversarial Text attacks are commonly carried out against Large Language Models (LLMs). Research on understanding different adversarial approaches helps us build effective defense mechanisms to detect malicious text input and build robust language models.
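To make the idea concrete, here is a minimal sketch of one of the simplest families of adversarial text perturbations: typo-style adjacent-character swaps that keep the text readable to humans while potentially shifting a model's prediction. This is a generic illustration, not the method of any paper listed below; the function names, the perturbation rate, and the "skip short words" heuristic are all illustrative choices.

```python
import random


def swap_adjacent_chars(word: str, rng: random.Random) -> str:
    """Swap two adjacent inner characters, a common typo-style perturbation."""
    if len(word) < 4:  # leave short words intact to preserve readability
        return word
    i = rng.randrange(1, len(word) - 2)  # avoid the first and last character
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def perturb_sentence(text: str, rate: float = 0.3, seed: int = 0) -> str:
    """Perturb a fraction of the words; the text stays human-readable
    but the token sequence seen by a model changes."""
    rng = random.Random(seed)
    return " ".join(
        swap_adjacent_chars(w, rng) if rng.random() < rate else w
        for w in text.split()
    )
```

Real attacks search for the perturbation that maximally degrades a specific target model rather than perturbing at random, but the character-level budget is the same.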

Latest papers with no code

Improved Training of Mixture-of-Experts Language GANs

no code yet • 23 Feb 2023

In this work, we (1) first empirically show that the mixture-of-experts approach is able to enhance the representation capacity of the generator for language GANs and (2) harness the Feature Statistics Alignment (FSA) paradigm to render fine-grained learning signals to advance the generator training.

TextDefense: Adversarial Text Detection based on Word Importance Entropy

no code yet • 12 Feb 2023

TextDefense differs from previous approaches in that it utilizes the target model itself for detection and is thus agnostic to the attack type.
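The paper's title suggests its core signal is the entropy of per-word importance scores. The abstract does not spell out the algorithm, so the following is only a rough sketch of how such a signal could be computed: leave-one-out importance probing of a target model (`score_fn` is an assumed stand-in for the model's confidence in its predicted class), followed by the Shannon entropy of the normalized importance distribution.

```python
import math


def word_importances(words, score_fn):
    """Importance of each word = drop in the target model's confidence
    when that word is removed (leave-one-out probing)."""
    base = score_fn(words)
    return [
        max(base - score_fn(words[:i] + words[i + 1:]), 0.0)
        for i in range(len(words))
    ]


def importance_entropy(importances):
    """Shannon entropy (bits) of the normalized importance distribution.
    Low entropy means the prediction hinges on very few words."""
    total = sum(importances)
    if total == 0:
        return 0.0
    probs = [v / total for v in importances]
    return -sum(p * math.log2(p) for p in probs if p > 0)
```

Intuitively, word-substitution attacks concentrate the model's decision on the few perturbed tokens, so the shape of this distribution can differ between benign and adversarial inputs; how TextDefense actually thresholds or uses it is specified in the paper, not here.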

A survey on text generation using generative adversarial networks

no code yet • 20 Dec 2022

This work presents a thorough review concerning recent studies and text generation advancements using Generative Adversarial Networks.

Adversarial Text Normalization

no code yet • NAACL (ACL) 2022

Additionally, the process to retrain a model is time and resource intensive, creating a need for a lightweight, reusable defense.

Data-Driven Mitigation of Adversarial Text Perturbation

no code yet • 19 Feb 2022

We propose Continuous Word2Vec (CW2V), a data-driven method for learning word embeddings that ensures perturbed words have embeddings similar to those of the original words.
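The key property the abstract describes, perturbed words landing near the originals in embedding space, can be illustrated with a toy subword-hashing embedding. This is not CW2V itself (the paper's method is data-driven), just a sketch of why subword structure yields the property: a typo'd word shares most of its character trigrams with the original, so their vectors stay close.

```python
import hashlib
import math


def char_ngrams(word: str, n: int = 3):
    """Character n-grams of a word with boundary markers."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed(word: str, dim: int = 256):
    """Toy subword embedding: sum of signed, hashed character trigrams.
    A perturbed word shares most trigrams with the original, so its
    embedding stays close -- the property CW2V is designed to ensure."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        h = int(hashlib.md5(g.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0 if (h >> 8) % 2 else -1.0
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0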

Identifying Adversarial Attacks on Text Classifiers

no code yet • 21 Jan 2022

The landscape of adversarial attacks against text classifiers continues to grow, with new attacks developed every year and many of them available in standard toolkits, such as TextAttack and OpenAttack.

SemAttack: Natural Textual Attacks via Different Semantic Spaces

no code yet • ACL ARR January 2022

In particular, SemAttack optimizes the generated perturbations constrained to generic semantic spaces, including the typo space, the knowledge space (e.g., WordNet), the contextualized semantic space (e.g., the embedding space of BERT clusterings), or a combination of these spaces.
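Of the spaces the abstract lists, the typo space is the simplest to make concrete. Below is one common realization of it, the edit-distance-1 neighbourhood of a word; this only enumerates candidates and does not show SemAttack's actual constrained optimization over knowledge or BERT-embedding spaces.

```python
import string


def typo_space(word: str) -> set:
    """All edit-distance-1 variants of a word: deletions, adjacent swaps,
    substitutions, and insertions -- one simple 'typo space' of candidates."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {l + r[1:] for l, r in splits if r}
    swaps = {l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1}
    subs = {l + c + r[1:] for l, r in splits if r for c in letters}
    inserts = {l + c + r for l, r in splits for c in letters}
    return (deletes | swaps | subs | inserts) - {word}
```

An attack would then score each candidate against the target model and keep those that flip the prediction while remaining plausible to a reader.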

Repairing Adversarial Texts through Perturbation

no code yet • 29 Dec 2021

Furthermore, such attacks are impossible to eliminate entirely, i.e., adversarial perturbations remain possible even after applying mitigation methods such as adversarial training.

"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks

no code yet • ACL ARR November 2021

Adversarial attacks are a major challenge faced by current machine learning research.

Generating Watermarked Adversarial Texts

no code yet • 25 Oct 2021

Adversarial example generation has been a hot topic in recent years because the generated adversarial examples can cause deep neural networks (DNNs) to misclassify. This reveals the vulnerability of DNNs and motivates the search for good solutions to improve the robustness of DNN models.