23 papers with code • 2 benchmarks • 3 datasets
Although various techniques have been proposed to generate adversarial samples for white-box attacks on text, little attention has been paid to black-box attacks, which are more realistic scenarios.
We introduce a graphical framework that (1) generalizes existing attacks in discrete domains, (2) can accommodate complex cost functions beyond $p$-norms, including financial cost incurred when attacking a classifier, and (3) efficiently produces valid adversarial examples with guarantees of minimal adversarial cost.
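One way to picture the graph-search view described above is uniform-cost search over a graph whose nodes are candidate inputs and whose edges are single transformations with non-negative costs. The sketch below is a minimal illustration under that assumption and is not the authors' implementation: `min_cost_adversarial`, `toy_classifier`, `toy_neighbors`, and the weights and costs are hypothetical names and values chosen for the example.

```python
import heapq
from itertools import count


def min_cost_adversarial(start, neighbors, is_adversarial, max_expansions=10_000):
    """Uniform-cost search over a transformation graph.

    neighbors(x) yields (next_x, edge_cost) pairs with edge_cost >= 0.
    Returns (example, total_cost) for the cheapest adversarial example
    reachable within the expansion budget, or None if none is found.
    """
    tie = count()  # tie-breaker so the heap never compares nodes directly
    frontier = [(0.0, next(tie), start)]
    best = {start: 0.0}
    expansions = 0
    while frontier and expansions < max_expansions:
        cost, _, x = heapq.heappop(frontier)
        if cost > best.get(x, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        if is_adversarial(x):
            # Nodes are popped in order of cost, so the first hit is minimal.
            return x, cost
        expansions += 1
        for nxt, step in neighbors(x):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier, (new_cost, next(tie), nxt))
    return None


# Hypothetical toy setting: two integer features, a linear classifier,
# and a different cost for incrementing each feature.
WEIGHTS = (1.0, 3.0)
COSTS = (1.0, 2.5)


def toy_classifier(x):
    # True means the input is classified as the attacker's target class.
    return sum(w * v for w, v in zip(WEIGHTS, x)) >= 6.0


def toy_neighbors(x):
    # Each transformation increments one feature by 1 at that feature's cost.
    for i, c in enumerate(COSTS):
        y = list(x)
        y[i] += 1
        yield tuple(y), c


result = min_cost_adversarial((0, 0), toy_neighbors, toy_classifier)
print(result)  # ((0, 2), 5.0)
```

Because the cost is accumulated along edges rather than computed as a norm, any non-negative cost model (for example a monetary cost per change) fits the same search. The minimality guarantee in this sketch holds only when the search finishes within `max_expansions`.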
In this paper, we develop three attacks that can bypass a broad range of common data sanitization defenses, including anomaly detectors based on nearest neighbors, training loss, and singular-value decomposition.
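For readers unfamiliar with data sanitization, the following is a minimal sketch of what a nearest-neighbor anomaly detector of this kind might look like. It is an assumption made for illustration, not the defense evaluated in the paper: the function name `knn_sanitize`, the choice of `k`, and the distance threshold are all hypothetical.

```python
import math


def knn_sanitize(points, labels, k=2, threshold=1.5):
    """Return indices of training points that pass a k-NN distance filter.

    A point is kept when the distance to its k-th nearest neighbor with
    the same label is at most `threshold`. Points with fewer than k
    same-label neighbors are kept, since there is too little data to judge.
    """
    kept = []
    for i, (p, y) in enumerate(zip(points, labels)):
        dists = sorted(
            math.dist(p, q)
            for j, (q, z) in enumerate(zip(points, labels))
            if j != i and z == y
        )
        if len(dists) < k or dists[k - 1] <= threshold:
            kept.append(i)
    return kept


# Three clustered points and one isolated point, all with the same label.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0)]
labels = [0, 0, 0, 0]
kept = knn_sanitize(points, labels)
print(kept)  # [0, 1, 2]
```

The isolated point at `(5.0, 5.0)` is filtered out because its second-nearest same-label neighbor is far away, while the clustered points survive.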
The datasets compared cover several problems, including topic and polarity classification, spam detection, user profiling, and authorship attribution.
Natural language processing (NLP) has recently gained much attention for representing and analysing human language computationally.