Adversarial Attack on Sentiment Classification

In this paper, we propose a white-box attack algorithm, the "Global Search" method, and compare it with simple misspelling noise and with "Greedy Search", a more sophisticated and commonly used white-box attack. The attack methods are evaluated on a Convolutional Neural Network (CNN) sentiment classifier trained on the IMDB movie review dataset. Attack success rate is used to evaluate the effectiveness of each attack method, and sentence perplexity is used to measure how much the generated adversarial examples are distorted. The experimental results show that the proposed "Global Search" method generates more powerful adversarial examples with less distortion of, or fewer modifications to, the source text.
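The two evaluation measures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not give formulas, so the sketch assumes that attack success rate is the fraction of correctly classified inputs whose prediction changes after the attack, and that perplexity is the exponential of the negative mean token log-probability under some language model. The function names `attack_success_rate` and `perplexity` are hypothetical.

```python
import math
from typing import Sequence


def attack_success_rate(
    labels: Sequence[int],
    clean_preds: Sequence[int],
    adv_preds: Sequence[int],
) -> float:
    """Assumed definition: among inputs the classifier got right before the
    attack, the fraction it gets wrong on the adversarial version."""
    attacked = 0
    flipped = 0
    for y, clean, adv in zip(labels, clean_preds, adv_preds):
        if clean != y:
            # Already misclassified inputs are not counted as attack targets.
            continue
        attacked += 1
        if adv != y:
            flipped += 1
    return flipped / attacked if attacked else 0.0


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Assumed definition: exp of the negative mean natural-log probability
    that a language model assigns to each token of the sentence."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the paper.
    labels = [1, 0, 1, 1]
    clean_preds = [1, 0, 0, 1]
    adv_preds = [0, 0, 0, 1]
    print(attack_success_rate(labels, clean_preds, adv_preds))
    print(perplexity([math.log(0.5)] * 4))
```

Under these assumptions, a higher success rate indicates a stronger attack, and a smaller increase in perplexity relative to the source sentence indicates less distortion.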

WS 2019
