This paper presents Out-of-Context Summarizer, a tool that takes arbitrary public news articles out of context by summarizing them to coherently fit either a liberal- or conservative-leaning agenda.
It has been observed that large-scale language models capture undesirable societal biases, e.g., relating to race and gender; yet religious bias has been relatively unexplored.
Experiments on real-world NLP datasets demonstrate that our method generates more diverse and fluent adversarial texts than many existing adversarial text generation approaches.
Specifically, we propose a novel and efficient optimization-based method that can be naturally integrated into different sequential prediction schemes, i.e., connectionist temporal classification (CTC) and the attention mechanism.
To this end, we introduce a cooperative training paradigm in which a language model is trained cooperatively with the generator and is used to efficiently shape the generator's data distribution against mode collapse.
It can attack text classification models with a higher success rate than existing methods while producing text of acceptable quality to human readers.
The objective of the adversary is to evade the learner's prediction mechanism by sending adversarial queries that cause the learner to make erroneous class predictions, while the learner's objective is to reduce incorrect predictions on these adversarial queries without degrading prediction quality on clean queries.