Texts

ToxiGen

Introduced by Hartvigsen et al. in ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection

A large-scale and machine-generated dataset of 274,186 toxic and benign statements about 13 minority groups.

This dataset uses a demonstration-based prompting framework and an adversarial classifier-in-the-loop decoding method to generate subtly toxic and benign text with a massive pre-trained language model (GPT-3). Controlling machine generation in this way allows TOXIGEN to cover implicitly toxic text at a larger scale, and about more demographic groups, than previous resources of human-written text. TOXIGEN can be used to fight human-written and machine-generated toxicity.

Homepage