Hate Speech Detection
85 papers with code • 13 benchmarks • 23 datasets
Hate Speech Detection is the task of automatically determining whether a piece of text contains hate speech.
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models on the edge and/or under constrained computational training or inference budgets remains challenging.
We also observe that models that use human rationales for training perform better at reducing unintended bias towards target communities.
To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers).
Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes.