Hate Speech Detection
124 papers with code • 13 benchmarks • 34 datasets
Hate speech detection is the task of determining whether a communication, such as text or audio, expresses hatred and/or encourages violence towards a person or a group of people. Such hatred is usually based on prejudice against 'protected characteristics' such as ethnicity, gender, sexual orientation, religion, or age. Example benchmarks include ETHOS and HateXplain. Models can be evaluated with metrics such as the F-score (also called the F-measure).
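As a rough illustration of the F-score mentioned above, the following sketch computes it for a binary hate / not-hate labelling from precision and recall. The function name `f_score`, the label convention (1 = hate speech, 0 = not), and the toy labels are assumptions made for this example only. They are not taken from ETHOS, HateXplain, or any particular benchmark's evaluation script.

```python
def f_score(y_true, y_pred, positive=1, beta=1.0):
    """F-beta score for one class; beta=1.0 gives the usual F1.

    Hypothetical helper for illustration, not a benchmark's official scorer.
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # Avoid dividing by zero when nothing was correctly flagged.

    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example (made-up labels): 1 = hate speech, 0 = not hate speech.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(round(f_score(y_true, y_pred), 3))  # precision = recall = 2/3, so F1 = 0.667
```

Because hate speech datasets are often imbalanced, published results frequently report a macro-averaged or per-class F-score rather than accuracy. The sketch above covers only the single-class case.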
Most implemented papers
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging.
Automated Hate Speech Detection and the Problem of Offensive Language
We train a multi-class classifier to distinguish between these different categories.
HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection
We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.
Comparative Studies of Detecting Abusive Language on Twitter
However, this dataset has not been comprehensively studied to its potential.
Deep Learning Models for Multilingual Hate Speech Detection
Hate speech detection is a challenging problem with most of the datasets available in only one language: English.
HateCheck: Functional Tests for Hate Speech Detection Models
Detecting online hate is a difficult task that even state-of-the-art models struggle with.
OPT: Open Pre-trained Transformer Language Models
Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.
Hate Speech Dataset from a White Supremacy Forum
Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic.
Hateminers : Detecting Hate speech against Women
With the online proliferation of hate speech, there is an urgent need for systems that can detect such harmful content.
A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media
To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers).