Hate Speech Detection

124 papers with code • 13 benchmarks • 34 datasets

Hate speech detection is the task of determining whether a communication, such as text or audio, expresses hatred and/or encourages violence towards a person or a group of people. Such hatred is usually based on prejudice against 'protected characteristics' such as ethnicity, gender, sexual orientation, religion, or age. Example benchmarks include ETHOS and HateXplain. Models are commonly evaluated with metrics such as the F-score (also called the F-measure).
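As a rough illustration of the F-score mentioned above, the sketch below computes precision, recall, and the F-beta score for a binary hate / not-hate labelling. The function name `f_beta`, the label encoding (1 = hateful, 0 = not hateful), and the toy labels are assumptions made for this example; they are not taken from any benchmark listed on this page.

```python
def f_beta(y_true, y_pred, beta=1.0, positive=1):
    """Return the F-beta score for the `positive` class.

    beta=1.0 gives the usual F1 score (harmonic mean of precision and recall).
    """
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing is correct

    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data (hypothetical): 1 = hateful, 0 = not hateful.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

print(f_beta(y_true, y_pred))  # 0.75 for this toy data
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision and recall are both 0.75 and so is F1. For multi-class settings, a common choice is to compute this score per class and average the results (macro-F1).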

Most implemented papers

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter

huggingface/transformers NeurIPS 2019

As Transfer Learning from large-scale pre-trained models becomes more prevalent in Natural Language Processing (NLP), operating these large models in on-the-edge and/or under constrained computational training or inference budgets remains challenging.

Automated Hate Speech Detection and the Problem of Offensive Language

t-davidson/hate-speech-and-offensive-language 11 Mar 2017

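We train a multi-class classifier to distinguish between three categories of tweets: those containing hate speech, those containing only offensive language, and those containing neither.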

HateXplain: A Benchmark Dataset for Explainable Hate Speech Detection

punyajoy/HateXplain 18 Dec 2020

We also observe that models, which utilize the human rationales for training, perform better in reducing unintended bias towards target communities.

Comparative Studies of Detecting Abusive Language on Twitter

younggns/comparative-abusive-lang WS 2018

However, the Hate and Abusive Speech on Twitter dataset has not been comprehensively studied to its potential.

Deep Learning Models for Multilingual Hate Speech Detection

punyajoy/DE-LIMIT 14 Apr 2020

Hate speech detection is a challenging problem with most of the datasets available in only one language: English.

HateCheck: Functional Tests for Hate Speech Detection Models

paul-rottger/hate-functional-tests ACL 2021

Detecting online hate is a difficult task that even state-of-the-art models struggle with.

OPT: Open Pre-trained Transformer Language Models

facebookresearch/metaseq 2 May 2022

Large language models, which are often trained for hundreds of thousands of compute days, have shown remarkable capabilities for zero- and few-shot learning.

Hate Speech Dataset from a White Supremacy Forum

aitor-garcia-p/hate-speech-dataset WS 2018

Hate speech is commonly defined as any communication that disparages a target group of people based on some characteristic such as race, colour, ethnicity, gender, sexual orientation, nationality, religion, or other characteristic.

Hateminers : Detecting Hate speech against Women

punyajoy/Hateminers-EVALITA 17 Dec 2018

With the online proliferation of hate speech, there is an urgent need for systems that can detect such harmful content.

A BERT-Based Transfer Learning Approach for Hate Speech Detection in Online Social Media

ZeroxTM/BERT-CNN-Fine-Tuning-For-Hate-Speech-Detection-in-Online-Social-Media 28 Oct 2019

To address these needs, in this study we introduce a novel transfer learning approach based on an existing pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers).