K-MHaS: Korean Multi-label Hate Speech Dataset

Introduced by Lee et al. in K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment


We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns.

  • Consists of 109,692 utterances from Korean online news comments, labeled with 8 fine-grained hate speech classes.
  • Data collection period: January 2018 to June 2020.

  • Provides two tasks: (a) binary classification and (b) multi-label classification, where each utterance carries one to four labels.

  • (a) Binary classification: Hate Speech or Not Hate Speech.
  • (b) Multi-label (fine-grained) classification: Politics, Origin, Physical, Age, Gender, Religion, Race, and Profanity.

For the fine-grained classification, the Hate Speech class from the binary classification is broken down into eight classes, one for each hate speech category.
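
The relationship between the two label schemes can be illustrated with a minimal sketch. It assumes each utterance's fine-grained labels are available as a list of class names; the class ordering, the function names, and the choice to encode a Not Hate Speech utterance as an all-zero vector are illustrative assumptions, not the format of the released files.

```python
from typing import List

# Class names as listed in the description above. The ordering here is an
# assumption for illustration and may differ from the released dataset.
HATE_CLASSES = [
    "Politics", "Origin", "Physical", "Age",
    "Gender", "Religion", "Race", "Profanity",
]
MAX_LABELS = 4  # the description states one to four labels per utterance


def to_multi_hot(labels: List[str]) -> List[int]:
    """Encode fine-grained class names as an 8-dimensional multi-hot vector.

    An empty list (assumed here to mean Not Hate Speech) yields all zeros.
    """
    if len(labels) > MAX_LABELS:
        raise ValueError(f"expected at most {MAX_LABELS} labels, got {len(labels)}")
    unknown = set(labels) - set(HATE_CLASSES)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    return [1 if name in labels else 0 for name in HATE_CLASSES]


def to_binary(labels: List[str]) -> int:
    """Collapse fine-grained labels to the binary task: 1 = Hate Speech, 0 = Not."""
    return 1 if labels else 0


if __name__ == "__main__":
    example = ["Gender", "Profanity"]  # hypothetical annotation
    print(to_multi_hot(example))
    print(to_binary(example))
```

Under these assumptions, any utterance with at least one fine-grained label counts as Hate Speech in the binary task, which mirrors how the eight classes are described as a breakdown of the binary Hate Speech class.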

