PT Hate Speech

Introduced by Fortuna et al. in A Hierarchically-Labeled Portuguese Hate Speech Dataset

PT Hate Speech is a dataset for studying hate speech in Portuguese. Its key details are:

Composition:

  • The dataset consists of 5,668 tweets written in Portuguese.
  • The tweets were labeled under two annotation schemes, each applied by annotators with a different level of expertise.

Annotation Schemes:

  • Non-experts first annotated the tweets with binary labels: 'hate' or 'no-hate'.
  • Expert annotators then classified the tweets using a fine-grained, hierarchical, multi-label scheme with 81 hate speech categories in total.

Hierarchical Annotation Scheme:

  • The hierarchical approach makes it possible to identify different types of hate speech and their intersections.
  • Inter-annotator agreement varied from category to category, reflecting that some types of hate speech are more subtle than others and that detecting them depends on personal perception.

Usefulness and Baseline Experiment:

  • To demonstrate the dataset's usefulness, the authors ran a baseline classification experiment using pre-trained word embeddings and an LSTM model.
  • The authors report a state-of-the-art outcome for this baseline.
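To make the hierarchical multi-label idea concrete, here is a minimal Python sketch of how such annotations could be represented and queried. The category names and parent links below are hypothetical placeholders, not the dataset's actual 81-category taxonomy, and the helper names (PARENTS, with_ancestors, co_occurrence) are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical category graph (child -> parents). These names are
# illustrative only and do NOT reproduce the dataset's real taxonomy.
# A category may have several parents, which models an intersection.
PARENTS = {
    "hate": [],
    "sexism": ["hate"],
    "racism": ["hate"],
    "women": ["sexism"],
    "black_people": ["racism"],
    "black_women": ["women", "black_people"],
}


def with_ancestors(labels):
    """Return the given labels plus every ancestor reachable via PARENTS."""
    closed = set()
    stack = list(labels)
    while stack:
        label = stack.pop()
        if label not in closed:
            closed.add(label)
            stack.extend(PARENTS.get(label, []))
    return closed


def co_occurrence(annotations):
    """Count how often each pair of categories appears on the same tweet,
    after expanding every annotation with its ancestors."""
    counts = Counter()
    for labels in annotations:
        for pair in combinations(sorted(with_ancestors(labels)), 2):
            counts[pair] += 1
    return counts


# Three toy annotations: an intersectional label, a single label, no hate.
tweets = [{"black_women"}, {"women"}, set()]
expanded = [with_ancestors(t) for t in tweets]
pairs = co_occurrence(tweets)
```

Expanding each annotation with its ancestors means a tweet tagged with a fine-grained category also counts toward the broader ones, and the pair counts give a simple view of which categories intersect. Whether the released files store labels this way is not stated here, so treat the structure as one possible encoding.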

Source: Conversation with Bing, 3/16/2024

  (1) A Hierarchically-Labeled Portuguese Hate Speech Dataset. https://aclanthology.org/W19-3510/
  (2) A Hierarchically-Labeled Portuguese Hate Speech Dataset - ACL Anthology. https://aclanthology.org/W19-3510.pdf
  (3) A Hierarchically-Labeled Portuguese Hate Speech Dataset. https://paperswithcode.com/paper/a-hierarchically-labeled-portuguese-hate

License


  • Unknown
