The HateBR dataset is a significant resource for studying offensive language and hate speech detection in Brazilian Portuguese. Here are the key details about this dataset:

  1. Collection and Annotation:
    • The HateBR dataset was collected from comments posted on the Instagram accounts of Brazilian politicians.
    • Specialists manually annotated each comment.
    • The dataset consists of 7,000 documents.

  2. Annotation Layers:
    • The HateBR dataset includes annotations at three different levels:
      • Binary Classification: Comments are labeled as either offensive or non-offensive.
      • Offensiveness Levels: Comments are categorized as highly, moderately, or slightly offensive.
      • Hate Speech Targets: Comments are further classified into nine specific hate speech categories:
        • Xenophobia
        • Racism
        • Homophobia
        • Sexism
        • Religious intolerance
        • Partyism
        • Apology for the dictatorship
        • Antisemitism
        • Fatphobia
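The three annotation layers can be pictured as one record per comment. The following is only a sketch: the class and field names are hypothetical, and two rules it enforces are assumptions not stated above (that a non-offensive comment carries no level or target, and that a comment may have more than one target).

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, Optional


class OffensivenessLevel(Enum):
    """Second layer: how offensive a comment is."""
    SLIGHTLY = "slightly offensive"
    MODERATELY = "moderately offensive"
    HIGHLY = "highly offensive"


class HateSpeechTarget(Enum):
    """Third layer: the nine hate speech categories listed above."""
    XENOPHOBIA = "xenophobia"
    RACISM = "racism"
    HOMOPHOBIA = "homophobia"
    SEXISM = "sexism"
    RELIGIOUS_INTOLERANCE = "religious intolerance"
    PARTYISM = "partyism"
    APOLOGY_FOR_THE_DICTATORSHIP = "apology for the dictatorship"
    ANTISEMITISM = "antisemitism"
    FATPHOBIA = "fatphobia"


@dataclass(frozen=True)
class AnnotatedComment:
    """Hypothetical container for one comment and its three layers."""
    text: str
    offensive: bool                              # first layer (binary)
    level: Optional[OffensivenessLevel] = None   # second layer
    targets: FrozenSet[HateSpeechTarget] = frozenset()  # third layer

    def __post_init__(self):
        # Assumption: only offensive comments carry a level or a target.
        if not self.offensive and (self.level is not None or self.targets):
            raise ValueError("non-offensive comments cannot have a level or target")


example = AnnotatedComment(
    text="(placeholder comment text)",
    offensive=True,
    level=OffensivenessLevel.HIGHLY,
    targets=frozenset({HateSpeechTarget.SEXISM}),
)
```

Keeping the layers as separate fields makes it straightforward to train a binary classifier on `offensive` alone and to use `level` or `targets` only where they are present.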

  3. Inter-Annotator Agreement:
    • Each comment was annotated by three different annotators to ensure reliability.
    • The dataset achieved high inter-annotator agreement.
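The text above does not say which agreement statistic was used. As an illustration only, here is a minimal sketch of Fleiss' kappa, a common choice when each item is labeled by the same number of raters (three here). The function name and the toy ratings are made up for the example.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Every item must have the same number of raters (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("each item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = sorted({label for item in ratings for label in item})

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:
        return 1.0  # only one category was ever used, so agreement is trivial
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data: 4 comments, 3 annotators each, 1 = offensive, 0 = non-offensive.
toy_ratings = [[1, 1, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
kappa = fleiss_kappa(toy_ratings)
```

Values close to 1 indicate agreement well above chance, while values near 0 indicate agreement no better than chance.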

  4. Baseline Performance:
    • Baseline experiments using machine learning models achieved an F1-score of 85%, outperforming existing baselines for Portuguese-language hate speech datasets.
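For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it for a single positive class. Whether the reported 85% is a binary, macro, or weighted F1 is not stated above, so this is an illustration of the metric, not a reproduction of the result. The function name and labels are made up.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = offensive, 0 = non-offensive.
gold = [1, 1, 0, 0, 1]
predicted = [1, 0, 0, 1, 1]
score = f1_score(gold, predicted)
```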

  5. Corpus and Models:
    • The repository provides the corpus of annotated comments.
    • It also contains the best models presented in the associated research paper.

  6. File Format:
    • The HateBr.csv file provides four columns:
      • 1st column: Instagram comments.
      • 2nd column: Offensive language classification (offensive vs. non-offensive).
      • 3rd column: Offensiveness level (highly, moderately, slightly offensive).
      • 4th column: Hate speech classification (nine different targets).
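A minimal sketch of reading a file with this four-column layout, using only Python's standard `csv` module. Several things here are assumptions, since the text above gives only the column order: that the file is comma-separated, that it has a header row, and that the label columns can be kept as raw strings. The function name and the dictionary keys are hypothetical, and the sample rows are invented placeholders, not real data.

```python
import csv
import io


def read_hatebr_rows(handle, has_header=True):
    """Read four-column rows: comment, offensive label, level, target."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: the first row holds column names
    rows = []
    for record in reader:
        if len(record) < 4:
            continue  # skip blank or malformed rows
        comment, offensive, level, target = record[:4]
        rows.append(
            {
                "comment": comment,
                "offensive": offensive,  # raw label; its encoding is not given above
                "level": level,
                "target": target,
            }
        )
    return rows


# Invented placeholder rows, only to show the expected shape.
sample = io.StringIO(
    "comment,offensive,level,target\n"
    "first placeholder,0,0,-1\n"
    '"second, with a comma",1,3,4\n'
)
rows = read_hatebr_rows(sample)
```

With the real file, the same function would be called on an open handle, for example `open("HateBr.csv", newline="", encoding="utf-8")`, where the encoding is again an assumption to verify against the file itself.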

Source: Conversation with Bing, 3/16/2024
  (1) HateBR - Offensive Language and Hate Speech Dataset in ... - GitHub. https://github.com/franciellevargas/HateBR
  (2) ruanchaves/hatebr · Datasets at Hugging Face. https://huggingface.co/datasets/ruanchaves/hatebr
  (3) Papers with Code - HateBR: Large expert annotated corpus of Brazilian .... https://paperswithcode.com/paper/hatebr-large-expert-annotated-corpus-of
