The HateBR dataset is a significant resource for studying offensive language and hate speech detection in Brazilian Portuguese. Here are the key details about this dataset:
The dataset consists of 7,000 documents.
Annotation Layers:
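The HateBR dataset includes annotations at three different levels: a binary classification (offensive versus non-offensive), an offensiveness level (highly, moderately, or slightly offensive), and nine hate speech groups.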
Inter-Annotator Agreement:
The dataset achieved high inter-annotator agreement.
Baseline Performance:
Baseline experiments using machine learning models achieved an F1-score of 85%, outperforming existing baselines for Portuguese-language hate speech datasets.
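For readers unfamiliar with the metric, the following is a minimal sketch of how a binary F1-score is computed from gold labels and predictions. It is not the evaluation code from the HateBR paper; the function name `binary_f1` and the toy labels are illustrative assumptions, and the text above does not say which averaging scheme (binary, macro, or weighted) the reported 85% refers to.

```python
def binary_f1(y_true, y_pred, positive=1):
    """Return the F1-score for the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels, 1 = offensive, 0 = non-offensive).
gold = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
score = binary_f1(gold, pred)
```

In this toy case there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the F1-score is also 2/3.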
Corpus and Models:
The repository contains the best models presented in the associated research paper.
File Format:
The HateBr.csv file provides four columns.
Source: Conversation with Bing, 3/16/2024
(1) HateBR - Offensive Language and Hate Speech Dataset in ... - GitHub. https://github.com/franciellevargas/HateBR.
(2) ruanchaves/hatebr · Datasets at Hugging Face. https://huggingface.co/datasets/ruanchaves/hatebr.
(3) Papers with Code - HateBR: Large expert annotated corpus of Brazilian .... https://paperswithcode.com/paper/hatebr-large-expert-annotated-corpus-of.
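Since the column names of HateBr.csv are not listed here, one way to check them is to read the header directly once the file has been downloaded. The sketch below uses only the Python standard library; the helper name `summarize_csv`, the local path, the UTF-8 encoding, and the comma delimiter are assumptions and not details confirmed by the repository.

```python
import csv
from collections import Counter


def summarize_csv(path, label_column=None):
    """Return (column names, row count, label counts) for a CSV file.

    `label_column` is optional; when given, values in that column are tallied.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        columns = list(reader.fieldnames or [])
        rows = list(reader)

    counts = Counter()
    if label_column is not None:
        counts = Counter(row[label_column] for row in rows)
    return columns, len(rows), counts


# Usage (assumes a local copy of the file; the path is hypothetical):
# columns, n_rows, _ = summarize_csv("HateBr.csv")
# print(columns, n_rows)
```

Reading the header first avoids hard-coding column names that may differ between the GitHub and Hugging Face copies of the dataset.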