TuPyE-Dataset (Portuguese Hate Speech Expanded Dataset)

Introduced by Oliveira et al. in TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models

TuPyE, an expanded version of TuPy, is a collection of 43,668 annotated documents selected for hate speech detection across a range of social network contexts. It adds further annotations and merges datasets from Fortuna et al. (2019), Leite et al. (2020), and Vargas et al. (2022) with 10,000 original documents from the TuPy-Dataset.

Because annotated data in Portuguese is scarce compared with English, TuPyE aims to expand and improve existing datasets. This expansion is intended to support the development of stronger hate speech detection models using machine learning (ML) and natural language processing (NLP) techniques.

License


  • cc-by-4.0
