HopeEDI: A Multilingual Hope Speech Detection Dataset for Equality, Diversity, and Inclusion

1 Dec 2020 · Bharathi Raja Chakravarthi

Over the past few years, systems have been developed to control online content and eliminate abusive, offensive or hate speech content. However, people in power sometimes misuse this form of censorship to obstruct the democratic right to freedom of speech. Therefore, it is imperative that research take a positive reinforcement approach towards online content that is encouraging, positive and supportive. Until now, most studies have focused on tackling negativity in the English language, though the problem is broader than harmful content alone, and it is multilingual as well. Thus, we have constructed a Hope Speech dataset for Equality, Diversity and Inclusion (HopeEDI) containing user-generated comments from the social media platform YouTube, with 28,451, 20,198 and 10,705 comments in English, Tamil and Malayalam, respectively, manually labelled as containing hope speech or not. To our knowledge, this is the first research of its kind to annotate hope speech for equality, diversity and inclusion in a multilingual setting. We measured the inter-annotator agreement of our dataset using Krippendorff’s alpha. Further, we created several baselines to benchmark the resulting dataset and report the results using precision, recall and F1-score. The dataset is publicly available to the research community. We hope that this resource will spur further research on encouraging inclusive and responsive speech that reinforces positiveness.
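The abstract reports inter-annotator agreement with Krippendorff's alpha but does not say which variant or tool was used. The sketch below assumes the labels are treated as nominal categories and computes alpha with the standard coincidence-matrix formulation, α = 1 − D_o / D_e. The function name and the example label strings ("hope", "not-hope") are hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import permutations
from typing import Hashable, Optional, Sequence


def krippendorff_alpha_nominal(
    units: Sequence[Sequence[Optional[Hashable]]],
) -> float:
    """Krippendorff's alpha for nominal labels (illustrative sketch).

    Each inner sequence holds the labels that different annotators gave to
    one item (e.g. one comment); None marks a missing annotation.
    """
    # Coincidence matrix: every ordered pair of values within a unit,
    # weighted by 1 / (m_u - 1), where m_u is the number of values in the unit.
    coincidences: Counter = Counter()
    for unit in units:
        values = [v for v in unit if v is not None]
        m = len(values)
        if m < 2:
            continue  # units with fewer than two values are not pairable
        for a, b in permutations(values, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c and the total number of pairable values n.
    totals: Counter = Counter()
    for (a, _), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Nominal metric: any pair of different labels is a full disagreement.
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b] for a in totals for b in totals if a != b)
    if expected == 0:
        raise ValueError("alpha is undefined when fewer than two categories occur")
    # alpha = 1 - D_o / D_e, with D_o = observed / n and D_e = expected / (n (n - 1)).
    return 1.0 - (n - 1) * observed / expected
```

For example, three comments rated by two annotators as [hope, hope], [hope, not-hope] and [not-hope, not-hope] give α = 4/9 ≈ 0.44, while two comments with full agreement across both categories give α = 1.0. Missing annotations are simply dropped, and an item left with a single label contributes nothing.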


Datasets


Introduced in the Paper:

HopeEDI
Results

Task                                | Dataset | Model                    | Metric Name               | Metric Value | Global Rank
------------------------------------+---------+--------------------------+---------------------------+--------------+------------
Hope Speech Detection               | HopeEDI | Decision Tree Classifier | Weighted Average F1-score | 0.90         | #2
Hope Speech Detection for Tamil     | HopeEDI | Logistic Regression      | Weighted Average F1-score | 0.56         | #2
Hope Speech Detection for Malayalam | HopeEDI | Decision Tree Classifier | Weighted Average F1-score | 0.73         | #2
Hope Speech Detection for English   | HopeEDI | Decision Tree Classifier | Weighted Average F1-score | 0.90         | #2
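The results are reported as a weighted-average F1-score. As commonly defined (for example, scikit-learn's `f1_score` with `average="weighted"`), this is the mean of the per-class F1 scores, each weighted by that class's support (its number of true instances). The stdlib-only sketch below follows that definition; the function names and label strings are hypothetical, and it assumes an undefined precision or recall (0/0) is treated as 0.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple


def per_class_scores(
    y_true: Sequence[Hashable], y_pred: Sequence[Hashable]
) -> Dict[Hashable, Tuple[float, float, float, int]]:
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    for true, pred in zip(y_true, y_pred):
        if true == pred:
            tp[true] += 1
        else:
            fp[pred] += 1  # the predicted label gains a false positive
            fn[true] += 1  # the gold label gains a false negative
    support = Counter(y_true)

    scores = {}
    for label in sorted(set(y_true) | set(y_pred), key=str):
        # Assumption: undefined ratios (0/0) are reported as 0.
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1, support[label])
    return scores


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of the per-class F1 scores."""
    if len(y_true) == 0:
        raise ValueError("need at least one example")
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 * sup for _, _, f1, sup in scores.values()) / len(y_true)
```

With y_true = [hope, not-hope, not-hope, not-hope] and y_pred = [hope, not-hope, hope, not-hope], the per-class F1 scores are 2/3 and 0.8, and the weighted average is 23/30 ≈ 0.767. Because the weights are class supports, a weighted score can remain high even when a less frequent class is detected poorly.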
