Timely and effective response to humanitarian crises requires quick and accurate analysis of large amounts of text data, a process that can benefit greatly from expert-assisted NLP systems trained on validated and annotated data in the humanitarian response domain. To enable the creation of such NLP systems, we introduce and release HumSet, a novel and rich multilingual dataset of humanitarian response documents annotated by experts in the humanitarian response community. The dataset provides documents in three languages (English, French, Spanish) and covers a variety of humanitarian crises from 2018 to 2021 across the globe. For each document, HumSet provides selected snippets (entries) as well as the classes assigned to each entry, annotated using common humanitarian information analysis frameworks. HumSet also provides novel and challenging entry extraction and multi-label entry classification tasks. In this paper, we take a first step towards approaching these tasks and conduct a set of expe
2 PAPERS • NO BENCHMARKS YET
TuPyE, an enhanced iteration of TuPy, comprises 43,668 annotated documents selected for hate speech detection across diverse social network contexts. This augmented dataset adds supplementary annotations and merges datasets from Fortuna et al. (2019), Leite et al. (2020), and Vargas et al. (2022) with 10,000 original documents from the TuPy-Dataset.
1 PAPER • NO BENCHMARKS YET