BanglaEmotion (BanglaEmotion: A Benchmark Dataset for Bangla Textual Emotion Analysis)

BanglaEmotion is a manually annotated Bangla emotion corpus that captures the diversity of fine-grained emotion expressions in social-media text. It uses six emotion labels: Sadness, Happiness, Disgust, Surprise, Fear and Anger, which are the six basic emotion categories according to Paul Ekman (1999). The raw text was collected from user comments in two Facebook groups (Ekattor TV and Airport Magistrates) and on the public posts of Dr. Imran H Sarker, a popular blogger and activist. These comments are mostly reactions to ongoing socio-political issues and to the economic successes and failures of Bangladesh. In total, 32923 comments were scraped from these three sources, of which 6314 were annotated with one of the six categories. The distribution of the annotated corpus is as follows:

sad = 1341
happy = 1908
disgust = 703
surprise = 562
fear = 384
angry = 1416
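
As a quick sanity check on the class counts listed above, the following minimal sketch recomputes the total and each label's share of the annotated corpus. The counts are copied from this description; the variable names (counts, shares, imbalance_ratio) are illustrative and not part of the dataset.

```python
# Class counts as listed in this description.
counts = {
    "sad": 1341,
    "happy": 1908,
    "disgust": 703,
    "surprise": 562,
    "fear": 384,
    "angry": 1416,
}

# The total should match the number of annotated comments.
total = sum(counts.values())

# Fraction of the annotated corpus carried by each label.
shares = {label: n / total for label, n in counts.items()}

# Print labels from most to least frequent.
for label, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label:<9}{n:>5}  {shares[label]:6.1%}")

# Ratio between the largest and smallest class, a rough measure of imbalance.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"total = {total}, largest/smallest = {imbalance_ratio:.2f}")
```

The largest class (happy) is roughly five times the size of the smallest (fear), which is the kind of imbalance a balanced subset is meant to address.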

A balanced set derived from the above data is also provided. The dataset is split into training and test sets in a proportion of 5:1, used for training and evaluation respectively. More information on the dataset and the experiments on it can be found in our paper (related links below).
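
The 5:1 proportion mentioned above can be illustrated with a small sketch. This is not the authors' splitting script: the function name split_5_to_1, the per-label (stratified) strategy, the fixed seed, and the toy comments are all assumptions made for illustration. The split files shipped with the dataset should be preferred when reproducing the paper's results.

```python
import random
from collections import defaultdict


def split_5_to_1(examples, seed=0):
    """Split (text, label) pairs into train/test at roughly 5:1 within each label.

    Hypothetical helper: the actual procedure used for the dataset may differ.
    """
    # Group examples by label so each class is split separately.
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_test = len(items) // 6  # one part in six goes to the test set
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


# Toy data standing in for annotated comments: 12 examples per label.
toy = [(f"comment {i}", label) for label in ("sad", "happy") for i in range(12)]
train, test = split_5_to_1(toy)
print(len(train), len(test))
```

Splitting within each label keeps the class proportions of the training and test sets close to each other, which matters for the smaller classes.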


