Tw-StAR at SemEval-2018 Task 1: Preprocessing Impact on Multi-label Emotion Classification

SEMEVAL 2018 · Hala Mulki, Chedi Bechikh Ali, Hatem Haddad, Ismail Babaoğlu

In this paper, we describe our contribution to the SemEval-2018 contest. We tackled Task 1, "Affect in Tweets", subtask E-c, "Detecting Emotions (multi-label classification)". A multi-label classification system, Tw-StAR, was developed to recognize the emotions embedded in Arabic, English and Spanish tweets. To handle the multi-label classification problem with traditional classifiers, we employed the binary relevance transformation strategy, while a TF-IDF scheme was used to generate the tweets' features. We investigated single preprocessing tasks and combinations of several of them to further improve performance. The results showed that specific combinations of preprocessing tasks could significantly improve the evaluation measures. This was later confirmed by the official results, in which our system ranked 3rd for both the Arabic and Spanish datasets and 14th for the English dataset.
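
As an illustration of the binary relevance strategy over TF-IDF features mentioned above, the following is a minimal sketch and not the Tw-StAR implementation. It trains one independent binary classifier per emotion label. The perceptron is only a stand-in for "traditional classifiers", since the abstract does not name the classifier used; the smoothed IDF formula, the toy tweets, and all function names are assumptions made for this example.

```python
import math
from collections import Counter


def tfidf_fit(docs):
    """Learn smoothed IDF weights from tokenized documents (assumed formula)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf_transform(doc, idf):
    """Map one tokenized document to an L2-normalized sparse TF-IDF vector."""
    tf = Counter(tok for tok in doc if tok in idf)  # unseen tokens are dropped
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def score(x, w, b):
    """Linear decision score of a sparse vector under weights w and bias b."""
    return b + sum(w.get(tok, 0.0) * v for tok, v in x.items())


def train_perceptron(vectors, targets, epochs=20):
    """Train one binary perceptron (stand-in for any traditional classifier)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, targets):
            pred = 1 if score(x, w, b) > 0 else 0
            if pred != y:
                step = 1.0 if y == 1 else -1.0
                for tok, v in x.items():
                    w[tok] = w.get(tok, 0.0) + step * v
                b += step
    return w, b


def fit_binary_relevance(docs, label_sets, labels):
    """Binary relevance: one independent yes/no problem per label."""
    idf = tfidf_fit(docs)
    vectors = [tfidf_transform(d, idf) for d in docs]
    models = {}
    for lab in labels:
        targets = [1 if lab in ls else 0 for ls in label_sets]
        models[lab] = train_perceptron(vectors, targets)
    return idf, models


def predict(doc, idf, models):
    """Return the set of labels whose binary classifier fires."""
    x = tfidf_transform(doc, idf)
    return {lab for lab, (w, b) in models.items() if score(x, w, b) > 0}


# Hypothetical toy data: pre-tokenized tweets with zero or more emotions each.
docs = [
    ["so", "happy", "today"],
    ["i", "am", "furious"],
    ["happy", "but", "furious"],
    ["nothing", "special"],
]
label_sets = [{"joy"}, {"anger"}, {"joy", "anger"}, set()]

idf, models = fit_binary_relevance(docs, label_sets, ["joy", "anger"])
print(predict(["happy", "day"], idf, models))
```

Because each label is learned independently, a tweet can receive several emotions or none. Any preprocessing step (for example stemming or stopword removal) would be applied to the token lists before `tfidf_fit` is called.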
