Sentiment Analysis Model for Opinionated Awngi Text: Case of Music Reviews

WS 2019 · Melese Mihret, Muluneh Atinaf ·

Abstract The analysis of sentiments is imperative to make a decision for individuals, organizations, and governments. Due to the rapid growth of Awngi (Agew) text on the web, there is no available corpus annotated for sentiment analysis. In this paper, we present a SA model for the Awngi language spoken in Ethiopia, by using a supervised machine learning approach. We developed our corpus by collecting around 1500 posts from online sources. This research is begun to build and evaluate the model for opinionated Awngi music reviews. Thus, pre-processing techniques have been employed to clean the data, to convert transliterations to the native Ethiopic script for accessibility and convenience to typing and to change the words to their base form by removing the inflectional morphemes. After pre-processing, the corpus is manually annotated by three the language professional for giving polarity, and rate, their level of confidence in their selection and sentiment intensity scale values. To improve the calculation method of feature selection and weighting and proposed a more suitable SA algorithm for feature extraction named CHI and weight calculation named TF IDF, increasing the proportion and weight of sentiment words in the feature words. We employed Support Vector Machines (SVM), Na{\"\i}ve Bayes (NB) and Maximum Entropy (MxEn) machine learning algorithms. Generally, the results are encouraging, despite the morphological challenge in Awngi, the data cleanness and small size of data. We are believed that the results could improve further with a larger corpus.

PDF Abstract