Sentiment Analysis for Multilingual Corpora

WS 2019 · Svitlana Galeshchuk, Ju Qiu, Julien Jourdan

The paper presents a generic approach to supervised sentiment analysis of social media content in Slavic languages. The method translates the documents from the original language into English with Google's Neural Translation Model. The resulting texts are then converted to vectors by averaging the vector representations of their words, taken from a pre-trained English Word2Vec model. Testing the approach with several machine learning methods on Polish, Slovenian and Croatian Twitter datasets yields up to 86% classification accuracy on out-of-sample data.
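The vectorization step described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been translated into English and uses a tiny hypothetical embedding table (`TOY_VECTORS`) in place of a real pre-trained Word2Vec model. The whitespace tokenization, the skipping of out-of-vocabulary words, and the zero-vector fallback are assumptions made for illustration; the abstract does not say how the authors handle these cases.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embeddings standing in for a pre-trained
# English Word2Vec model (real models typically use hundreds of dimensions).
TOY_VECTORS: Dict[str, List[float]] = {
    "good": [1.0, 0.0, 0.5],
    "bad": [-1.0, 0.0, 0.5],
    "movie": [0.0, 1.0, 0.0],
}


def average_vector(
    tokens: List[str], vectors: Dict[str, List[float]], dim: int
) -> List[float]:
    """Return the element-wise mean of the vectors of in-vocabulary tokens.

    Assumption: unknown tokens are skipped, and a document with no known
    tokens maps to the zero vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def embed_document(
    text: str, vectors: Dict[str, List[float]] = TOY_VECTORS, dim: int = 3
) -> List[float]:
    """Lowercase, split on whitespace (an assumed tokenizer), then average."""
    tokens = text.lower().split()
    return average_vector(tokens, vectors, dim)


if __name__ == "__main__":
    # A translated tweet becomes one fixed-length feature vector.
    print(embed_document("Good movie"))
```

The resulting fixed-length vectors could then be passed to any standard classifier as features. The translation call and the classifier are left out here because the abstract does not specify them in enough detail to reproduce.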
