Sentiment analysis is one of the most popular natural language processing tasks. In this paper we introduce pre-trained Russian language models used to extract ELMo embeddings that improve classification accuracy on short conversational texts... The first language model was trained on a Russian Twitter dataset containing 102 million sentences, while the other two were trained on 57.5 million sentences of Russian news and 23.9 million sentences of Russian Wikipedia articles. Although classifiers trained on top of the language models outperform those using fastText embeddings of the same language style, we show that the domain of the language model also has a significant impact on accuracy. This paper establishes state-of-the-art results on the RuSentiment dataset, improving the weighted F1-score from 72.8 to 78.5. All our models and the source code are available online, so anyone can apply them or fine-tune them on domain-specific data.
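
For readers unfamiliar with the reported metric, the following is a minimal sketch of how a weighted F1-score is commonly computed: the per-class F1 scores are averaged, each weighted by the number of true examples of that class. The function name `weighted_f1` and the toy labels are hypothetical illustrations, not taken from the paper or its released code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Illustrative helper (hypothetical name); not the paper's evaluation code.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n_true in support.items():
        # True positives: predicted `label` where the gold label is `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        # Weight each class by its share of the true labels.
        score += f1 * n_true / total
    return score


# Toy example with made-up labels.
gold = ["pos", "pos", "neg", "neu"]
pred = ["pos", "neg", "neg", "neu"]
print(round(weighted_f1(gold, pred), 4))
```

Because each class contributes in proportion to its support, frequent classes dominate the score, which matters when the label distribution is imbalanced.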
