Sentiment Classification Using Document Embeddings Trained with Cosine Similarity

ACL 2019  ·  Tan Thongtan, Tanasanee Phienthrakul

In document-level sentiment classification, each document must be mapped to a fixed-length vector. Document embedding models map each document to a dense, low-dimensional vector in a continuous vector space. This paper proposes training document embeddings using cosine similarity instead of the dot product. Experiments on the IMDB dataset show that accuracy improves when cosine similarity is used in place of the dot product, while combining these embeddings with Naive Bayes-weighted bag-of-n-grams features achieves a new state-of-the-art accuracy of 97.42%. Code to reproduce all experiments is available at https://github.com/tanthongtan/dv-cosine
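The core change the abstract describes is the similarity function used when scoring a document vector against a word vector during embedding training: an unnormalized dot product versus a cosine similarity that depends only on the vectors' directions. A minimal sketch of the two scoring functions (function names are my own, not from the paper's code):

```python
import numpy as np

def dot_score(doc_vec, word_vec):
    # Standard doc2vec-style score: unnormalized dot product,
    # which grows with the magnitudes of both vectors.
    return np.dot(doc_vec, word_vec)

def cosine_score(doc_vec, word_vec):
    # Proposed alternative: normalize both vectors first, so the
    # score depends only on direction and is bounded in [-1, 1].
    return np.dot(doc_vec, word_vec) / (
        np.linalg.norm(doc_vec) * np.linalg.norm(word_vec)
    )

d = np.array([3.0, 0.0])
w = np.array([1.0, 0.0])
print(dot_score(d, w))     # 3.0 -- scales with vector magnitude
print(cosine_score(d, w))  # 1.0 -- identical direction
```

In a doc2vec-style objective, either score would then be fed into a sigmoid with negative sampling; the paper's claim is that the bounded, magnitude-invariant cosine score yields better embeddings for sentiment classification.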


Datasets

IMDb
Results from the Paper


Task                 Dataset  Model             Metric    Value  Global Rank
Sentiment Analysis   IMDb     DV-ngrams-cosine  Accuracy  93.13  #26
