Being able to predict the length of a scientific paper may be helpful in numerous situations.
Our results indicate that parallel convolutions of filter lengths up to three are usually enough for capturing relevant text features.
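As a rough illustration of what parallel convolutions of filter lengths one to three look like, the sketch below slides filters of each width over a sequence of token embeddings, applies a ReLU, and keeps the maximum activation per filter. It is a minimal pure-Python sketch, not the architecture used in the work: the function names, the max-over-time pooling, and the toy weights are all assumptions made for illustration.

```python
from typing import Dict, List

Vector = List[float]
Kernel = List[Vector]  # one weight vector per covered token position


def conv1d_max(embeddings: List[Vector], kernel: Kernel, bias: float = 0.0) -> float:
    """Slide one filter over the token sequence, apply ReLU, then max-pool over time."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result when the text is shorter than the filter
    for start in range(len(embeddings) - width + 1):
        activation = bias
        for offset in range(width):
            token = embeddings[start + offset]
            weights = kernel[offset]
            activation += sum(e * w for e, w in zip(token, weights))
        best = max(best, activation)
    return best


def parallel_conv_features(
    embeddings: List[Vector], filters_by_width: Dict[int, List[Kernel]]
) -> List[float]:
    """Run every filter width in parallel on the same input and concatenate the pooled outputs."""
    features: List[float] = []
    for width in sorted(filters_by_width):
        for kernel in filters_by_width[width]:
            features.append(conv1d_max(embeddings, kernel))
    return features


# Toy input: four tokens with two-dimensional embeddings (hypothetical values).
embeddings = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [-1.0, 0.5]]

# One hypothetical filter per width (1, 2, and 3 tokens).
filters_by_width = {
    1: [[[1.0, 0.0]]],
    2: [[[1.0, 0.0], [0.0, 1.0]]],
    3: [[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]],
}

features = parallel_conv_features(embeddings, filters_by_width)
print(features)
```

In a trained model the filter weights would be learned, and the concatenated feature vector would typically feed a classifier or regressor; here the weights are fixed only to keep the example self-contained.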
Instead of relying on human participants for scoring or labeling the text samples, we propose to automate the process by using a human likeliness metric we define and a discrimination procedure based on large pretrained language models with their probability distributions.
Automatic evaluation of various quality criteria of text produced by data-driven intelligent methods is very common and useful because it is cheap, fast, and usually yields repeatable results.
Recent developments in sequence-to-sequence learning with neural networks have considerably improved the quality of automatically generated text summaries and document keywords, creating the need for even larger training corpora.
Using data-driven models for solving text summarization or similar tasks has become very common in recent years.
Most of the proposed supervised and unsupervised methods for keyphrase generation are unable to produce terms that are valuable but do not appear in the text.
This work investigates the role of factors like training method, training corpus size and thematic relevance of texts in the performance of word embedding features on sentiment analysis of tweets, song lyrics, movie reviews and item reviews.
Cold-start and data sparsity are also the two traditional and most frequently addressed problems, appearing in 23 and 22 studies respectively, while movies and movie datasets are still widely used by most of the authors.
In the area of online communication, commerce and transactions, analyzing sentiment polarity of texts written in various natural languages has become crucial.
Second, there are various uncertainties regarding the use of word embedding vectors: should they be generated from the same data set that is used to train the model, or is it better to source them from large, popular collections?