In this study, we measure the performance of document classifiers trained with the random forest method on features generated by the three models and their variants.
The multilayer network of language is a unified framework for modeling linguistic subsystems and their structural properties, enabling the exploration of their mutual interactions.
This paper presents text normalization, which is an integral part of any text-to-speech synthesis system.
Firstly, we show that the triad significance profile for the Croatian language is very similar to those of other languages, and that all the networks belong to the same family of networks.
The obtained sets are evaluated against manually annotated keywords: for the set of extracted keyword candidates, the average F1 score is 24.63% and the average F2 score is 21.19%; for the extracted word-tuple candidates, the average F1 score is 25.9% and the average F2 score is 24.47%.
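For reference, the F1 and F2 scores reported here are instances of the general F-beta measure, the weighted harmonic mean of precision and recall. The following is a minimal sketch of that computation; the function names and the toy keyword sets are illustrative assumptions and are not taken from the evaluated system.

```python
def precision_recall(extracted: set, annotated: set) -> tuple:
    """Precision and recall of an extracted keyword set against a gold set."""
    hits = len(extracted & annotated)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(annotated) if annotated else 0.0
    return precision, recall


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy data (hypothetical, for illustration only).
extracted = {"network", "language", "keyword", "text"}
annotated = {"network", "keyword", "selectivity"}

p, r = precision_recall(extracted, annotated)
f1 = f_beta(p, r, beta=1.0)
f2 = f_beta(p, r, beta=2.0)
```

Because F2 weights recall more heavily, an average F2 below the average F1, as in both result pairs above, is what one would expect when recall tends to be lower than precision.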
In this paper, we analyse the selectivity measure, calculated from a complex network, in the task of automatic keyword extraction.
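As a rough illustration, the sketch below assumes that the selectivity of a node is its strength (the sum of the weights of its links) divided by its degree (the number of its links), computed on an undirected weighted co-occurrence network. This definition, the edge representation, and the toy weights are assumptions made for the example; the sentence above does not spell them out, and a directed variant with separate in- and out-selectivity is equally possible.

```python
from collections import defaultdict


def selectivity(weighted_edges: dict) -> dict:
    """Return strength / degree for every node.

    weighted_edges maps an undirected pair (u, v) to its weight;
    each pair is assumed to appear only once.
    """
    strength = defaultdict(float)
    degree = defaultdict(int)
    for (u, v), weight in weighted_edges.items():
        for node in (u, v):
            strength[node] += weight
            degree[node] += 1
    return {node: strength[node] / degree[node] for node in degree}


# Hypothetical co-occurrence counts, for illustration only.
edges = {
    ("complex", "network"): 3,
    ("network", "analysis"): 1,
    ("keyword", "extraction"): 2,
}
scores = selectivity(edges)

# Rank nodes from highest to lowest selectivity as keyword candidates.
ranking = sorted(scores, key=scores.get, reverse=True)
```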
Finally, since the size of texts is reflected in the network properties, our results suggest that the corpus influence can be reduced by increasing the co-occurrence window size.
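To make the role of the window size concrete, the following sketch builds an undirected weighted co-occurrence network from a token list. Here the hypothetical parameter `window` is taken to mean the number of following words each word is linked to; this convention, the function name, and the toy sentence are assumptions for illustration, and other definitions of the window are in use.

```python
from collections import Counter


def cooccurrence_edges(tokens: list, window: int = 1) -> Counter:
    """Count undirected links between each word and its next `window` words."""
    edges = Counter()
    for i, word in enumerate(tokens):
        for neighbour in tokens[i + 1 : i + 1 + window]:
            if neighbour != word:  # skip self-loops
                edges[tuple(sorted((word, neighbour)))] += 1
    return edges


tokens = "the cat sat on the mat".split()
narrow = cooccurrence_edges(tokens, window=1)
wide = cooccurrence_edges(tokens, window=2)
```

On this toy input, widening the window adds links between words that are not adjacent, so the same text yields a denser network.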
Additionally, in the first shuffling approach we preserved the sentence structure of the text and the number of words per sentence.
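One possible reading of this first approach is that words are shuffled only within sentence boundaries, which keeps both the sentence segmentation and the word count of every sentence. The sketch below implements that reading. It is an assumption, since the sentence above would also be consistent with shuffling words across the whole text and redistributing them into sentences of the original lengths. The function name and the `seed` parameter are hypothetical.

```python
import random


def shuffle_within_sentences(sentences: list, seed=None) -> list:
    """Shuffle the words of each sentence independently.

    sentences is a list of word lists; sentence boundaries and the
    number of words per sentence are left unchanged.
    """
    rng = random.Random(seed)
    shuffled = []
    for sentence in sentences:
        words = list(sentence)  # copy so the input is not modified
        rng.shuffle(words)
        shuffled.append(words)
    return shuffled


text = [
    ["the", "network", "is", "built", "from", "text"],
    ["words", "are", "nodes"],
]
result = shuffle_within_sentences(text, seed=42)
```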