Semantics and Homothetic Clustering of Hafez Poetry
We have created two sets of labels for the poems of Hafez (1315-1390) using unsupervised learning. Our labels are the only semantic clustering alternative to the previously existing, hand-labeled, gold-standard classification of Hafez's poems, and are intended for use in literary research. We have cross-referenced, measured, and analyzed the agreement of our clustering labels with Houman's chronological classes. Our features are based on topic modeling and word embeddings. We also introduce "similarity of similarities" features, an approach we call homothetic clustering, which proved effective on Hafez's small corpus of ghazals. Although all our experiments produced clusters that differ from Houman's classes, we consider them valid in their own right: they provide further insights and have proved useful as a contrasting alternative to Houman's classes. Our homothetic clusterer and its feature design and engineering framework can be used for further semantic analysis of Hafez's poetry and for other similar literary research.
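The abstract does not spell out how the "similarity of similarities" features are built, so the following is only a minimal sketch under one plausible reading: each poem is first described by its cosine similarities to every poem in the corpus, and similarity is then computed again between those similarity profiles. The function names, the choice of cosine similarity, and the toy vectors (standing in for topic-model or embedding features) are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(vectors):
    """Pairwise cosine similarities between all vectors."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def similarity_of_similarities(vectors):
    """Assumed reading of the idea: re-describe each item by its row of
    first-order similarities, then compare those rows to each other."""
    first_order = similarity_matrix(vectors)
    return similarity_matrix(first_order)


# Hypothetical toy feature vectors, one per poem (e.g. topic proportions).
poem_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.0],
    [0.0, 0.1, 0.9],
    [0.1, 0.0, 0.8],
]

second_order = similarity_of_similarities(poem_vectors)
```

Under this reading, the rows of `second_order` could be passed as features to any standard clustering algorithm; which algorithm the authors used is not stated in the abstract.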