Paper

Variation of word frequencies in Russian literary texts

We study the variation of word frequencies in Russian literary texts. Our findings indicate that the standard deviation of a word's frequency across texts depends on its average frequency according to a power law with exponent $0.62,$ showing that the rarer words have a relatively larger degree of frequency volatility (i.e., "burstiness"). Several latent factors models have been estimated to investigate the structure of the word frequency distribution. The dependence of a word's frequency volatility on its average frequency can be explained by the asymmetry in the distribution of latent factors.

Results in Papers With Code
(↓ scroll down to see all results)