Uniform Information Density Effects on Syntactic Choice in Hindi

According to the UNIFORM INFORMATION DENSITY (UID) hypothesis (Levy and Jaeger, 2007; Jaeger, 2010), speakers tend to distribute information density across the signal uniformly while producing language. The prior works cited above studied syntactic reduction in language production at particular choice points in a sentence. In contrast, we use a variant of the above UID hypothesis in order to investigate the extent to which word order choices in Hindi are influenced by the drive to minimize the variance of information across entire sentences. To this end, we propose multiple lexical and syntactic measures (at both word and constituent levels) to capture the uniform spread of information across a sentence. Subsequently, we incorporate these measures in machine learning models aimed to distinguish between a naturally occurring corpus sentence and its grammatical variants (expressing the same idea). Our results indicate that our UID measures are not a significant factor in predicting the corpus sentence in the presence of lexical surprisal, a competing control predictor. Finally, in the light of other recent works, we conclude with a discussion of reasons for UID not being suitable for a theory of word order.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here