The human unlikeness of neural language models in next-word prediction
The training objective of unidirectional language models (LMs) is similar to a psycholinguistic benchmark known as the cloze task, which measures next-word predictability. However, LMs lack the rich set of experiences that people have, and humans can be highly creative. To assess human parity in these models' training objective, we compare the predictions of three neural language models to those of human participants in a freely available behavioral dataset (Luke & Christianson, 2016). Our results show that while neural models show a close correspondence to human productions, they nevertheless assign insufficient probability to how often speakers guess upcoming words, especially for open-class content words.
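The comparison described above can be illustrated with a minimal sketch. The cloze probability of a word is the fraction of human participants who produce it as their guess for the next word; this is then compared against the probability a language model assigns to the same word in the same context. The context, guesses, and model probabilities below are hypothetical toy data, not drawn from the Luke & Christianson (2016) dataset or from any actual model.

```python
from collections import Counter

def cloze_probability(guesses, word):
    """Fraction of participants who guessed `word` (hypothetical data)."""
    counts = Counter(guesses)
    return counts[word] / len(guesses)

# Hypothetical human cloze guesses for "The children went outside to ___"
guesses = ["play", "play", "play", "run", "play", "eat", "play", "run"]
p_human = cloze_probability(guesses, "play")  # 5/8 = 0.625

# Hypothetical LM next-word probabilities for the same context
lm_probs = {"play": 0.31, "run": 0.12, "eat": 0.05}
p_model = lm_probs["play"]

# The pattern reported in the abstract: the model assigns less probability
# to the word than the rate at which humans actually guess it
underestimates = p_model < p_human
```

In this toy case the model's 0.31 falls well below the human cloze probability of 0.625, mirroring the abstract's finding that models under-assign probability to frequently guessed words.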