High probability or low information? The probability–quality paradox in language generation
When generating natural language from neural probabilistic models, high probability does not always coincide with high quality. Rather, mode-seeking decoding methods can lead to incredibly unnatural language, while stochastic methods produce text perceived as much more human-like. In this note, we offer an explanation for this phenomenon by analyzing language as a means of communication in the information-theoretic sense. We posit that human-like language usually contains an expected amount of information—quantified as negative log-probability—and that language with substantially more (or less) information is undesirable. We provide preliminary empirical evidence for this hypothesis using quality ratings for both human and machine-generated text, covering multiple tasks and common decoding schemes.
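To make the quantities in the hypothesis concrete, the following is a minimal sketch of how the information content of a generated sequence could be compared with the amount a model expects. It assumes that the "expected amount of information" can be approximated by the sum of the model's per-step conditional entropies; this is an illustrative choice, not necessarily the measure used in the paper. The function names and the toy next-token distributions are hypothetical.

```python
import math


def information(token_probs):
    """Information content in nats: the sum of -log p over the chosen tokens."""
    return -sum(math.log(p) for p in token_probs)


def expected_information(distributions):
    """Sum of per-step entropies (assumed proxy for the expected information)."""
    return sum(
        -sum(p * math.log(p) for p in dist if p > 0)
        for dist in distributions
    )


def information_gap(chosen_indices, distributions):
    """Information of the chosen tokens minus the expected information.

    Negative: the sequence carries less information than expected.
    Positive: it carries more information than expected.
    """
    probs = [dist[i] for i, dist in zip(chosen_indices, distributions)]
    return information(probs) - expected_information(distributions)


# Toy next-token distributions for a two-step generation (hypothetical values).
toy_distributions = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
]

# Mode-seeking choice: always take the most probable token.
greedy_gap = information_gap([0, 0], toy_distributions)

# Unlikely choice: always take the least probable token.
rare_gap = information_gap([2, 2], toy_distributions)

print(f"greedy gap: {greedy_gap:.3f} nats")
print(f"rare gap:   {rare_gap:.3f} nats")
```

In this toy setting the mode-seeking sequence falls below the expected information and the low-probability sequence rises above it, which are the two directions the hypothesis treats as undesirable.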