Evaluating Dialogs based on Grice's Maxims
There is no agreed-upon standard for evaluating conversational dialog systems, which are well known to be hard to evaluate because it is difficult to pin down metrics that correspond to human judgments and because human judgment is itself subjective. We explored the possibility of using Grice's Maxims to evaluate effective communication in conversation. We collected system-generated dialogs from popular conversational chatbots across the spectrum and conducted a survey to see how human judgments based on Gricean maxims correlate, and whether such judgments can be used as an effective evaluation metric for conversational dialog.