Dialogue Evaluation
55 papers with code • 2 benchmarks • 7 datasets
Most implemented papers
Adversarial Learning for Neural Dialogue Generation
In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances.
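A minimal sketch of the GAN-style setup this abstract describes, assuming a toy PyTorch generator and discriminator (names, sizes, and the training loop below are illustrative, not the paper's released code): a discriminator scores responses as human vs. machine, and its score is used as a REINFORCE reward for the generator.

```python
# Sketch of adversarial dialogue training (illustrative only, not the paper's code).
import torch
import torch.nn as nn

VOCAB, EMB, HID = 1000, 32, 64  # toy sizes

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, 1)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        _, h = self.rnn(self.emb(tokens))
        return torch.sigmoid(self.out(h[-1]))   # P(human-generated)

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, EMB)
        self.rnn = nn.GRU(EMB, HID, batch_first=True)
        self.out = nn.Linear(HID, VOCAB)

    def forward(self, prefix):                  # next-token logits given a prefix
        h, _ = self.rnn(self.emb(prefix))
        return self.out(h[:, -1])

G, D = Generator(), Discriminator()
g_opt = torch.optim.Adam(G.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

def adversarial_step(context, human_response, max_len=10):
    # 1) Sample a response from the generator, keeping log-probs for REINFORCE.
    tokens, log_probs = context.clone(), []
    for _ in range(max_len):
        dist = torch.distributions.Categorical(logits=G(tokens))
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        tokens = torch.cat([tokens, tok.unsqueeze(1)], dim=1)

    # 2) Update the discriminator: human pairs -> 1, generated pairs -> 0.
    d_opt.zero_grad()
    d_loss = bce(D(torch.cat([context, human_response], dim=1)),
                 torch.ones(context.size(0), 1)) + \
             bce(D(tokens.detach()), torch.zeros(context.size(0), 1))
    d_loss.backward()
    d_opt.step()

    # 3) Update the generator: the discriminator's score acts as the reward.
    g_opt.zero_grad()
    reward = D(tokens).detach()                 # higher = more human-like
    g_loss = -(torch.stack(log_probs, dim=1).sum(dim=1, keepdim=True) * reward).mean()
    g_loss.backward()
    g_opt.step()

# Toy usage with random token ids standing in for real dialogue data.
ctx = torch.randint(0, VOCAB, (4, 5))
resp = torch.randint(0, VOCAB, (4, 10))
adversarial_step(ctx, resp)
```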
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
Our method is used to evaluate four state-of-the-art open-domain dialogue systems and is compared with existing approaches.
Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems
To investigate the strengths of this novel metric and of interactive evaluation relative to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that improve recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation at the utterance level.
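A hedged sketch of the self-play idea, under the assumption that the generator and the per-turn scorer are simple placeholders (a real setup would use a trained dialogue model and a learned evaluator): the model converses with itself for several turns, and turn-level scores are averaged into a conversation-level score that stands in for interactive human evaluation.

```python
# Self-play evaluation sketch (illustrative only, not the paper's protocol).
import random

def generate_reply(history):
    # Placeholder generator: a real system conditions a trained model on `history`.
    canned = ["Hi there!", "How are you doing?", "That sounds interesting.", "Tell me more."]
    return random.choice(canned)

def turn_score(history, reply):
    # Placeholder metric: a real evaluator might score sentiment, engagement, or coherence.
    return min(len(reply.split()) / 10.0, 1.0)

def self_play_evaluation(num_turns=6, seed_utterance="Hello!"):
    history, scores = [seed_utterance], []
    for _ in range(num_turns):
        reply = generate_reply(history)   # the model answers its own previous turn
        scores.append(turn_score(history, reply))
        history.append(reply)
    return sum(scores) / len(scores), history

score, transcript = self_play_evaluation()
print(f"self-play score: {score:.2f}")
print("\n".join(transcript))
```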
Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References
The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation.
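A small sketch of what multi-reference scoring looks like in practice, assuming NLTK's sentence-level BLEU as the underlying metric (the choice of metric and the example data are illustrative): a generated response is compared against several acceptable human replies for the same context instead of a single gold reply.

```python
# Multi-reference evaluation sketch (illustrative only).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

def multi_reference_bleu(hypothesis, references):
    # `references` is a list of acceptable human replies; BLEU clips n-gram
    # counts against the best-matching reference, so extra references can
    # only help a reasonable response.
    smooth = SmoothingFunction().method1
    refs = [r.lower().split() for r in references]
    hyp = hypothesis.lower().split()
    return sentence_bleu(refs, hyp, smoothing_function=smooth)

references = [
    "I love hiking on weekends.",
    "Mostly I go hiking when I have free time.",
    "Hiking, usually with friends.",
]
print(multi_reference_bleu("I usually go hiking with friends.", references))
```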
Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems
In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, predictive engagement, for automatic evaluation of open-domain dialogue systems.
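A hedged sketch of utterance-level engagement prediction, assuming a lightweight scikit-learn classifier over TF-IDF features and toy labels (a real setup would use human-annotated engagement data and stronger utterance representations): per-utterance engagement probabilities are averaged into a dialogue-level score.

```python
# Engagement-prediction sketch (illustrative only, not the paper's model).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data standing in for human-annotated engagement labels.
utterances = ["ok", "yeah", "That is fascinating, how did you build it?",
              "I do not know", "Wow, tell me more about your trip to Japan!"]
engaging = [0, 0, 1, 0, 1]

engagement_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
engagement_model.fit(utterances, engaging)

def dialogue_engagement(turns):
    # Average predicted engagement probability over the system's turns.
    probs = engagement_model.predict_proba(turns)[:, 1]
    return float(probs.mean())

print(dialogue_engagement(["sure", "That sounds amazing, what happened next?"]))
```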
Unsupervised Evaluation of Interactive Dialog with DialoGPT
It is important to define meaningful and interpretable automatic evaluation metrics for open-domain dialog research.
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
The development of Open-Domain Dialogue Systems (ODS) is a trending topic due to the large number of research challenges, the large societal and business impact, and advances in the underlying technology.
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment.
RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems
Open-domain human-computer conversation has been attracting increasing attention over the past few years.
Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses
Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.