Dialogue Evaluation

48 papers with code • 2 benchmarks • 6 datasets

Dialogue evaluation is the task of assessing the quality of dialogue system responses, either through human judgment or through automatic metrics designed to correlate with it. The papers below largely focus on automatic, reference-based and reference-free evaluation of open-domain dialogue systems.

Most implemented papers

Adversarial Learning for Neural Dialogue Generation

liuyuemaicha/Adversarial-Learning-for-Neural-Dialogue-Generation-in-Tensorflow EMNLP 2017

In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances.

Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems

emora-chat/chatevaluationplatform 18 Dec 2022

Our method is used to evaluate four state-of-the-art open-domain dialogue systems and is compared with existing approaches.

Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems

natashamjaques/neural_chat NeurIPS 2019

To investigate the strengths of this novel metric and interactive evaluation in comparison to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation on the utterance level.

Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References

prakharguptaz/multirefeval WS 2019

The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation.
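As a rough illustration of the multi-reference idea (not the released multirefeval pipeline), the sketch below scores a response against one versus several human references using NLTK's sentence-level BLEU; the example sentences are invented.

```python
# Minimal sketch: scoring one system response against several human references.
# NLTK's sentence_bleu natively accepts multiple references; this is illustrative
# and not the exact evaluation released in prakharguptaz/multirefeval.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "i usually go hiking on weekends".split(),
    "mostly hiking , sometimes a movie".split(),
    "hiking or catching a movie with friends".split(),
]
hypothesis = "i like to go hiking with friends".split()

smooth = SmoothingFunction().method1
single_ref = sentence_bleu([references[0]], hypothesis, smoothing_function=smooth)
multi_ref = sentence_bleu(references, hypothesis, smoothing_function=smooth)

# With several plausible references, a reasonable response is less likely to be
# penalized for merely wording things differently from the single gold reply.
print(f"single-reference BLEU: {single_ref:.3f}")
print(f"multi-reference BLEU:  {multi_ref:.3f}")
```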

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

SarikGhazarian/PredictiveEngagement 4 Nov 2019

In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, predictive engagement, for automatic evaluation of open-domain dialogue systems.

Unsupervised Evaluation of Interactive Dialog with DialoGPT

shikib/fed SIGDIAL (ACL) 2020

It is important to define meaningful and interpretable automatic evaluation metrics for open-domain dialog research.
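The general idea behind this line of work can be sketched as scoring a dialog by how likely DialoGPT finds hand-written positive versus negative follow-up utterances. The snippet below is a minimal illustration of that likelihood computation, assuming the public microsoft/DialoGPT-medium checkpoint; the follow-up phrases are invented stand-ins for the prompt sets released in shikib/fed.

```python
# Minimal sketch of follow-up-likelihood scoring with DialoGPT: a dialog is
# judged by how likely the model finds a positive vs. a negative human
# follow-up. Follow-up phrases here are illustrative, not the FED prompt set.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium").eval()

def follow_up_log_likelihood(context: str, follow_up: str) -> float:
    """Average log-likelihood of a follow-up utterance given the dialog context."""
    ctx_ids = tokenizer.encode(context + tokenizer.eos_token, return_tensors="pt")
    full_ids = tokenizer.encode(
        context + tokenizer.eos_token + follow_up + tokenizer.eos_token,
        return_tensors="pt",
    )
    labels = full_ids.clone()
    labels[:, : ctx_ids.shape[1]] = -100  # only score the follow-up tokens
    with torch.no_grad():
        loss = model(full_ids, labels=labels).loss  # mean NLL over follow-up tokens
    return -loss.item()

context = "What do you do for fun? I mostly read science fiction."
positive = follow_up_log_likelihood(context, "Wow, that is really interesting!")
negative = follow_up_log_likelihood(context, "That makes no sense at all.")
print("quality proxy (higher is better):", positive - negative)
```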

Automatic Evaluation and Moderation of Open-domain Dialogue Systems

lfdharo/DSTC10_Track5_Toxicity 3 Nov 2021

The development of Open-Domain Dialogue Systems (ODS) is a trending topic due to the large number of research challenges, the large societal and business impact, and advances in the underlying technology.

FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

e0397123/FineD-Eval 25 Oct 2022

Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment.

RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems

thu-coai/OpenMEVA 11 Jan 2017

Open-domain human-computer conversation has been attracting increasing attention over the past few years.
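RUBER blends a referenced score (embedding similarity between the generated reply and the ground-truth reply) with a learned, unreferenced query-reply relatedness score. The sketch below illustrates only the referenced component, with a toy embedding table standing in for real pretrained word vectors; see the repository listed above for actual implementations.

```python
# Minimal sketch of RUBER's referenced component: pool word embeddings for the
# generated reply and the ground-truth reply, then compare them with cosine
# similarity. The toy embedding table stands in for real pretrained vectors,
# and the full metric also includes a learned unreferenced score.
import numpy as np

rng = np.random.default_rng(0)
vocab = "i love going hiking on weekends enjoy walks outdoors every weekend".split()
embeddings = {w: rng.normal(size=50) for w in vocab}  # stand-in word vectors

def pooled(sentence: str) -> np.ndarray:
    """Max-pool word vectors over the sentence (RUBER uses max/min pooling)."""
    vecs = [embeddings[w] for w in sentence.split() if w in embeddings]
    return np.max(vecs, axis=0)

def referenced_score(generated: str, ground_truth: str) -> float:
    g, r = pooled(generated), pooled(ground_truth)
    return float(g @ r / (np.linalg.norm(g) * np.linalg.norm(r)))

print(referenced_score("i enjoy walks outdoors every weekend",
                       "i love going hiking on weekends"))
```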

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

mike-n-7/ADEM ACL 2017

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.