Dialogue Evaluation

39 papers with code • 2 benchmarks • 2 datasets

Dialogue Evaluation is the task of assessing the quality of responses produced by dialogue systems, whether through automatic metrics, learned evaluators, or human judgment.

Most implemented papers

Adversarial Learning for Neural Dialogue Generation

liuyuemaicha/Adversarial-Learning-for-Neural-Dialogue-Generation-in-Tensorflow EMNLP 2017

In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances.
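A minimal sketch of the adversarial-evaluation idea behind this line of work, assuming toy data and a bag-of-words discriminator rather than the paper's neural discriminator and reinforcement-learning setup: if a classifier trained to separate human replies from generated replies performs near chance, the generator's output is hard to distinguish from human dialogue.

```python
# Adversarial evaluation sketch: a discriminator is trained to tell human
# replies from model replies; a generator that fools it scores well.
# The toy data and TF-IDF + logistic-regression discriminator are
# stand-ins for the paper's neural components.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

human_replies = ["sounds good, see you at eight", "i loved that movie too",
                 "no idea, maybe ask the front desk", "that's a fair point"]
model_replies = ["i am a language model", "i do not know i do not know",
                 "that is good that is good", "yes yes yes"]

texts = human_replies + model_replies
labels = [1] * len(human_replies) + [0] * len(model_replies)  # 1 = human

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0)

vec = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression().fit(vec.fit_transform(X_train), y_train)

# Discriminator accuracy near 0.5 means generated replies are hard to
# distinguish from human ones -- the adversarial success signal.
acc = clf.score(vec.transform(X_test), y_test)
print(f"discriminator accuracy: {acc:.2f}")
```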

Approximating Interactive Human Evaluation with Self-Play for Open-Domain Dialog Systems

natashamjaques/neural_chat NeurIPS 2019

To investigate the strengths of this novel metric and of interactive evaluation, in comparison to state-of-the-art metrics and human evaluation of static conversations, we perform extended experiments with a set of models, including several that make novel improvements to recent hierarchical dialog generation architectures through sentiment and semantic knowledge distillation at the utterance level.
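A toy sketch of the self-play loop the paper uses to approximate interactive evaluation: the bot talks to itself for a few turns and the resulting transcript is scored per utterance. Both respond() and score_utterance() below are hypothetical stand-ins, not the paper's models or learned metric.

```python
# Self-play evaluation sketch: the dialogue model converses with itself,
# and the transcript is scored turn by turn. Both functions below are
# hypothetical placeholders for a real dialogue model and a real metric.
def respond(history):
    """Stand-in for a dialogue model: returns a canned reply."""
    return f"turn {len(history) + 1}: that's interesting, tell me more"

def score_utterance(utterance):
    """Stand-in for a learned utterance-level quality metric."""
    return min(1.0, len(set(utterance.split())) / 10.0)  # toy diversity score

def self_play_score(n_turns=6):
    history = ["hello there"]
    for _ in range(n_turns):
        history.append(respond(history))
    # Average utterance-level scores into one conversation-level number.
    return sum(score_utterance(u) for u in history) / len(history)

print(f"self-play score: {self_play_score():.2f}")
```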

Investigating Evaluation of Open-Domain Dialogue Systems With Human Generated Multiple References

prakharguptaz/multirefeval WS 2019

The aim of this paper is to mitigate the shortcomings of automatic evaluation of open-domain dialog systems through multi-reference evaluation.
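A small illustration of the multi-reference idea using NLTK's sentence-level BLEU on made-up sentences: a response that diverges from the single gold reference can still be credited when additional human-written references are available.

```python
# Multi-reference vs. single-reference BLEU with NLTK (toy example).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1
hypothesis = "sure , i can meet you tomorrow".split()

single_ref = ["yes , tomorrow works for me".split()]
multi_refs = single_ref + ["sure , i can meet you tomorrow afternoon".split(),
                           "ok , see you tomorrow".split()]

# sentence_bleu accepts a list of references; n-gram matches are counted
# against all of them, so extra valid references can only raise the score.
print("single-ref BLEU:", sentence_bleu(single_ref, hypothesis, smoothing_function=smooth))
print("multi-ref  BLEU:", sentence_bleu(multi_refs, hypothesis, smoothing_function=smooth))
```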

Predictive Engagement: An Efficient Metric For Automatic Evaluation of Open-Domain Dialogue Systems

SarikGhazarian/PredictiveEngagement 4 Nov 2019

In this paper, we investigate the possibility and efficacy of estimating utterance-level engagement and define a novel metric, {\em predictive engagement}, for automatic evaluation of open-domain dialogue systems.
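A hedged sketch of the general recipe, with a toy heuristic in place of the paper's trained utterance-level engagement classifier: score each utterance, then aggregate to a conversation-level engagement estimate.

```python
# Sketch of utterance-level engagement aggregated to a dialogue-level score.
# engagement_score() is a hypothetical stand-in for a trained
# utterance-level engagement model.
def engagement_score(utterance: str) -> float:
    """Toy heuristic in [0, 1]; a real scorer would be a trained model."""
    boring = {"i don't know", "ok", "yes", "no"}
    return 0.1 if utterance.lower().strip() in boring else 0.8

dialogue = [
    "have you seen the new sci-fi film?",
    "yes",
    "what did you think of the ending?",
    "i don't know",
]

utterance_scores = [engagement_score(u) for u in dialogue]
dialogue_engagement = sum(utterance_scores) / len(utterance_scores)
print(f"dialogue-level engagement: {dialogue_engagement:.2f}")
```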

Automatic Evaluation and Moderation of Open-domain Dialogue Systems

lfdharo/DSTC10_Track5_Toxicity 3 Nov 2021

The development of Open-Domain Dialogue Systems (ODS) is a trending topic due to the large number of research challenges, the large societal and business impact, and advances in the underlying technology.

RUBER: An Unsupervised Method for Automatic Evaluation of Open-Domain Dialog Systems

thu-coai/OpenMEVA 11 Jan 2017

Open-domain human-computer conversation has been attracting increasing attention over the past few years.

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

mike-n-7/ADEM ACL 2017

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.

Evaluating Coherence in Dialogue Systems using Entailment

nouhadziri/DialogEntailment NAACL 2019

Evaluating open-domain dialogue systems is difficult due to the diversity of possible correct answers.
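The approach named in the title scores coherence with textual entailment. The sketch below is an assumed illustration of that general idea, not the paper's exact model: an off-the-shelf MNLI classifier checks whether a candidate response is entailed by, neutral to, or contradicted by the dialogue context.

```python
# Entailment-based coherence sketch: score a response against the dialogue
# context with an off-the-shelf NLI model. The model choice and the
# premise/hypothesis framing are assumptions for illustration only.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

context = "I have never owned a dog. I am allergic to them."
response = "My dog and I go running every morning."

inputs = tokenizer(context, response, return_tensors="pt", truncation=True)
with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=-1)[0]

# Read the label order from the model config instead of hard-coding it.
for idx, label in model.config.id2label.items():
    print(f"{label}: {probs[idx].item():.2f}")
```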

Towards Best Experiment Design for Evaluating Dialogue System Output

sashank06/INLG_eval WS 2019

To overcome the limitations of automated metrics (e.g., BLEU, METEOR) for evaluating dialogue systems, researchers typically use human judgments to provide convergent evidence.

PONE: A Novel Automatic Evaluation Metric for Open-Domain Generative Dialogue Systems

gmftbyGMFTBY/PONE 6 Apr 2020

Through extensive experiments, we demonstrate that learning-based metrics are the most effective evaluation metrics for open-domain generative dialogue systems.