About

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Adversarial Learning for Neural Dialogue Generation

EMNLP 2017 liuyuemaicha/Adversarial-Learning-for-Neural-Dialogue-Generation-in-Tensorflow

In this paper, drawing intuition from the Turing test, we propose using adversarial training for open-domain dialogue generation: the system is trained to produce sequences that are indistinguishable from human-generated dialogue utterances.

DIALOGUE EVALUATION DIALOGUE GENERATION
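
A minimal PyTorch sketch of the idea in the entry above: a discriminator scores how human a (context, response) pair looks, and its probability is used as a REINFORCE-style reward when updating the generator. Module names and sizes here are illustrative assumptions, not taken from the linked repository.

    import torch
    import torch.nn as nn

    class ResponseDiscriminator(nn.Module):
        """Scores how 'human' a response looks given its dialogue context (hypothetical layout)."""
        def __init__(self, vocab_size=10000, emb_dim=128, hidden_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.context_enc = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.response_enc = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.classifier = nn.Linear(2 * hidden_dim, 1)

        def forward(self, context_ids, response_ids):
            _, c = self.context_enc(self.embed(context_ids))    # h_n: (1, B, H)
            _, r = self.response_enc(self.embed(response_ids))  # h_n: (1, B, H)
            logits = self.classifier(torch.cat([c[-1], r[-1]], dim=-1))
            return torch.sigmoid(logits).squeeze(-1)  # P(response is human-written)

    # REINFORCE-style generator update: the discriminator's probability of "human"
    # acts as the reward for a sampled response (log_probs come from the generator).
    def generator_loss(log_probs, reward):
        # log_probs: (B, T) log-probabilities of sampled tokens; reward: (B,)
        return -(log_probs.sum(dim=1) * reward.detach()).mean()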

Towards an Automatic Turing Test: Learning to Evaluate Dialogue Responses

ACL 2017 mike-n-7/ADEM

Automatically evaluating the quality of dialogue responses for unstructured domains is a challenging problem.

DIALOGUE EVALUATION
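
The ADEM entry above predicts a quality score from the dialogue context, a reference response, and the model response. A minimal sketch of that style of scoring follows, assuming the published form score = (c^T M r_hat + r^T N r_hat - alpha) / beta with learned matrices M and N, and with precomputed embeddings standing in for the paper's RNN encoder.

    import torch
    import torch.nn as nn

    class AdemStyleScorer(nn.Module):
        """ADEM-style scorer: score = (c^T M r_hat + r^T N r_hat - alpha) / beta.
        c, r, r_hat are precomputed context / reference / model-response embeddings;
        in the paper they come from a pretrained RNN encoder (simplified here)."""
        def __init__(self, dim=300, alpha=0.0, beta=1.0):
            super().__init__()
            self.M = nn.Parameter(torch.eye(dim))  # context-vs-response interaction
            self.N = nn.Parameter(torch.eye(dim))  # reference-vs-response interaction
            self.alpha = alpha
            self.beta = beta

        def forward(self, c, r, r_hat):
            # c, r, r_hat: (B, dim) embeddings
            ctx_term = torch.einsum("bd,de,be->b", c, self.M, r_hat)
            ref_term = torch.einsum("bd,de,be->b", r, self.N, r_hat)
            return (ctx_term + ref_term - self.alpha) / self.beta

    scorer = AdemStyleScorer(dim=300)
    c, r, r_hat = torch.randn(4, 300), torch.randn(4, 300), torch.randn(4, 300)
    print(scorer(c, r, r_hat))  # one predicted quality score per example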

GRADE: Automatic Graph-Enhanced Coherence Metric for Evaluating Open-Domain Dialogue Systems

EMNLP 2020 li3cmz/GRADE

Capitalizing on the topic-level dialogue graph, we propose a new evaluation metric, GRADE, which stands for Graph-enhanced Representations for Automatic Dialogue Evaluation.

DIALOGUE EVALUATION
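
A heavily simplified, hypothetical sketch of a GRADE-style metric as described above: an utterance-level context-response representation is fused with a pooled topic-graph representation and mapped to a coherence score. The graph aggregation below is a plain mean over neighbors, a stand-in for the paper's graph-based reasoning; all names and dimensions are assumptions, not the linked implementation.

    import torch
    import torch.nn as nn

    class GradeStyleCoherence(nn.Module):
        """Fuse an utterance-level representation with a topic-graph representation
        and map the pair to a single coherence score (illustrative layout only)."""
        def __init__(self, text_dim=768, node_dim=128, hidden=256):
            super().__init__()
            self.mlp = nn.Sequential(
                nn.Linear(text_dim + node_dim, hidden),
                nn.ReLU(),
                nn.Linear(hidden, 1),
            )

        def aggregate_topic_graph(self, node_feats, adjacency):
            # One round of mean-neighbor aggregation over the topic graph, then
            # mean-pool the nodes into a single graph vector.
            deg = adjacency.sum(dim=-1, keepdim=True).clamp(min=1)
            node_feats = adjacency @ node_feats / deg
            return node_feats.mean(dim=0)

        def forward(self, utterance_rep, node_feats, adjacency):
            graph_rep = self.aggregate_topic_graph(node_feats, adjacency)
            fused = torch.cat([utterance_rep, graph_rep], dim=-1)
            return torch.sigmoid(self.mlp(fused)).squeeze(-1)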

Learning an Unreferenced Metric for Online Dialogue Evaluation

ACL 2020 facebookresearch/online_dialog_eval

Evaluating the quality of a dialogue interaction between two agents is a difficult task, especially in open-domain chit-chat style dialogue.

DIALOGUE EVALUATION

Towards Best Experiment Design for Evaluating Dialogue System Output

WS 2019 sashank06/INLG_eval

To overcome the limitations of automated metrics (e.g., BLEU, METEOR) for evaluating dialogue systems, researchers typically use human judgments to provide convergent evidence.

DIALOGUE EVALUATION
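
For context on the automated metrics this entry questions, a short NLTK example of sentence-level BLEU on a dialogue response; smoothing matters because short responses often share no higher-order n-grams with a single reference.

    from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

    reference = ["i", "am", "doing", "well", "thanks", "for", "asking"]
    hypothesis = ["i", "am", "fine", "thanks"]

    # Smoothing avoids zero scores when higher-order n-gram overlap is empty,
    # which is common for short, single-reference dialogue responses.
    smooth = SmoothingFunction().method1
    score = sentence_bleu([reference], hypothesis, smoothing_function=smooth)
    print(f"BLEU: {score:.3f}")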

An Adversarially-Learned Turing Test for Dialog Generation Models

16 Apr 2021 golsun/AdversarialTuringTest

To alleviate the risk of learned evaluators being fooled by adversarial responses, we propose an adversarial training approach to learn a robust model, ATT (Adversarial Turing Test), that discriminates machine-generated responses from human-written replies.

DIALOGUE EVALUATION
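
A minimal sketch of the discriminator-training step an Adversarial-Turing-Test-style metric implies: human (context, response) pairs are pushed toward label 1 and machine-generated pairs toward label 0, with fresh machine responses fed back in as generators improve. The encoder below is a hypothetical stand-in, not the model from the linked repository.

    import torch
    import torch.nn as nn

    # Hypothetical encoder: in practice this would be a pretrained transformer
    # over the concatenated (context, response) text, not a small MLP.
    encoder = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 1))
    optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
    bce = nn.BCEWithLogitsLoss()

    def discriminator_step(human_pairs, machine_pairs):
        """One training step: human pair embeddings get label 1, machine-generated
        pair embeddings get label 0."""
        feats = torch.cat([human_pairs, machine_pairs], dim=0)  # (2B, 768)
        labels = torch.cat([torch.ones(len(human_pairs)),
                            torch.zeros(len(machine_pairs))])
        loss = bce(encoder(feats).squeeze(-1), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()

    # Toy call with random pair embeddings in place of real encoded dialogues.
    loss = discriminator_step(torch.randn(8, 768), torch.randn(8, 768))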