Dialogue Evaluation
48 papers with code • 2 benchmarks • 6 datasets
Latest papers
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
This paper introduces a novel Self-supervised Fine-grained Dialogue Evaluation framework (SelF-Eval).
Findings of the RuATD Shared Task 2022 on Artificial Text Detection in Russian
The first task is framed as a binary classification problem.
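A minimal sketch of what such a binary-classification framing might look like, using a generic bag-of-words baseline; the example texts, labels, and model choice are illustrative assumptions, not the shared task's official baseline.

```python
# Illustrative baseline: artificial-text detection as binary classification.
# 0 = human-written, 1 = machine-generated (labels and texts are made up).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Пример текста, написанного человеком.", "Пример сгенерированного текста."]
labels = [0, 1]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["Новый текст для проверки."]))
```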
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
We introduce InstructDial, an instruction tuning framework for dialogue, which consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets.
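As a rough illustration of the unified text-to-text idea, the sketch below flattens one dialogue-task instance into an instruction-prefixed input and a target string; the instruction wording, separators, and field names are assumptions for illustration, not InstructDial's exact schema.

```python
# Hypothetical conversion of a dialogue-evaluation instance into a
# text-to-text (input, output) pair for instruction tuning.
def to_text_to_text(instruction, dialogue_turns, target):
    """Flatten a dialogue task instance into (input_text, output_text)."""
    context = " [TURN] ".join(dialogue_turns)
    input_text = f"Instruction: {instruction}\nDialogue: {context}"
    return input_text, target

inp, out = to_text_to_text(
    instruction="Given the dialogue, rate the last response as 'good' or 'bad'.",
    dialogue_turns=["Hi, how are you?", "I am a giraffe."],
    target="bad",
)
print(inp)
print(out)
```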
RuNNE-2022 Shared Task: Recognizing Nested Named Entities
In the test set, all entity types occur with balanced frequency.
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Existing model-based metrics for system response evaluation are trained on human-annotated data, which is cumbersome to collect.
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
We also show that DEAM can distinguish between coherent dialogues and incoherent ones generated by baseline manipulations, whereas baseline metrics cannot detect the incoherent examples generated by DEAM's manipulations.
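A toy sketch of this style of check, under stated assumptions: incoherent negatives are created by a simple turn-shuffling manipulation (a stand-in for any baseline manipulation, not DEAM's AMR-based ones), and a coherence metric is expected to score the original dialogue above its corrupted version. `coherence_score` is a placeholder for any trained metric.

```python
# Sanity check: a good coherence metric should rank an original dialogue
# above a version whose turns have been shuffled.
import random

def shuffle_turns(dialogue, seed=0):
    """Baseline manipulation: permute turn order to break coherence."""
    rng = random.Random(seed)
    corrupted = dialogue[:]
    rng.shuffle(corrupted)
    return corrupted

def sanity_check(dialogue, coherence_score):
    original = coherence_score(dialogue)
    corrupted = coherence_score(shuffle_turns(dialogue))
    return original > corrupted  # expected to hold for a good metric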
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present a human evaluation methodology that is highly reliable while remaining feasible and low-cost.
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chatbots are designed to carry out human-like conversations across different domains, such as general chit-chat, knowledge exchange, and persona-grounded conversations.
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
The development of Open-Domain Dialogue Systems (ODS) is a trending topic due to the large number of research challenges, their broad societal and business impact, and advances in the underlying technology.
A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues
HMCEval casts dialogue evaluation as a sample assignment problem, where we must decide whether to assign each sample to a human or a machine for evaluation.
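One simple way such an assignment could be realized, sketched below under assumptions: samples the machine evaluator is least confident about are routed to human annotators, within a fixed human budget. The confidence function and budget-based threshold are illustrative choices, not the HMCEval formulation.

```python
# Hypothetical human-machine assignment: send the machine's least confident
# samples to humans, evaluate the rest automatically.
def assign(samples, machine_confidence, human_budget):
    """Return two lists: samples for human evaluation, samples for the machine."""
    ranked = sorted(samples, key=machine_confidence)  # least confident first
    to_human = ranked[:human_budget]
    to_machine = ranked[human_budget:]
    return to_human, to_machine
```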