Dialogue Evaluation

48 papers with code • 2 benchmarks • 6 datasets

Latest papers with no code

PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment

no code yet • 18 Dec 2022

To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters.
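
The snippet above describes the PoE architecture only at a high level, and the paper currently has no released code, so the following is a minimal PyTorch sketch of what a shared transformer encoder with per-domain lightweight adapters could look like. The adapter design, dimensions, and domain names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- PoE has no released code; the adapter design,
# dimensions, and domain names below are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """A lightweight bottleneck adapter placed on top of the shared encoder."""
    def __init__(self, hidden: int = 256, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual bottleneck

class PanelOfExperts(nn.Module):
    """Shared transformer encoder plus one adapter ('expert') per dialogue domain."""
    def __init__(self, vocab: int = 30522, hidden: int = 256,
                 domains=("chitchat", "task", "knowledge")):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.experts = nn.ModuleDict({d: Adapter(hidden) for d in domains})
        self.score = nn.Linear(hidden, 1)

    def forward(self, token_ids, domain: str):
        h = self.encoder(self.embed(token_ids))          # shared representation
        h = self.experts[domain](h)                      # domain-specific expert
        return torch.sigmoid(self.score(h.mean(dim=1)))  # pooled quality score

poe = PanelOfExperts()
print(poe(torch.randint(0, 30522, (2, 16)), domain="chitchat").shape)  # torch.Size([2, 1])
```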

Dialogue Evaluation with Offline Reinforcement Learning

no code yet • SIGDIAL (ACL) 2022

Such systems are ideally evaluated with human users, which, however, is infeasible at every iteration of the development phase.

MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue

no code yet • 19 Jun 2022

Firstly, we build an evaluation metric composed of 5 groups of parallel sub-metrics called Multi-Metric Evaluation (MME) to evaluate the quality of dialogue comprehensively.
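
MME-CRS has no released code, so the following is a minimal sketch of the general idea of aggregating parallel sub-metric scores with correlation-based re-scaling. The sub-metric names, correlation values, and the re-scaling power are illustrative assumptions rather than the paper's actual formulation.

```python
# Illustrative sketch only -- MME-CRS has no released code; the sub-metric names,
# correlations, and the re-scaling power below are assumptions.
import numpy as np

def correlation_rescaled_score(sub_scores: dict, human_corr: dict, power: float = 2.0) -> float:
    """Aggregate parallel sub-metric scores, weighting each one by a re-scaled
    version of its (pre-computed) correlation with human judgments."""
    weights = {m: max(human_corr[m], 0.0) ** power for m in sub_scores}
    total = sum(weights.values())
    if total == 0:
        return float(np.mean(list(sub_scores.values())))  # fall back to a plain average
    return sum(weights[m] / total * sub_scores[m] for m in sub_scores)

# Hypothetical sub-metric groups scoring one dialogue response.
sub_scores = {"fluency": 0.9, "relevance": 0.7, "engagingness": 0.6,
              "consistency": 0.8, "specificity": 0.5}
human_corr = {"fluency": 0.2, "relevance": 0.5, "engagingness": 0.4,
              "consistency": 0.3, "specificity": 0.1}
print(round(correlation_rescaled_score(sub_scores, human_corr), 3))
```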

AdaCoach: A Virtual Coach for Training Customer Service Agents

no code yet • 27 Apr 2022

With the development of online business, customer service agents increasingly play a crucial role as an interface between companies and their customers.

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

no code yet • 18 Mar 2022

This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.

FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows

no code yet • 14 Feb 2022

Hence, we propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.
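
FlowEval has no released code, so the sketch below only illustrates the notion of segment acts (one act label per segment rather than per utterance) and a toy comparison of two act flows. The act labels and the SequenceMatcher-based similarity are assumptions, not the paper's consensus-based framework.

```python
# Illustrative sketch only -- FlowEval has no released code; the act labels and the
# flow-similarity measure below are assumptions used to show the idea of segment acts.
from difflib import SequenceMatcher

# Each utterance is split into segments, and each segment gets its own act label,
# so one utterance can carry several acts instead of a single utterance-level act.
segment_acts = [("Sure, I can book that.", "commit"),
                ("Do you prefer a window seat?", "question")]

def flow_similarity(flow_a, flow_b):
    """Compare two dialogues by the similarity of their segment-act sequences."""
    return SequenceMatcher(None, flow_a, flow_b).ratio()

flow_a = ["greet", "question", "commit", "question"]
flow_b = ["greet", "question", "inform", "question"]
print(flow_similarity(flow_a, flow_b))  # 0.75
```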

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

no code yet • NLP4ConvAI (ACL) 2022

At the heart of improving conversational AI is the open problem of how to evaluate conversations.

User Response and Sentiment Prediction for Automatic Dialogue Evaluation

no code yet • 16 Nov 2021

Automatic evaluation is beneficial for open-domain dialog system development.

Investigating the Impact of Pre-trained Language Models on Dialog Evaluation

no code yet • 5 Oct 2021

Yet, the impact of different pre-trained language models (Pr-LMs) on the performance of automatic metrics is not well understood.

Achieving Reliable Human Assessment of Open-Domain Dialogue Systems

no code yet • ACL ARR September 2021

Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present a human evaluation method that is highly reliable while remaining feasible and low-cost.