Dialogue Evaluation

48 papers with code • 2 benchmarks • 6 datasets

Latest papers with no code

PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment

no code yet • 18 Dec 2022

To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters.
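
The snippet above describes the PoE architecture only at a high level, and the paper currently has no released code, so the following is a minimal PyTorch sketch of what a shared transformer encoder with per-domain lightweight adapters could look like. The adapter design, dimensions, and domain names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only -- PoE has no released code; the adapter design,
# dimensions, and domain names below are assumptions, not the paper's implementation.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """A lightweight bottleneck adapter placed on top of the shared encoder."""
    def __init__(self, hidden: int = 256, bottleneck: int = 32):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual bottleneck

class PanelOfExperts(nn.Module):
    """Shared transformer encoder plus one adapter ('expert') per dialogue domain."""
    def __init__(self, vocab: int = 30522, hidden: int = 256,
                 domains=("chitchat", "task", "knowledge")):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.experts = nn.ModuleDict({d: Adapter(hidden) for d in domains})
        self.score = nn.Linear(hidden, 1)

    def forward(self, token_ids, domain: str):
        h = self.encoder(self.embed(token_ids))          # shared representation
        h = self.experts[domain](h)                      # domain-specific expert
        return torch.sigmoid(self.score(h.mean(dim=1)))  # pooled quality score

poe = PanelOfExperts()
print(poe(torch.randint(0, 30522, (2, 16)), domain="chitchat").shape)  # torch.Size([2, 1])
```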

Dialogue Evaluation with Offline Reinforcement Learning

no code yet • SIGDIAL (ACL) 2022

Such systems are ideally evaluated with human users, which, however, is infeasible at every iteration of the development phase.

MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue

no code yet • 19 Jun 2022

Firstly, we build an evaluation metric composed of 5 groups of parallel sub-metrics called Multi-Metric Evaluation (MME) to evaluate the quality of dialogue comprehensively.
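
MME-CRS has no released code, so the following is a minimal sketch of the general idea of aggregating parallel sub-metric scores with correlation-based re-scaling. The sub-metric names, correlation values, and the re-scaling power are illustrative assumptions rather than the paper's actual formulation.

```python
# Illustrative sketch only -- MME-CRS has no released code; the sub-metric names,
# correlations, and the re-scaling power below are assumptions.
import numpy as np

def correlation_rescaled_score(sub_scores: dict, human_corr: dict, power: float = 2.0) -> float:
    """Aggregate parallel sub-metric scores, weighting each one by a re-scaled
    version of its (pre-computed) correlation with human judgments."""
    weights = {m: max(human_corr[m], 0.0) ** power for m in sub_scores}
    total = sum(weights.values())
    if total == 0:
        return float(np.mean(list(sub_scores.values())))  # fall back to a plain average
    return sum(weights[m] / total * sub_scores[m] for m in sub_scores)

# Hypothetical sub-metric groups scoring one dialogue response.
sub_scores = {"fluency": 0.9, "relevance": 0.7, "engagingness": 0.6,
              "consistency": 0.8, "specificity": 0.5}
human_corr = {"fluency": 0.2, "relevance": 0.5, "engagingness": 0.4,
              "consistency": 0.3, "specificity": 0.1}
print(round(correlation_rescaled_score(sub_scores, human_corr), 3))
```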

AdaCoach: A Virtual Coach for Training Customer Service Agents

no code yet • 27 Apr 2022

With the development of online business, customer service agents increasingly play a crucial role as an interface between companies and their customers.

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges

no code yet • 18 Mar 2022

This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.

FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows

no code yet • 14 Feb 2022

Hence, we propose segment act, an extension of dialog act from utterance level to segment level, and crowdsource a large-scale dataset for it.
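
FlowEval has no released code, so the sketch below only illustrates the notion of segment acts (one act label per segment rather than per utterance) and a toy comparison of two act flows. The act labels and the SequenceMatcher-based similarity are assumptions, not the paper's consensus-based framework.

```python
# Illustrative sketch only -- FlowEval has no released code; the act labels and the
# flow-similarity measure below are assumptions used to show the idea of segment acts.
from difflib import SequenceMatcher

# Each utterance is split into segments, and each segment gets its own act label,
# so one utterance can carry several acts instead of a single utterance-level act.
segment_acts = [("Sure, I can book that.", "commit"),
                ("Do you prefer a window seat?", "question")]

def flow_similarity(flow_a, flow_b):
    """Compare two dialogues by the similarity of their segment-act sequences."""
    return SequenceMatcher(None, flow_a, flow_b).ratio()

flow_a = ["greet", "question", "commit", "question"]
flow_b = ["greet", "question", "inform", "question"]
print(flow_similarity(flow_a, flow_b))  # 0.75
```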

Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents

no code yet • NLP4ConvAI (ACL) 2022

At the heart of improving conversational AI is the open problem of how to evaluate conversations.

User Response and Sentiment Prediction for Automatic Dialogue Evaluation

no code yet • 16 Nov 2021

Automatic evaluation is beneficial for open-domain dialog system development.

Investigating the Impact of Pre-trained Language Models on Dialog Evaluation

no code yet • 5 Oct 2021

Yet, the impact of different pre-trained language models (Pr-LMs) on the performance of automatic metrics is not well understood.

Achieving Reliable Human Assessment of Open-Domain Dialogue Systems

no code yet • ACL ARR September 2021

Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present a human evaluation method that is highly reliable while remaining feasible and low-cost.