Dialogue Evaluation
48 papers with code • 2 benchmarks • 6 datasets
Latest papers with no code
PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment
To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters.
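Below is a minimal sketch of that shared-encoder-plus-adapters design, assuming a standard bottleneck-adapter formulation with a residual connection; the layer sizes, domain names, and mean-pooled scoring head are illustrative placeholders, not the paper's actual configuration.

```python
# Sketch: shared transformer encoder with per-domain lightweight adapters.
# Assumed shapes and hyperparameters are illustrative only.
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Lightweight bottleneck adapter: down-project, nonlinearity, up-project."""
    def __init__(self, hidden: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)

    def forward(self, x):
        return x + self.up(torch.relu(self.down(x)))  # residual connection

class PanelOfExperts(nn.Module):
    def __init__(self, hidden: int = 256, domains=("chitchat", "task", "knowledge")):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True)
        self.shared_encoder = nn.TransformerEncoder(layer, num_layers=2)
        # One adapter "expert" per dialogue domain; only these parts are domain-specific.
        self.adapters = nn.ModuleDict({d: Adapter(hidden) for d in domains})
        self.scorer = nn.Linear(hidden, 1)

    def forward(self, embeddings, domain: str):
        h = self.shared_encoder(embeddings)   # shared representation
        h = self.adapters[domain](h)          # domain-specific adaptation
        return self.scorer(h.mean(dim=1))     # pooled dialogue-quality score

model = PanelOfExperts()
x = torch.randn(2, 10, 256)                   # (batch, tokens, hidden)
print(model(x, domain="chitchat").shape)      # torch.Size([2, 1])
```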
Dialogue Evaluation with Offline Reinforcement Learning
Dialogue systems are ideally evaluated with human users, which, however, is infeasible at every iteration of the development phase.
MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue
First, we build an evaluation metric composed of five groups of parallel sub-metrics, called Multi-Metric Evaluation (MME), to comprehensively assess dialogue quality.
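As a rough sketch of how such parallel sub-metrics might be combined, the snippet below weights each sub-metric by its (clipped) Spearman correlation with human judgments on a development set and aggregates the weighted scores; the function name, normalization, and synthetic data are assumptions for illustration, not the paper's exact correlation re-scaling scheme.

```python
# Sketch: aggregate parallel sub-metric scores with correlation-derived weights.
import numpy as np
from scipy.stats import spearmanr

def correlation_rescaled_score(sub_scores: np.ndarray, human: np.ndarray) -> np.ndarray:
    """sub_scores: (n_dialogues, n_metrics); human: (n_dialogues,) ratings."""
    # Correlate each sub-metric with human judgments on a dev set.
    corrs = np.array([spearmanr(sub_scores[:, j], human)[0]
                      for j in range(sub_scores.shape[1])])
    corrs = np.clip(corrs, 0.0, None)   # drop negatively correlated sub-metrics
    weights = corrs / corrs.sum()       # re-scale into a weight distribution
    return sub_scores @ weights         # weighted overall quality score

rng = np.random.default_rng(0)
subs = rng.random((100, 5))             # 5 parallel sub-metric scores (illustrative)
human = subs @ np.array([0.4, 0.3, 0.2, 0.05, 0.05]) + 0.1 * rng.random(100)
print(correlation_rescaled_score(subs, human)[:3])
```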
AdaCoach: A Virtual Coach for Training Customer Service Agents
With the growth of online business, customer service agents increasingly serve as a crucial interface between companies and their customers.
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research Directions and Challenges
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
FlowEval: A Consensus-Based Dialogue Evaluation Framework Using Segment Act Flows
Hence, we propose the segment act, an extension of the dialog act from the utterance level to the segment level, and crowdsource a large-scale dataset for it.
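As a rough illustration of comparing segment-act flows, the sketch below encodes two dialogues as sequences of act labels and scores their overlap with a longest-matching-subsequence ratio; the act inventory and the SequenceMatcher-based comparison are illustrative assumptions, not FlowEval's consensus scoring.

```python
# Sketch: a dialogue as a flow of segment acts, compared by sequence overlap.
from difflib import SequenceMatcher

# Hypothetical segment-act labels; a real inventory would come from the dataset.
flow_reference = ["greeting", "question", "inform", "inform", "thanks"]
flow_candidate = ["greeting", "question", "inform", "thanks"]

def flow_similarity(a: list[str], b: list[str]) -> float:
    """Similarity of two segment-act flows as a matching-subsequence ratio."""
    return SequenceMatcher(None, a, b).ratio()

print(f"flow similarity: {flow_similarity(flow_reference, flow_candidate):.2f}")
```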
Human Evaluation of Conversations is an Open Problem: comparing the sensitivity of various methods for evaluating dialogue agents
At the heart of improving conversational AI is the open problem of how to evaluate conversations.
User Response and Sentiment Prediction for Automatic Dialogue Evaluation
Automatic evaluation is beneficial for open-domain dialog system development.
Investigating the Impact of Pre-trained Language Models on Dialog Evaluation
Yet the impact of different pre-trained language models on the performance of automatic metrics is not well understood.
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Answering the call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present a human evaluation method that is highly reliable while remaining feasible and low-cost.