Dialogue Evaluation
48 papers with code • 2 benchmarks • 6 datasets
Latest papers
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
This paper introduces a novel Self-supervised Fine-grained Dialogue Evaluation framework (SelF-Eval).
Findings of the RuATD Shared Task 2022 on Artificial Text Detection in Russian
The first task is framed as a binary classification problem.
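A minimal sketch of what such a binary-classification framing might look like, using a generic bag-of-words baseline; the example texts, labels, and model choice are illustrative assumptions, not the shared task's official baseline.

```python
# Illustrative baseline: artificial-text detection as binary classification.
# 0 = human-written, 1 = machine-generated (labels and texts are made up).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["Пример текста, написанного человеком.", "Пример сгенерированного текста."]
labels = [0, 1]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(texts, labels)
print(clf.predict(["Новый текст для проверки."]))
```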
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
We introduce InstructDial, an instruction tuning framework for dialogue, which consists of a repository of 48 diverse dialogue tasks in a unified text-to-text format created from 59 openly available dialogue datasets.
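As a rough illustration of the unified text-to-text idea, the sketch below flattens one dialogue-task instance into an instruction-prefixed input and a target string; the instruction wording, separators, and field names are assumptions for illustration, not InstructDial's exact schema.

```python
# Hypothetical conversion of a dialogue-evaluation instance into a
# text-to-text (input, output) pair for instruction tuning.
def to_text_to_text(instruction, dialogue_turns, target):
    """Flatten a dialogue task instance into (input_text, output_text)."""
    context = " [TURN] ".join(dialogue_turns)
    input_text = f"Instruction: {instruction}\nDialogue: {context}"
    return input_text, target

inp, out = to_text_to_text(
    instruction="Given the dialogue, rate the last response as 'good' or 'bad'.",
    dialogue_turns=["Hi, how are you?", "I am a giraffe."],
    target="bad",
)
print(inp)
print(out)
```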
RuNNE-2022 Shared Task: Recognizing Nested Named Entities
In the test set, all entity types occur with balanced frequency.
What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation
Existing model-based metrics for system response evaluation are trained on human-annotated data, which is cumbersome to collect.
DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations
We also show that DEAM can distinguish between coherent dialogues and incoherent ones generated by baseline manipulations, whereas baseline metrics cannot detect the incoherent examples generated by DEAM's manipulations.
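A toy sketch of this style of check, under stated assumptions: incoherent negatives are created by a simple turn-shuffling manipulation (a stand-in for any baseline manipulation, not DEAM's AMR-based ones), and a coherence metric is expected to score the original dialogue above its corrupted version. `coherence_score` is a placeholder for any trained metric.

```python
# Sanity check: a good coherence metric should rank an original dialogue
# above a version whose turns have been shuffled.
import random

def shuffle_turns(dialogue, seed=0):
    """Baseline manipulation: permute turn order to break coherence."""
    rng = random.Random(seed)
    corrupted = dialogue[:]
    rng.shuffle(corrupted)
    return corrupted

def sanity_check(dialogue, coherence_score):
    original = coherence_score(dialogue)
    corrupted = coherence_score(shuffle_turns(dialogue))
    return original > corrupted  # expected to hold for a good metric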
Achieving Reliable Human Assessment of Open-Domain Dialogue Systems
Answering the distress call of competitions that have emphasized the urgent need for better evaluation techniques in dialogue, we present a human evaluation methodology that is highly reliable while remaining feasible and low-cost.
MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation
Chatbots are designed to carry out human-like conversations across different domains, such as general chit-chat, knowledge exchange, and persona-grounded conversations.
Automatic Evaluation and Moderation of Open-domain Dialogue Systems
The development of Open-Domain Dialogue Systems (ODS) is a trending topic due to the large number of research challenges, their broad societal and business impact, and advances in the underlying technology.
A Human-machine Collaborative Framework for Evaluating Malevolence in Dialogues
HMCEval casts dialogue evaluation as a sample assignment problem, where we must decide whether to assign each sample to a human or a machine for evaluation.
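One simple way such an assignment could be realized, sketched below under assumptions: samples the machine evaluator is least confident about are routed to human annotators, within a fixed human budget. The confidence function and budget-based threshold are illustrative choices, not the HMCEval formulation.

```python
# Hypothetical human-machine assignment: send the machine's least confident
# samples to humans, evaluate the rest automatically.
def assign(samples, machine_confidence, human_budget):
    """Return two lists: samples for human evaluation, samples for the machine."""
    ranked = sorted(samples, key=machine_confidence)  # least confident first
    to_human = ranked[:human_budget]
    to_machine = ranked[human_budget:]
    return to_human, to_machine
```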