Dialogue Evaluation

48 papers with code • 2 benchmarks • 6 datasets


A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

e0397123/comp-analysis 24 Dec 2023

Yet, existing work on utilizing LLMs for automatic dialogue evaluation remains limited in scope, in terms of the number of meta-evaluation datasets, the modes of evaluation, the coverage of LLMs, and so on.


xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

e0397123/xdial-eval 13 Oct 2023

The English dialogue data are extended to nine other languages with commercial machine translation systems.


Towards Multilingual Automatic Dialogue Evaluation

johndmendonca/dialevalml 31 Aug 2023

The main limiting factor in the development of robust multilingual dialogue evaluation metrics is the lack of multilingual data and the limited availability of open-source multilingual dialogue systems.


Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation

johndmendonca/dialevalml 31 Aug 2023

Despite significant research effort in the development of automatic dialogue evaluation metrics, little thought is given to evaluating dialogues other than in English.


C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation

renll/c-pmi 27 Jun 2023

Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system.

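C-PMI builds on pointwise mutual information between the user turn and the system response. As a generic illustration only (not the paper's implementation, which conditions on further context and uses model likelihoods), PMI can be estimated from co-occurrence counts of paired events:

```python
import math
from collections import Counter

def pmi(pairs):
    """Estimate PMI(x, y) = log p(x, y) / (p(x) p(y)) for each
    observed (x, y) pair from raw co-occurrence counts."""
    n = len(pairs)
    joint = Counter(pairs)                 # counts of (x, y) pairs
    px = Counter(x for x, _ in pairs)      # marginal counts of x
    py = Counter(y for _, y in pairs)      # marginal counts of y
    return {
        (x, y): math.log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    }

# Toy (user act, system act) pairs from hypothetical dialogue turns
pairs = [("greet", "greet"), ("greet", "greet"),
         ("ask", "answer"), ("ask", "answer"), ("ask", "greet")]
scores = pmi(pairs)
```

A positive score (e.g. for `("ask", "answer")` above) indicates the system act co-occurs with the user act more often than chance, which is the intuition a PMI-style turn-level metric exploits.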

DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation

ddehun/density 8 May 2023

Despite the recent advances in open-domain dialogue systems, building a reliable evaluation metric is still a challenging problem.


GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation

ruckbreasoning/glm-dialog 28 Feb 2023

We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese, using a search engine to access Internet knowledge.


Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems

emorynlp/chatevaluationplatform 18 Dec 2022

Our method is used to evaluate four state-of-the-art open-domain dialogue systems and compared with existing approaches.


FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

e0397123/FineD-Eval 25 Oct 2022

Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment.


SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation

royny/self-eval COLING 2022

This paper introduces a novel Self-supervised Fine-grained Dialogue Evaluation framework (SelF-Eval).

17 Aug 2022