Dialogue Evaluation
48 papers with code • 2 benchmarks • 6 datasets
Latest papers
A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators
Yet, existing work on using LLMs for automatic dialogue evaluation is limited in scope with respect to the number of meta-evaluation datasets, the modes of evaluation, and the coverage of LLMs.
xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark
The English dialogue data are extended to nine other languages with commercial machine translation systems.
Towards Multilingual Automatic Dialogue Evaluation
The main limiting factor in the development of robust multilingual dialogue evaluation metrics is the lack of multilingual data and the limited availability of open-source multilingual dialogue systems.
Simple LLM Prompting is State-of-the-Art for Robust and Multilingual Dialogue Evaluation
Despite significant research effort in the development of automatic dialogue evaluation metrics, little attention has been paid to evaluating dialogues in languages other than English.
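As a rough illustration of what such prompting-based metrics look like, here is a minimal sketch that asks an LLM to rate a response on a 1-5 scale via the OpenAI chat API; the prompt wording, model name, and score parsing are illustrative assumptions, not the setup from the paper.

```python
# Hypothetical sketch: prompt an LLM to rate a dialogue response on a 1-5 scale.
# The prompt template and parsing are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "Rate how coherent and engaging the RESPONSE is given the dialogue CONTEXT.\n"
    "Answer with a single integer from 1 (poor) to 5 (excellent).\n\n"
    "CONTEXT:\n{context}\n\nRESPONSE:\n{response}\n\nScore:"
)

def llm_score(context: str, response: str, model: str = "gpt-4o-mini") -> int:
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT.format(context=context, response=response)}],
        temperature=0,
    )
    # Keep the first digit in the reply; real metrics often average several samples.
    reply = completion.choices[0].message.content
    return int(next(ch for ch in reply if ch.isdigit()))

print(llm_score("A: Any plans for the weekend?\nB: Not yet, you?",
                "I might go hiking if the weather holds."))
```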
C-PMI: Conditional Pointwise Mutual Information for Turn-level Dialogue Evaluation
Existing reference-free turn-level evaluation metrics for chatbots inadequately capture the interaction between the user and the system.
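The title suggests a pointwise-mutual-information-style score; a common way to operationalize conditional PMI is to measure how much the user's turn raises the likelihood of the system response under a language model, CPMI(r; u | c) = log p(r | c, u) - log p(r | c). The sketch below implements that formulation with GPT-2 from Hugging Face transformers as the scorer; both the formulation and the choice of model are assumptions, not the paper's exact estimator.

```python
# Hypothetical conditional-PMI-style score:
#   CPMI(response; user_turn | context) = log p(response | context, user_turn)
#                                         - log p(response | context)
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tok = GPT2TokenizerFast.from_pretrained("gpt2")
lm = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def log_prob(prefix: str, continuation: str) -> float:
    """Sum of log p(token | preceding tokens) over the continuation tokens."""
    # Tokenizing prefix and continuation separately is a simplification.
    prefix_ids = tok(prefix, return_tensors="pt").input_ids
    cont_ids = tok(continuation, return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, cont_ids], dim=1)
    with torch.no_grad():
        logits = lm(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)  # position i predicts token i+1
    positions = torch.arange(prefix_ids.size(1) - 1, ids.size(1) - 1)
    return log_probs[0, positions].gather(1, cont_ids.T).sum().item()

def cpmi(context: str, user_turn: str, response: str) -> float:
    # How much does the user's turn raise the response's likelihood?
    return (log_prob(context + "\n" + user_turn + "\n", response)
            - log_prob(context + "\n", response))
```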
DEnsity: Open-domain Dialogue Evaluation Metric using Density Estimation
Despite the recent advances in open-domain dialogue systems, building a reliable evaluation metric is still a challenging problem.
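Reading "density estimation" at face value: one natural recipe is to fit a density model to embeddings of human-written responses and score a candidate by its log-likelihood under that model. The sketch below does this with sentence-transformers embeddings and a scikit-learn kernel density estimator; both components are stand-in assumptions, and the paper's feature space and estimator may differ.

```python
# Hypothetical sketch: score responses by their density under a model fitted to
# human-written responses. Embeddings + KDE are illustrative stand-ins only.
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import KernelDensity

encoder = SentenceTransformer("all-MiniLM-L6-v2")

human_responses = [
    "I might go hiking if the weather holds.",
    "Not much, just catching up on sleep.",
    "That sounds great, count me in!",
]
kde = KernelDensity(bandwidth=0.5).fit(encoder.encode(human_responses))

def density_score(response: str) -> float:
    """Higher = more 'human-like' under the fitted density (a rough proxy)."""
    return kde.score_samples(encoder.encode([response]))[0]

print(density_score("I might go for a walk tomorrow."))   # closer to the data
print(density_score("banana banana banana banana"))       # far from the data
```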
GLM-Dialog: Noise-tolerant Pre-training for Knowledge-grounded Dialogue Generation
We present GLM-Dialog, a large-scale language model (LLM) with 10B parameters capable of knowledge-grounded conversation in Chinese, using a search engine to access knowledge from the Internet.
Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
Our method is used to evaluate four state-of-the-art open-domain dialogue systems and is compared with existing approaches.
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment.
SelF-Eval: Self-supervised Fine-grained Dialogue Evaluation
This paper introduces a novel Self-supervised Fine-grained Dialogue Evaluation framework (SelF-Eval).