4 dataset results for Dialogue Evaluation AND Texts AND English

FaithDial

FaithDial is a new benchmark for hallucination-free dialogue, created by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark.

12 PAPERS • NO BENCHMARKS YET

USR-PersonaChat

This dataset was collected to assess dialog evaluation metrics. In the paper "USR: An Unsupervised and Reference Free Evaluation Metric for Dialog" (Mehri and Eskenazi, 2020), the authors use it to measure the quality of several existing word-overlap and embedding-based metrics, as well as their newly proposed USR metric.

7 PAPERS • 1 BENCHMARK
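In metric-evaluation work such as USR, an automatic metric is typically judged by how strongly its scores correlate with human quality ratings of the same responses. The sketch below computes one such measure, Spearman's rank correlation, using only the standard library. It assumes per-response metric scores and human ratings are already aligned; the numbers are made up for illustration and are not taken from USR-PersonaChat.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j share one averaged rank
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: one automatic metric vs. averaged human ratings per response.
metric_scores = [0.12, 0.45, 0.33, 0.80, 0.51]
human_ratings = [1.0, 3.0, 2.0, 5.0, 3.0]

rho = spearman(metric_scores, human_ratings)
print(round(rho, 3))  # 0.975
```

Rank correlation is used here because it only rewards a metric for ordering responses the way humans do, not for matching the scale of the human ratings.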

USR-TopicalChat

7 PAPERS • 1 BENCHMARK

Reddit Engagement Dataset

The Reddit Engagement Dataset (RED) is a distant-supervision set of 80k single-turn conversations. RED is sourced from Reddit by sampling 43 popular subreddits and was processed from a total of 5 million posts, filtering out data that was non-conversational, toxic, or whose popularity could not be ascertained.

1 PAPER • NO BENCHMARKS YET
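The three exclusion criteria described for RED can be illustrated with a small filtering sketch. Everything here is an assumption for illustration: the `Post` fields, treating a post with no reply as non-conversational, a precomputed `is_toxic` flag from some toxicity classifier, and using a missing `score` to mean popularity cannot be ascertained. It is not the actual RED preprocessing pipeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Post:
    # Hypothetical fields; the real RED preprocessing details are not specified here.
    subreddit: str
    text: str
    reply: Optional[str]   # single-turn response, if any
    is_toxic: bool         # assumed output of some toxicity classifier
    score: Optional[int]   # popularity signal such as upvotes; None if unavailable


def keep(post: Post, subreddits: Set[str]) -> bool:
    """Return True only if the post passes every (assumed) filter."""
    if post.subreddit not in subreddits:  # restrict to the sampled subreddits
        return False
    if post.reply is None:                # non-conversational: nothing to pair with
        return False
    if post.is_toxic:                     # toxic content
        return False
    if post.score is None:                # popularity cannot be ascertained
        return False
    return True


def build_pairs(posts: List[Post], subreddits: Set[str]) -> List[Tuple[str, str, int]]:
    """Return (context, response, score) triples for posts that pass all filters."""
    return [(p.text, p.reply, p.score) for p in posts if keep(p, subreddits)]


# Made-up example posts, one per filter outcome.
posts = [
    Post("AskReddit", "What's your favorite book?", "Dune, easily.", False, 120),
    Post("AskReddit", "Link dump", None, False, 5),         # no reply
    Post("news", "Some headline", "Some reply", True, 40),  # flagged toxic
    Post("AskReddit", "Hi", "Hello", False, None),          # no popularity signal
    Post("cooking", "Tips?", "Salt early.", False, 3),      # subreddit not sampled
]
pairs = build_pairs(posts, {"AskReddit", "news"})
print(pairs)  # [("What's your favorite book?", 'Dune, easily.', 120)]
```

Keeping the score alongside each pair reflects the distant-supervision idea: popularity stands in for a human engagement label.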
