FaithDial is a new benchmark for hallucination-free dialogues, by editing hallucinated responses in the Wizard of Wikipedia (WoW) benchmark.
12 PAPERS • NO BENCHMARKS YET
This dataset was collected with the goal of assessing dialog evaluation metrics. In the paper, USR: An Unsupervised and Reference Free Evaluation Metric for Dialog (Mehri and Eskenazi, 2020), the authors collect this data to measure the quality of several existing word-overlap and embedding-based metrics, as well as their newly proposed USR metric.
7 PAPERS • 1 BENCHMARK
Reddit Engagement Dataset (RED), a distant-supervision set, with 80k single-turn conversations. RED is sourced from Reddit, sampling from 43 popular subreddits, and processed from a total of 5 million posts, filtering out data that was either non-conversational, toxic, or posts not possible to ascertain popularity.
1 PAPER • NO BENCHMARKS YET