DialSummEval is a multi-faceted dataset of human judgments created to revisit the evaluation of dialogue summarization models. It contains the outputs of 14 models on SAMSum, a dialogue summarization dataset.

The creators of DialSummEval observed that current dialogue summarization models have flaws that may not be well exposed by frequently used metrics such as ROUGE. They therefore re-evaluated 18 categories of metrics along four dimensions (coherence, consistency, fluency, and relevance) and, for the first time, conducted a unified human evaluation of a wide range of dialogue summarization models.
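
The kind of analysis described above can be illustrated with a short sketch: given model summaries, reference summaries, and human ratings along one dimension (e.g., consistency), compute an automatic metric such as ROUGE and measure its summary-level Spearman correlation with the human scores. The file layout and field names below are hypothetical placeholders, not DialSummEval's actual release format.

```python
# Sketch: summary-level correlation between ROUGE-L F1 and human ratings.
# Assumes a hypothetical JSON file of records such as
#   {"summary": "...", "reference": "...", "consistency": 4.3}
# (field names are illustrative only).
import json

from rouge_score import rouge_scorer
from scipy.stats import spearmanr


def metric_human_correlation(path: str) -> float:
    """Spearman correlation between ROUGE-L F1 and human consistency ratings."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)

    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    rouge_scores, human_scores = [], []
    for rec in records:
        result = scorer.score(rec["reference"], rec["summary"])
        rouge_scores.append(result["rougeL"].fmeasure)
        human_scores.append(rec["consistency"])

    corr, _ = spearmanr(rouge_scores, human_scores)
    return corr


if __name__ == "__main__":
    print(f"Spearman rho: {metric_human_correlation('annotations.json'):.3f}")
```

A low correlation on a given dimension would indicate, as the dataset's creators argue, that the metric fails to capture the flaws human judges see in model outputs.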

License


  • Unknown
