EmailSum: Abstractive Email Thread Summarization

Recent years have brought growing interest in the challenging task of summarizing conversation threads (meetings, online discussions, etc.). Such summaries let readers of a long thread quickly catch up on the decisions made, and thus improve work and communication efficiency. To spur research in thread summarization, we have developed an abstractive Email Thread Summarization (EmailSum) dataset, which contains human-annotated short (<30 words) and long (<100 words) summaries of 2,549 email threads (each containing 3 to 10 emails) over a wide variety of topics. We perform a comprehensive empirical study to explore different summarization techniques (including extractive and abstractive methods, single-document and hierarchical models, as well as transfer and semi-supervised learning) and conduct human evaluations on both short and long summary generation tasks. Our results reveal the key challenges that current abstractive summarization models face in this task, such as understanding the sender's intent and identifying the roles of sender and receiver. Furthermore, we find that widely used automatic evaluation metrics (ROUGE, BERTScore) are weakly correlated with human judgments on this email thread summarization task. Hence, we emphasize the importance of human evaluation and of the community developing better metrics. Our code and summary data are available at: https://github.com/ZhangShiyue/EmailSum

ACL 2021

Datasets


Introduced in the Paper:

EmailSum

Used in the Paper:

SAMSum
CRD3
Avocado research email collection

Task: Email Thread Summarization (global rank in parentheses)

| Dataset | Model | ROUGE-1 | ROUGE-2 | ROUGE-L | RLsum | BertS |
|---|---|---|---|---|---|---|
| EmailSum (long) | Oracle | 45.98 (#1) | 15.49 (#1) | 32.4 (#1) | 42.14 (#1) | 26.31 (#3) |
| EmailSum (long) | T5_base | 43.81 (#3) | 14.08 (#2) | 30.47 (#3) | 39.88 (#3) | 32.09 (#2) |
| EmailSum (long) | SemiSup_together | 44.08 (#2) | 14.06 (#3) | 31.17 (#2) | 40.67 (#2) | 32.3 (#1) |
| EmailSum (short) | SemiSup_together | 36.98 (#2) | 11.21 (#2) | 28.76 (#2) | 33.7 (#2) | 33.91 (#1) |
| EmailSum (short) | T5_base | 36.57 (#3) | 10.56 (#3) | 28.3 (#3) | 32.76 (#3) | 33.9 (#2) |
| EmailSum (short) | Oracle | 39.04 (#1) | 12.47 (#1) | 30.17 (#1) | 35.61 (#1) | 22.32 (#3) |
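
For readers unfamiliar with the ROUGE-1 figures reported here, the following is a minimal sketch of the idea behind the metric: an F1 score over unigrams shared by a reference summary and a generated one. It is a simplification and not the scoring code used in the paper. The function name `rouge1_f1`, the lowercase whitespace tokenization, and the absence of stemming are all assumptions made for illustration, so it will not reproduce the reported numbers, which are presumably F1 scores scaled to 0-100.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1 (hypothetical helper, not the official scorer).

    Tokens are lowercase whitespace-separated words; no stemming is applied.
    """
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each unigram counts at most as often as it
    # appears in both the reference and the candidate.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy strings invented for illustration, not taken from EmailSum.
    reference = "the meeting is moved to friday"
    candidate = "meeting moved to friday"
    print(round(100 * rouge1_f1(reference, candidate), 2))
```

Scores from a surface-overlap measure like this depend only on shared words, which is one reason such metrics can diverge from human judgments of a summary.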

Methods