Large-scale manually-annotated corpus for 1,000 scientific papers (on computational linguistics) for automatic summarization. Summaries for each paper are constructed from the papers that cite that paper and from that paper's abstract. Source: ScisummNet: A Large Annotated Corpus and Content-Impact Models for Scientific Paper Summarization with Citation Networks
17 PAPERS • NO BENCHMARKS YET
A scholarly data set with publications’ full-text, annotated in-text citations, and links to metadata.
8 PAPERS • NO BENCHMARKS YET
The TalkSumm dataset contains 1705 automatically-generated summaries of scientific papers from ACL, NAACL, EMNLP, SIGDIAL (2015-2018), and ICML (2017-2018).
6 PAPERS • NO BENCHMARKS YET
FacetSum is a faceted summarization dataset for scientific documents. FacetSum has been built on Emerald journal articles, covering a diverse range of domains. Different from traditional document-summary pairs, FacetSum provides multiple summaries, each targeted at specific sections of a long document, including the purpose, method, findings, and value.
3 PAPERS • 1 BENCHMARK
MS^2 (Multi-Document Summarization of Medical Studies) is a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is one of the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
1 PAPER • 2 BENCHMARKS