Multi-News, consists of news articles and human-written summaries of these articles from the site newser.com. Each summary is professionally written by editors and includes links to the original articles cited.
102 PAPERS • 4 BENCHMARKS
The DUC2004 dataset is a dataset for document summarization. Is designed and used for testing only. It consists of 500 news articles, each paired with four human written summaries. Specifically it consists of 50 clusters of Text REtrieval Conference (TREC) documents, from the following collections: AP newswire, 1998-2000; New York Times newswire, 1998-2000; Xinhua News Agency (English version), 1996-2000. Each cluster contained on average 10 documents.
15 PAPERS • 4 BENCHMARKS
OPOSUM is a dataset for the training and evaluation of Opinion Summarization models which contains Amazon reviews from six product domains: Laptop Bags, Bluetooth Headsets, Boots, Keyboards, Televisions, and Vacuums. The six training collections were created by downsampling from the Amazon Product Dataset introduced in McAuley et al. (2015) and contain reviews and their respective ratings.
8 PAPERS • NO BENCHMARKS YET
Healthline is a nutrition related dataset for multi-document summarization, using scientific studies.
3 PAPERS • NO BENCHMARKS YET
MS^2 (Multi-Document Summarization of Medical Studies) is a dataset of over 470k documents and 20k summaries derived from the scientific literature. This dataset facilitates the development of systems that can assess and aggregate contradictory evidence across multiple studies, and is one of the first large-scale, publicly available multi-document summarization dataset in the biomedical domain.
3 PAPERS • 1 BENCHMARK
OpenAsp Dataset OpenAsp is an Open Aspect-based Multi-Document Summarization dataset derived from DUC and MultiNews summarization datasets.
2 PAPERS • NO BENCHMARKS YET
GameWikiSum is a domain-specific (video game) dataset for multi-document summarization, which is one hundred times larger than commonly used datasets, and in another domain than news. Input documents consist of long professional video game reviews as well as references of their gameplay sections in Wikipedia pages.
1 PAPER • NO BENCHMARKS YET