Multi-Document Summarization
93 papers with code • 5 benchmarks • 15 datasets
Multi-Document Summarization is the task of representing a set of documents with a short piece of text that captures the relevant information and filters out redundant information. Two prominent approaches to Multi-Document Summarization are extractive and abstractive summarization. Extractive summarization systems aim to extract salient snippets, sentences or passages from documents, while abstractive summarization systems aim to concisely paraphrase the content of the documents.
Source: Multi-Document Summarization using Distributed Bag-of-Words Model
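As a rough illustration of the extractive approach (not the distributed bag-of-words method cited above, and not any specific paper below), the sketch here scores sentences by the corpus-wide frequency of their words and greedily selects top-scoring sentences while skipping near-duplicates. All function names and thresholds are illustrative assumptions.

```python
# Minimal extractive multi-document summarization sketch (illustrative only):
# score sentences by the average frequency of their words across all documents,
# then greedily keep the top-scoring sentences while skipping near-duplicates.
import re
from collections import Counter

def split_sentences(text):
    # Naive sentence splitter; a real system would use a proper NLP library.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tokenize(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def summarize(documents, max_sentences=3, overlap_threshold=0.5):
    sentences = [s for doc in documents for s in split_sentences(doc)]
    word_freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        words = tokenize(sentence)
        return sum(word_freq[w] for w in words) / (len(words) or 1)

    summary = []
    for sent in sorted(sentences, key=score, reverse=True):
        tokens = set(tokenize(sent))
        # Redundancy filter: skip sentences that overlap heavily with ones already chosen.
        if any(len(tokens & set(tokenize(s))) / (len(tokens) or 1) > overlap_threshold
               for s in summary):
            continue
        summary.append(sent)
        if len(summary) == max_sentences:
            break
    return " ".join(summary)

docs = [
    "The storm hit the coast on Monday. Thousands lost power across the region.",
    "A powerful storm struck the coastal region Monday, knocking out power for thousands.",
    "Repair crews expect to restore electricity by Friday.",
]
print(summarize(docs, max_sentences=2))
```

An abstractive system would instead generate new sentences (typically with a sequence-to-sequence model) rather than copy them from the source documents.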
Latest papers
PeerSum: A Peer Review Dataset for Abstractive Multi-document Summarization
We present PeerSum, a new MDS dataset using peer reviews of scientific publications.
Proposition-Level Clustering for Multi-Document Summarization
Text clustering methods were traditionally incorporated into multi-document summarization (MDS) as a means for coping with considerable information repetition.
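The sketch below illustrates the general idea of clustering to cope with repetition: group highly similar sentences so that each cluster contributes at most one sentence to the summary. It uses bag-of-words cosine similarity and greedy single-pass clustering as stand-ins; the paper itself clusters propositions with learned representations, so treat every detail here as an assumption.

```python
# Generic redundancy-clustering sketch for MDS (illustrative only; not the
# proposition-level method of the paper above).
import re
from collections import Counter
from math import sqrt

def bow(sentence):
    # Bag-of-words term counts as a crude sentence representation.
    return Counter(re.findall(r"[a-z']+", sentence.lower()))

def cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    denom = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / denom if denom else 0.0

def cluster_sentences(sentences, threshold=0.4):
    # Greedy single-pass clustering: a sentence joins the first cluster whose
    # representative (first member) it is similar enough to, else starts a new cluster.
    clusters = []
    for sent in sentences:
        vec = bow(sent)
        for cluster in clusters:
            if cosine(vec, bow(cluster[0])) >= threshold:
                cluster.append(sent)
                break
        else:
            clusters.append([sent])
    return clusters

sents = [
    "The storm knocked out power for thousands.",
    "Thousands lost electricity after the storm.",
    "Crews expect to restore power by Friday.",
]
for group in cluster_sentences(sents):
    print(group)
```

A summarizer would then pick one representative sentence (or fused sentence) per cluster, which is how clustering reduces the information repetition mentioned above.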
LongT5: Efficient Text-To-Text Transformer for Long Sequences
Recent work has shown that either (1) increasing the input length or (2) increasing model size can improve the performance of Transformer-based neural models.
PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization
We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of labeled fine-tuning data.
SgSum: Transforming Multi-document Summarization into Sub-graph Selection
Compared with traditional methods, our method has two main advantages: (1) the relations between sentences are captured by modeling both the graph structure of the whole document set and the candidate sub-graphs; (2) it directly outputs an integrated summary in the form of a sub-graph, which is more informative and coherent.
Modeling Endorsement for Multi-Document Abstractive Summarization
In this paper, we model the cross-document endorsement effect and its utilization in multi-document summarization.
HETFORMER: Heterogeneous Transformer with Sparse Attention for Long-Text Extractive Summarization
To capture the semantic graph structure from raw text, most existing summarization approaches are built on GNNs with a pre-trained model.
Extending Multi-Text Sentence Fusion Resources via Pyramid Annotations
NLP models that compare or consolidate information across multiple documents often struggle when challenged with recognizing substantial information redundancies across the texts.