Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches build summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that do not appear in the original document.
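To make the extractive paradigm concrete, here is a minimal sketch of a classic frequency-based extractive summarizer. It is not HIBERT's method, which relies on a pre-trained hierarchical Transformer. The tiny stopword list, the naive regex sentence splitter, and the function names `split_sentences`, `tokenize` and `extractive_summary` are illustrative assumptions. Each sentence is scored by the average frequency of its non-stopword tokens across the document, and the top `k` sentences are returned in their original order.

```python
import re
from collections import Counter

# Illustrative stopword list (assumption); real systems use a much fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "it", "that", "for", "on", "with", "as", "by"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k sentences with the highest average word frequency,
    kept in their original document order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence: str) -> float:
        words = tokenize(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    # Rank by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))
    return [sentences[i] for i in sorted(ranked[:k])]
```

Because every output sentence is copied verbatim from the input, the summary can never contain words absent from the document, which is exactly the property that separates extractive from abstractive systems.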

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization

Benchmarks

These leaderboards are used to track progress in Document Summarization

Dataset	Best Model
CNN / Daily Mail	Scrambled code + broken (alter)
HowSumm-Step	LexRank (query: step title)
HowSumm-Method	LexRank (query: method + article + steps titles)
BBC XSum	BigBird-Pegasus
Arxiv HEP-TH citation graph	DeepPyramidion
arXiv Summarization Dataset	DeepPyramidion
WikiLingua (tr->en)	DOCmT5

Libraries

Use these libraries to find Document Summarization models and implementations

huggingface/transformers • 3 papers • 124,889 GitHub stars

thudm/swissarmytransformer • 2 papers • 842 GitHub stars

HHousen/TransformerSum • 2 papers • 424 GitHub stars

shashiongithub/XSum • 2 papers • 338 GitHub stars

These are 4 of the 6 listed libraries.

Datasets

Subtasks

Email Thread Summarization

Latest papers

Investigating Text Shortening Strategy in BERT: Truncation vs Summarization

mirzaalimm/truncationvssummarization • 19 Mar 2024

In this study, we investigate the performance of document truncation and summarization in text classification tasks.
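The two shortening strategies compared in this study can be illustrated with a small, hypothetical sketch. Truncation keeps a fixed-length prefix of the token sequence (for example, to fit BERT's 512-token input limit), whereas summarization-based shortening keeps the most informative sentences that fit the same budget. The functions `truncate_head` and `select_within_budget`, the whitespace tokenization, and the assumption that sentence scores come from some upstream summarizer are illustrative only and are not taken from the paper's code.

```python
def truncate_head(tokens: list[str], budget: int) -> list[str]:
    """Truncation: keep only the first `budget` tokens."""
    return tokens[:budget]


def select_within_budget(sentences: list[str], scores: list[float], budget: int) -> list[str]:
    """Summarization-style shortening: greedily add the highest-scoring
    sentences that still fit in the token budget, then restore document order.
    Scores are assumed to come from an external summarizer."""
    order = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    picked: list[int] = []
    used = 0
    for i in order:
        n = len(sentences[i].split())  # whitespace tokens stand in for subword tokens
        if used + n <= budget:
            picked.append(i)
            used += n
    return [sentences[i] for i in sorted(picked)]
```

With the sentences `["intro text here", "key finding is important", "filler words filler words", "conclusion restates finding"]`, scores `[0.1, 0.9, 0.0, 0.7]` and a budget of 8 tokens, truncation keeps the low-value intro and cuts the filler sentence after its first word, while budgeted selection returns the two highest-scoring sentences intact.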

Detect-Order-Construct: A Tree Construction based Approach for Hierarchical Document Structure Analysis

microsoft/comphrdoc • 22 Jan 2024

Our end-to-end system achieves state-of-the-art performance on two large-scale document layout analysis datasets (PubLayNet and DocLayNet), a high-quality hierarchical document structure reconstruction dataset (HRDoc), and our Comp-HRDoc benchmark.

Shaping Political Discourse using multi-source News Summarization

c-rajan/Multi-source-NewsSummarization • 18 Dec 2023

Multi-document summarization is the process of automatically generating a concise summary of multiple documents related to the same topic.

OpenAsp: A Benchmark for Multi-document Open Aspect-based Summarization

liatschiff/openasp • 7 Dec 2023

To advance research on more realistic scenarios, we introduce OpenAsp, a benchmark for multi-document open aspect-based summarization.

Supervising the Centroid Baseline for Extractive Multi-Document Summarization

priberam/cera-summ • 29 Nov 2023

The centroid method is a simple approach for extractive multi-document summarization and many improvements to its pipeline have been proposed.
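A minimal sketch of the plain, unsupervised centroid baseline follows. Sentences are represented as TF-IDF bag-of-words vectors, the centroid is the sum of all sentence vectors in the document cluster, sentences are ranked by cosine similarity to the centroid, and a greedy pass skips any sentence too similar to one already selected. This is not the supervised variant proposed in the paper; the sentence-level IDF, the `max_sim=0.9` redundancy threshold and the function names are assumptions for illustration.

```python
import math
import re
from collections import Counter


def bow(sentence: str) -> Counter:
    """Lowercased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(u: dict, v: dict) -> float:
    dot = sum(u[w] * v.get(w, 0.0) for w in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid_summary(documents: list[list[str]], k: int = 2, max_sim: float = 0.9) -> list[str]:
    """Pick up to k sentences closest to the cluster centroid, skipping near-duplicates."""
    sentences = [s for doc in documents for s in doc]
    counts = [bow(s) for s in sentences]
    n = len(sentences)
    df = Counter(w for c in counts for w in c)            # sentence-level document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in df}      # +1 keeps frequent words nonzero
    vecs = [{w: tf * idf[w] for w, tf in c.items()} for c in counts]

    centroid: dict = {}
    for v in vecs:
        for w, x in v.items():
            centroid[w] = centroid.get(w, 0.0) + x

    # Rank by similarity to the centroid; ties go to the earlier sentence.
    ranked = sorted(range(n), key=lambda i: (-cosine(vecs[i], centroid), i))
    chosen: list[int] = []
    for i in ranked:
        if len(chosen) == k:
            break
        # Redundancy filter: skip sentences too similar to an already chosen one.
        if all(cosine(vecs[i], vecs[j]) < max_sim for j in chosen):
            chosen.append(i)
    return [sentences[i] for i in chosen]
```

The redundancy filter matters in the multi-document setting because news sources often repeat the same sentence almost verbatim; without it, a near-duplicate would take a second summary slot.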

Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

salesforce/diversesumm • 17 Sep 2023

In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event.

ODSum: New Benchmarks for Open Domain Multi-Document Summarization

yale-nlp/odsum • 16 Sep 2023

Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries.

Gender Bias in News Summarization: Measures, Pitfalls and Corpora

julmaxi/summary_bias • 14 Sep 2023

Summarization is an important application of large language models (LLMs).

Extending Context Window of Large Language Models via Positional Interpolation

pku-yuangroup/open-sora-plan • 27 Jun 2023

We present Position Interpolation (PI) that extends the context window sizes of RoPE-based pretrained LLMs such as LLaMA models to up to 32768 with minimal fine-tuning (within 1000 steps), while demonstrating strong empirical results on various tasks that require long context, including passkey retrieval, language modeling, and long document summarization from LLaMA 7B to 65B.

10,064 GitHub stars
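Position Interpolation rescales position indices instead of extrapolating to unseen ones: if a model was trained with context length L and is extended to L', position m is mapped to m·L/L' before the rotary angles are computed, so every position stays inside the trained range. The pure-Python sketch below illustrates the idea. It pairs consecutive dimensions (2j, 2j+1) as in the original RoPE formulation, whereas some implementations split the vector into two halves instead; the helper names and the 2048-to-8192 example lengths are assumptions.

```python
import math


def rope_angles(position: float, dim: int, base: float = 10000.0) -> list[float]:
    """Rotation angles position * theta_j for each of the dim/2 frequency pairs,
    with theta_j = base ** (-2j / dim)."""
    return [position * base ** (-2 * j / dim) for j in range(dim // 2)]


def interpolated_position(m: int, trained_len: int, target_len: int) -> float:
    """Position Interpolation: map m in [0, target_len) into [0, trained_len)."""
    return m * trained_len / target_len


def rope_rotate(x: list[float], position: float, base: float = 10000.0) -> list[float]:
    """Apply RoPE to an even-length vector by rotating consecutive pairs."""
    out: list[float] = []
    for j, angle in enumerate(rope_angles(position, len(x), base)):
        a, b = x[2 * j], x[2 * j + 1]
        c, s = math.cos(angle), math.sin(angle)
        out.extend([a * c - b * s, a * s + b * c])
    return out
```

For a model trained on 2048 positions and extended to 8192, the last position 8191 maps to 2047.75, a fractional position between ones the model has already seen. Since rotation preserves vector norms and only the angles change, the attention computation itself is untouched, which is why only a short fine-tuning run is needed to adapt the model to the denser positions.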

Pre-training Meets Clustering: A Hybrid Extractive Multi-document Summarization Model

Akankshakarotia/Pre-training-meets-Clustering-A-Hybrid-Extractive-Multi-Document-Summarization-Model • International Conference on Hybrid Intelligent Systems 2023

Outcomes validate that our proposed model shows greatly enhanced performance as compared to the existent unsupervised state-of-the-art approaches.

25 May 2023