Document Summarization

195 papers with code • 7 benchmarks • 28 datasets

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive and abstractive approaches. Extractive approaches produce summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that do not appear in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
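
The contrast between the two paradigms can be made concrete with a toy extractive summarizer that scores sentences by the frequency of their content words and keeps the top-ranked ones. This is a minimal illustrative sketch, not the method of HIBERT or any paper listed below; all names are hypothetical.

```python
# Toy illustration of the extractive paradigm: score each sentence by the average
# frequency of its content words and keep the top-k sentences in document order.
# An abstractive system would instead generate new text rather than copy sentences.
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "are", "of", "to", "in", "and", "which", "while", "may"}

def extractive_summary(document: str, k: int = 2) -> str:
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    words = [w for w in re.findall(r"[a-z']+", document.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence: str) -> float:
        tokens = [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)[:k]
    return " ".join(sentences[i] for i in sorted(top))  # keep original sentence order

if __name__ == "__main__":
    doc = ("Extractive approaches copy sentences from the source document. "
           "Abstractive approaches may generate new words or phrases. "
           "Both paradigms aim to retain the important content of the document.")
    print(extractive_summary(doc, k=2))
```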

Latest papers with no code

Non-Parametric Memory Guidance for Multi-Document Summarization

no code yet • 14 Nov 2023

Multi-document summarization (MDS) is a difficult task in Natural Language Processing, aiming to summarize information from several documents.

Mitigating Framing Bias with Polarity Minimization Loss

no code yet • 3 Nov 2023

Framing bias plays a significant role in exacerbating political polarization by distorting the perception of actual events.

Abstractive Summarization of Large Document Collections Using GPT

no code yet • 9 Oct 2023

This paper proposes a method of abstractive summarization designed to scale to document collections instead of individual documents.

Controllable Multi-document Summarization: Coverage & Coherence Intuitive Policy with Large Language Model Based Rewards

no code yet • 5 Oct 2023

Memory-efficient large language models are good at refining text input for better readability.

LLM Based Multi-Document Summarization Exploiting Main-Event Biased Monotone Submodular Content Extraction

no code yet • 5 Oct 2023

Multi-document summarization is a challenging task due to its inherent subjective bias, highlighted by the low inter-annotator ROUGE-1 score of 0.4 among DUC-2004 reference summaries.
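
For context on the metric cited above, ROUGE-1 measures unigram overlap between a candidate summary and a reference. The sketch below shows the basic computation under a simple whitespace-tokenization assumption; real evaluations typically use a library such as rouge-score, which adds stemming and proper tokenization.

```python
# Minimal sketch of ROUGE-1: clipped unigram overlap between candidate and reference.
from collections import Counter

def rouge1(candidate: str, reference: str) -> dict:
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # unigram matches, clipped by count
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

if __name__ == "__main__":
    print(rouge1("the cat sat on the mat", "a cat was sitting on the mat"))
```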

Finding Pragmatic Differences Between Disciplines

no code yet • NAACL (sdp) 2021

Using a corpus of scholarly documents across 19 disciplines and state-of-the-art language modeling techniques, we learn a fixed set of domain-agnostic descriptors for document sections and "retrofit" the corpus to these descriptors (also referred to as "normalization").

Multi-document Summarization: A Comparative Evaluation

no code yet • 10 Sep 2023

This work serves as a reference for future MDS research and contributes to the development of accurate and robust models that can be applied both to demanding datasets with academically or scientifically complex content and to simpler, more general datasets.

Unsupervised Multi-document Summarization with Holistic Inference

no code yet • 8 Sep 2023

SRI balances the importance and diversity of a subset of sentences from the source documents and can be calculated in unsupervised and adaptive manners.

A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation

no code yet • 8 Aug 2023

In this paper, we compare several recent sentence embedding methods via time-series of semantic similarity between successive sentences and matrices of pairwise sentence similarity for multiple books of literature.
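
The setup described in this excerpt, tracking semantic similarity between successive sentences, can be sketched as follows. This is an illustrative assumption of how such a time series might be computed; the embedding model name is a hypothetical choice, not necessarily one evaluated in the paper.

```python
# Sketch: embed each sentence, then compute cosine similarity between successive
# sentences (a "semantic variation" time series) and a full pairwise similarity matrix.
import numpy as np
from sentence_transformers import SentenceTransformer

def similarity_series(sentences: list[str], model_name: str = "all-MiniLM-L6-v2"):
    model = SentenceTransformer(model_name)
    emb = model.encode(sentences, normalize_embeddings=True)  # unit-length vectors
    pairwise = emb @ emb.T                                    # cosine similarity matrix
    successive = np.array([pairwise[i, i + 1] for i in range(len(sentences) - 1)])
    return successive, pairwise

if __name__ == "__main__":
    sents = ["The storm began at dusk.", "Rain battered the windows.", "She opened her book."]
    series, matrix = similarity_series(sents)
    print(series)        # similarity between successive sentences
    print(matrix.shape)  # (3, 3) pairwise matrix
```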

A Personalized Reinforcement Learning Summarization Service for Learning Structure from Unstructured Data

no code yet • 9 Jul 2023

The exponential growth of textual data has created a crucial need for tools that assist users in extracting meaningful insights.