About

Automatic Document Summarization is the task of rewriting a document into a shorter form while retaining its important content. The two most popular paradigms are extractive approaches and abstractive approaches. Extractive approaches generate summaries by extracting parts of the original document (usually sentences), while abstractive methods may generate new words or phrases that do not appear in the original document.

Source: HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization
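The extractive paradigm described above can be illustrated with a minimal, stdlib-only sketch that scores each sentence by document-level word frequency and keeps the top-scoring ones. This is a toy heuristic for illustration only; the systems listed below use neural encoders rather than raw counts.

```python
import re
from collections import Counter

def extractive_summary(document: str, num_sentences: int = 1) -> str:
    """Pick the sentences whose words are most frequent in the whole
    document, returned in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    freq = Counter(re.findall(r"\w+", document.lower()))

    # Score = total document-level frequency of the sentence's words.
    def score(sentence: str) -> int:
        return sum(freq[t] for t in re.findall(r"\w+", sentence.lower()))

    ranked = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return " ".join(s for s in sentences if s in ranked)

doc = ("Summarization shortens a document. "
       "Extractive summarization selects sentences from the document. "
       "The weather was pleasant yesterday.")
print(extractive_summary(doc))
```

Summing frequencies (rather than averaging) favors longer sentences, which is one of the biases that the ranking- and matching-based papers below are designed to address.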


Greatest papers with code

Language Models are Unsupervised Multitask Learners

Preprint 2019 huggingface/transformers

Natural language processing tasks, such as question answering, machine translation, reading comprehension, and summarization, are typically approached with supervised learning on task-specific datasets.

Ranked #1 on Language Modelling on enwik8 (using extra training data)

COMMON SENSE REASONING DATA-TO-TEXT GENERATION DOCUMENT SUMMARIZATION LANGUAGE MODELLING MACHINE TRANSLATION MULTI-TASK LEARNING QUESTION ANSWERING READING COMPREHENSION

Generating Wikipedia by Summarizing Long Sequences

ICLR 2018 tensorflow/tensor2tensor

We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents.

DOCUMENT SUMMARIZATION MULTI-DOCUMENT SUMMARIZATION

Text Summarization with Pretrained Encoders

IJCNLP 2019 nlpyang/PreSumm

For abstractive summarization, we propose a new fine-tuning schedule which adopts different optimizers for the encoder and the decoder as a means of alleviating the mismatch between the two (the former is pretrained while the latter is not).

Ranked #2 on Extractive Text Summarization on CNN / Daily Mail (using extra training data)

ABSTRACTIVE TEXT SUMMARIZATION DOCUMENT SUMMARIZATION EXTRACTIVE TEXT SUMMARIZATION
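The fine-tuning schedule described above can be sketched with two Transformer-style ("Noam") learning-rate schedules: the pretrained encoder gets a smaller peak rate and longer warmup than the freshly initialized decoder, so the decoder trains faster early on. The specific rates and warmup lengths here are illustrative values in the spirit of the paper's setup, not a definitive reproduction.

```python
def noam_lr(step: int, warmup: int, base_lr: float) -> float:
    """Transformer-style schedule: linear warmup, then inverse-sqrt decay."""
    return base_lr * min(step ** -0.5, step * warmup ** -1.5)

# Illustrative settings: pretrained encoder warms up slowly to a small
# peak rate; the randomly initialized decoder ramps up faster and higher.
encoder_lrs = [noam_lr(s, warmup=20000, base_lr=2e-3) for s in range(1, 30001)]
decoder_lrs = [noam_lr(s, warmup=10000, base_lr=0.1) for s in range(1, 30001)]
```

Using separate schedules (or separate optimizers) per parameter group is the standard way to express this mismatch-alleviating idea in practice.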

Leveraging Graph to Improve Abstractive Multi-Document Summarization

ACL 2020 PaddlePaddle/Research

Graphs that capture relations between textual units have great benefits for detecting salient information from multiple documents and generating overall coherent summaries.

DOCUMENT SUMMARIZATION MULTI-DOCUMENT SUMMARIZATION

StructSum: Summarization via Structured Representations

1 Mar 2020 atulkum/pointer_summarizer

To this end, we propose incorporating latent and explicit dependencies across sentences in the source document into end-to-end single-document summarization models.

ABSTRACTIVE TEXT SUMMARIZATION DOCUMENT SUMMARIZATION

HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization

ACL 2019 abisee/cnn-dailymail

Neural extractive summarization models usually employ a hierarchical encoder for document encoding and they are trained using sentence-level labels, which are created heuristically using rule-based methods.

DOCUMENT SUMMARIZATION EXTRACTIVE TEXT SUMMARIZATION

Extractive Summarization as Text Matching

ACL 2020 maszhongming/MatchSum

This paper creates a paradigm shift with regard to the way we build neural extractive summarization systems.

DOCUMENT SUMMARIZATION EXTRACTIVE TEXT SUMMARIZATION TEXT MATCHING

Ranking Sentences for Extractive Summarization with Reinforcement Learning

NAACL 2018 shashiongithub/Refresh

In this paper we conceptualize extractive summarization as a sentence ranking task and propose a novel training algorithm which globally optimizes the ROUGE evaluation metric through a reinforcement learning objective.

DOCUMENT SUMMARIZATION EXTRACTIVE TEXT SUMMARIZATION
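The ROUGE objective that this reinforcement-learning formulation optimizes can be illustrated with a minimal unigram-overlap (ROUGE-1 F1) reward, sketched here with the stdlib only. Published results use the official ROUGE toolkit, which adds stemming, stopword handling, and n-gram variants.

```python
from collections import Counter

def rouge1_f(candidate: str, reference: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

In the sentence-ranking view, a reward like this is computed between the extracted sentences and the gold summary, and the ranker's sampling policy is updated to maximize it globally rather than matching heuristic per-sentence labels.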