TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Text Summarization	arXiv Summarization Dataset	DeepPyramidion	ROUGE-1	47.15	# 2
Text Summarization	arXiv Summarization Dataset	DeepPyramidion	ROUGE-2	19.99	# 2
Text Summarization	arXiv Summarization Dataset	Blockwise (baseline)	ROUGE-1	46.85	# 3
Text Summarization	arXiv Summarization Dataset	Blockwise (baseline)	ROUGE-2	19.39	# 3
Text Summarization	Pubmed	DeepPyramidion	ROUGE-1	47.81	# 10
Text Summarization	Pubmed	DeepPyramidion	ROUGE-2	21.14	# 9

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sparsifying-transformer-models-with/text-summarization-on-arxiv-summarization)](https://paperswithcode.com/sota/text-summarization-on-arxiv-summarization?p=sparsifying-transformer-models-with)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/sparsifying-transformer-models-with/text-summarization-on-pubmed-1)](https://paperswithcode.com/sota/text-summarization-on-pubmed-1?p=sparsifying-transformer-models-with)`

Sparsifying Transformer Models with Trainable Representation Pooling

ACL 2022 · Michał Pietruszka, Łukasz Borchmann, Łukasz Garncarek

We propose a novel method to sparsify attention in the Transformer model by learning to select the most-informative token representations during the training process, thus focusing on the task-specific parts of an input. A reduction of quadratic time and memory complexity to sublinear was achieved due to a robust trainable top-$k$ operator. Our experiments on a challenging long document summarization task show that even our simple baseline performs comparably to the current SOTA, and with trainable pooling, we can retain its top quality, while being $1.8\times$ faster during training, $4.5\times$ faster during inference, and up to $13\times$ more computationally efficient in the decoder.
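
To make the idea of pooling token representations concrete, the following is a minimal sketch of selecting the k highest-scoring tokens from a sequence. It is an illustration only, and it rests on stated assumptions: the names `score_tokens` and `top_k_pool` are hypothetical, the scorer is a plain linear projection, and the selection is a hard, non-differentiable top-k. It is not the trainable top-$k$ operator described in the abstract, nor code from the applicaai/pyramidions repository.

```python
from typing import List, Sequence, Tuple


def score_tokens(tokens: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Hypothetical linear scorer: one scalar score per token vector."""
    return [sum(x * wi for x, wi in zip(tok, w)) for tok in tokens]


def top_k_pool(
    tokens: Sequence[Sequence[float]], scores: Sequence[float], k: int
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k highest-scoring tokens, preserving their original order.

    This is a hard selection (an assumption made for illustration); a
    trainable operator would need a differentiable relaxation instead.
    """
    k = min(k, len(tokens))
    # Rank positions by descending score; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    keep = sorted(ranked[:k])  # restore sequence order among the survivors
    return keep, [list(tokens[i]) for i in keep]


# Toy input: four 2-dimensional token representations (made-up values).
tokens = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [0.5, 0.5]]
scores = score_tokens(tokens, [1.0, 1.0])
keep, pooled = top_k_pool(tokens, scores, k=2)
print(keep)    # positions of the retained tokens
print(pooled)  # their representations, in original order
```

In this toy setting, later layers would attend over 2 tokens instead of 4, which is the intuition behind the reduced attention cost; the actual speedups reported above come from the paper's experiments, not from this sketch.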

Code

applicaai/pyramidions (official)

Tasks

Document Summarization

Summarization

Text Summarization

Datasets

Pubmed • arXiv Summarization Dataset

Results from the Paper

Ranked #2 on Text Summarization on arXiv Summarization Dataset

Methods

Absolute Position Encodings • Adam • Attention Dropout • BPE • Cosine Annealing • Dense Connections • Dropout • GELU • Label Smoothing • Layer Normalization • Linear Layer • Linear Warmup With Cosine Annealing • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Sparse Transformer • Transformer • Weight Decay
