Continual Transformers: Redundancy-Free Attention for Online Inference

17 Jan 2022  ·  Lukas Hedegaard, Arian Bakhtiarnia, Alexandros Iosifidis

Transformers in their common form are inherently limited to operating on whole token sequences rather than on one token at a time. Consequently, their use during online inference on time-series data entails considerable redundancy due to the overlap between successive token sequences. In this work, we propose novel formulations of the Scaled Dot-Product Attention, which enable Transformers to perform efficient online token-by-token inference on a continual input stream. Importantly, our modifications are purely to the order of computations, while the outputs and learned weights are identical to those of the original Transformer Encoder. We validate our Continual Transformer Encoder with experiments on the THUMOS14, TVSeries and GTZAN datasets with remarkable results: our Continual one- and two-block architectures reduce the floating point operations per prediction by up to 63x and 2.6x, respectively, while retaining predictive performance.
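The core idea — reordering the attention computation so each new token is processed once against a sliding window of cached keys and values, instead of recomputing full attention over the overlapping sequence — can be illustrated with a minimal NumPy sketch. The class and parameter names below are hypothetical illustrations, not the authors' implementation; the sketch shows the single-output variant, where only the newest query produces an output, reducing the per-step cost from O(n²d) to O(nd).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def full_attention(Q, K, V):
    """Standard Scaled Dot-Product Attention over a whole sequence."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d)) @ V

class ContinualSingleOutputAttention:
    """Hypothetical sketch of continual single-output attention:
    cache the last n key/value tokens in a FIFO and attend with
    only the newest query at each step."""

    def __init__(self, seq_len, d):
        self.n, self.d = seq_len, d
        self.K = np.zeros((seq_len, d))
        self.V = np.zeros((seq_len, d))

    def step(self, q, k, v):
        # Slide the window: drop the oldest key/value, append the newest.
        self.K = np.roll(self.K, -1, axis=0); self.K[-1] = k
        self.V = np.roll(self.V, -1, axis=0); self.V[-1] = v
        # One query row against n cached keys: O(n*d) per step,
        # versus O(n^2*d) for recomputing full attention each step.
        attn = softmax(q @ self.K.T / np.sqrt(self.d))
        return attn @ self.V

# Demo: after streaming a full window, the continual output matches
# the last row of ordinary full-sequence attention exactly.
rng = np.random.default_rng(0)
n, d = 4, 8
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))
cont = ContinualSingleOutputAttention(n, d)
for i in range(n):
    out = cont.step(Q[i], K[i], V[i])
```

This mirrors the paper's central claim in miniature: the computation order changes, but once the window is filled, the output for the newest token is identical to the corresponding row of standard attention.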

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Online Action Detection | THUMOS'14 | OadTR | mAP | 64.2 | #11 |
| Online Action Detection | THUMOS'14 | OadTR | MFLOPs per pred | 2513.5 | #1 |
| Online Action Detection | THUMOS'14 | CoOadTR-b1 | MFLOPs per pred | 10.6 | #6 |
| Online Action Detection | THUMOS'14 | CoOadTR-b2 | mAP | 64.4 | #10 |
| Online Action Detection | THUMOS'14 | CoOadTR-b2 | MFLOPs per pred | 411.9 | #4 |
| Online Action Detection | THUMOS'14 | OadTR-b1 | mAP | 63.9 | #12 |
| Online Action Detection | THUMOS'14 | OadTR-b1 | MFLOPs per pred | 673 | #3 |
| Online Action Detection | THUMOS'14 | OadTR-b2 | mAP | 64.5 | #9 |
| Online Action Detection | THUMOS'14 | OadTR-b2 | MFLOPs per pred | 1075.7 | #2 |
| Online Action Detection | TVSeries | OadTR-b2 | mCAP | 88.3 | #5 |
| Online Action Detection | TVSeries | CoOadTR-b1 | mCAP | 87.7 | #8 |
| Online Action Detection | TVSeries | CoOadTR-b2 | mCAP | 87.6 | #9 |
| Online Action Detection | TVSeries | OadTR-b1 | mCAP | 88.1 | #6 |
| Online Action Detection | TVSeries | OadTR | mCAP | 88.6 | #4 |