TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Temporal Action Localization	ActivityNet-1.3	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.5	61.72	# 2
Temporal Action Localization	ActivityNet-1.3	AdaTAD (VideoMAEv2-giant)	mAP	41.93	# 3
Temporal Action Localization	ActivityNet-1.3	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.75	43.35	# 2
Temporal Action Localization	ActivityNet-1.3	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.95	10.85	# 1
Temporal Action Localization	EPIC-KITCHENS-100	AdaTAD (verb, VideoMAE-L)	Avg mAP (0.1-0.5)	29.3	# 1
Temporal Action Localization	EPIC-KITCHENS-100	AdaTAD (verb, VideoMAE-L)	mAP IOU@0.1	33.1	# 1
Temporal Action Localization	EPIC-KITCHENS-100	AdaTAD (verb, VideoMAE-L)	mAP IOU@0.2	32.2	# 1
Temporal Action Localization	EPIC-KITCHENS-100	AdaTAD (verb, VideoMAE-L)	mAP IOU@0.3	30.4	# 1
Temporal Action Localization	EPIC-KITCHENS-100	AdaTAD (verb, VideoMAE-L)	mAP IOU@0.4	27.5	# 1
Temporal Action Localization	EPIC-KITCHENS-100	AdaTAD (verb, VideoMAE-L)	mAP IOU@0.5	23.1	# 1
Temporal Action Localization	THUMOS’14	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.5	79.4	# 1
Temporal Action Localization	THUMOS’14	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.3	90.1	# 1
Temporal Action Localization	THUMOS’14	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.4	85.9	# 1
Temporal Action Localization	THUMOS’14	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.6	67.6	# 1
Temporal Action Localization	THUMOS’14	AdaTAD (VideoMAEv2-giant)	mAP IOU@0.7	53.8	# 1
Temporal Action Localization	THUMOS’14	AdaTAD (VideoMAEv2-giant)	Avg mAP (0.3:0.7)	75.4	# 1
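
The metric names above follow the usual temporal action localization convention: a predicted segment counts as correct at threshold t when its temporal IoU with a ground-truth segment is at least t, and the "Avg mAP" rows average the per-threshold mAP values. The sketch below is an illustration only, not the benchmark's evaluation code. The function and variable names are hypothetical, and the example segments are made up. It computes temporal IoU for two segments and re-derives the two averaged rows from the per-threshold numbers in the table.

```python
from statistics import mean


def temporal_iou(pred, gt):
    """IoU of two 1-D segments, each given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


# Made-up segments: 5 s of overlap out of a 15 s union.
example_iou = temporal_iou((10.0, 20.0), (15.0, 25.0))

# Per-threshold mAP values copied from the table above.
thumos14 = {0.3: 90.1, 0.4: 85.9, 0.5: 79.4, 0.6: 67.6, 0.7: 53.8}
epic_verb = {0.1: 33.1, 0.2: 32.2, 0.3: 30.4, 0.4: 27.5, 0.5: 23.1}

# Averaging over thresholds should reproduce the "Avg mAP" rows.
thumos14_avg = round(mean(list(thumos14.values())), 1)
epic_verb_avg = round(mean(list(epic_verb.values())), 1)
```

Both averages agree with the reported values (75.4 for THUMOS'14 over 0.3:0.7 and 29.3 for EPIC-KITCHENS-100 verbs over 0.1-0.5), so the averaged rows are consistent with the per-threshold rows.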

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/end-to-end-temporal-action-detection-with-1b/temporal-action-localization-on-epic-kitchens)](https://paperswithcode.com/sota/temporal-action-localization-on-epic-kitchens?p=end-to-end-temporal-action-detection-with-1b)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/end-to-end-temporal-action-detection-with-1b/temporal-action-localization-on-thumos14)](https://paperswithcode.com/sota/temporal-action-localization-on-thumos14?p=end-to-end-temporal-action-detection-with-1b)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/end-to-end-temporal-action-detection-with-1b/temporal-action-localization-on-activitynet)](https://paperswithcode.com/sota/temporal-action-localization-on-activitynet?p=end-to-end-temporal-action-detection-with-1b)`

End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames

28 Nov 2023 · Shuming Liu, Chen-Lin Zhang, Chen Zhao, Bernard Ghanem

Recently, temporal action detection (TAD) has seen significant performance improvement with end-to-end training. However, due to the memory bottleneck, only models with limited scales and limited data volumes can afford end-to-end training, which inevitably restricts TAD performance. In this paper, we reduce the memory consumption for end-to-end training, and manage to scale up the TAD backbone to 1 billion parameters and the input video to 1,536 frames, leading to significant gains in detection performance. The key to our approach lies in our proposed temporal-informative adapter (TIA), which is a novel lightweight module that reduces training memory. Using TIA, we free the humongous backbone from learning to adapt to the TAD task by only updating the parameters in TIA. TIA also leads to better TAD representation by temporally aggregating context from adjacent frames throughout the backbone. We evaluate our model across four representative datasets. Owing to our efficient design, we are able to train end-to-end on VideoMAEv2-giant and achieve 75.4% average mAP on THUMOS14, being the first end-to-end model to outperform the best feature-based methods. Code is available at https://github.com/sming256/AdaTAD.
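
To make the adapter idea concrete, here is a minimal forward-pass sketch in NumPy of a bottleneck adapter that mixes information across adjacent frames. It is not the authors' TIA implementation. The class name, layer sizes, kernel size of 3, the tanh GELU approximation, and the zero-initialised up-projection are all assumptions chosen for illustration. The only properties taken from the abstract are that the module is lightweight, that it aggregates context from neighbouring frames, and that only its parameters would be updated while the backbone stays frozen.

```python
import numpy as np


def gelu(x):
    # tanh approximation of the GELU activation
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


class TemporalAdapterSketch:
    """Hypothetical bottleneck adapter with a depthwise temporal convolution."""

    def __init__(self, dim, bottleneck, kernel=3, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = kernel  # assumed odd so the padding is symmetric
        self.w_down = rng.normal(0.0, 0.02, size=(dim, bottleneck))
        # One temporal filter per bottleneck channel (depthwise).
        self.w_time = rng.normal(0.0, 0.02, size=(kernel, bottleneck))
        # Zero init (an assumption): the adapter starts as the identity.
        self.w_up = np.zeros((bottleneck, dim))

    def __call__(self, x):
        # x: (T, dim) features for T frames from a frozen backbone block.
        h = gelu(x @ self.w_down)                    # (T, bottleneck)
        pad = self.kernel // 2
        h_pad = np.pad(h, ((pad, pad), (0, 0)))      # zero-pad along time
        # Each output frame sums its neighbours, weighted per channel.
        mixed = sum(h_pad[k:k + len(h)] * self.w_time[k] for k in range(self.kernel))
        return x + (h + mixed) @ self.w_up           # residual connection

    def num_params(self):
        # In training, only these arrays would be updated; the backbone is frozen.
        return self.w_down.size + self.w_time.size + self.w_up.size


x = np.random.default_rng(1).normal(size=(8, 16))    # 8 frames, 16 channels
adapter = TemporalAdapterSketch(dim=16, bottleneck=4)
y = adapter(x)
```

With these toy sizes the adapter holds 140 parameters, and because the up-projection starts at zero, `y` equals `x` before any training. Once `w_up` is non-zero, changing one frame also changes the outputs of its immediate neighbours, which is the temporal aggregation the abstract describes.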

Code

sming256/AdaTAD official

sming256/OpenTAD

Tasks

Action Detection

Temporal Action Localization

Datasets

ActivityNet

THUMOS14

EPIC-KITCHENS-100

Ego4D

Results from the Paper

Ranked #1 on Temporal Action Localization on EPIC-KITCHENS-100

Methods

Adapter
