MS-TCT: Multi-Scale Temporal ConvTransformer for Action Detection

CVPR 2022 · Rui Dai, Srijan Das, Kumara Kahatapitiya, Michael S. Ryoo, Francois Bremond

Action detection is an essential and challenging task, especially for densely labelled datasets of untrimmed videos. The temporal relations in those datasets are complex and include challenges such as composite actions and co-occurring actions. To detect actions in such complex videos, efficiently capturing both short-term and long-term temporal information is critical. To this end, we propose a novel ConvTransformer network for action detection. This network comprises three main components: (1) a Temporal Encoder module, which explores global and local temporal relations at multiple temporal resolutions; (2) a Temporal Scale Mixer module, which fuses the multi-scale features into a unified feature representation; and (3) a Classification module, which learns the instance center-relative position and predicts frame-level classification scores. Extensive experiments on multiple datasets, including Charades, TSU and MultiTHUMOS, confirm the effectiveness of the proposed method: our network outperforms the state-of-the-art methods on all three datasets.
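To make the three-module data flow concrete, the following is a minimal, dependency-free Python sketch. It is not the authors' implementation (that lives in the dairui01/MS-TCT repository) and it makes several simplifying assumptions: self-attention uses identity query/key/value projections, the local branch is a fixed 3-tap convolution, downsampling between encoder stages is average pooling rather than a learned layer, the scale mixer simply upsamples and sums, and the classifier is a single linear layer with a sigmoid. The instance center-relative position branch is omitted. All function names are hypothetical.

```python
import math

Seq = list[list[float]]  # T frames x d channels

# Simplified sketch, NOT the MS-TCT implementation. Assumptions: identity Q/K/V,
# fixed local kernel, average-pool downsampling, sum fusion, linear classifier.

def add(a: Seq, b: Seq) -> Seq:
    """Element-wise sum of two equally shaped sequences (residual connection)."""
    return [[x + y for x, y in zip(u, v)] for u, v in zip(a, b)]

def self_attention(seq: Seq) -> Seq:
    """Global relations: every frame attends to all frames (identity projections)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, seq)) / total for j in range(d)])
    return out

def local_conv(seq: Seq, kernel=(0.25, 0.5, 0.25)) -> Seq:
    """Local relations: fixed 3-tap temporal convolution with zero padding."""
    T, d, r = len(seq), len(seq[0]), len(kernel) // 2
    out = []
    for t in range(T):
        acc = [0.0] * d
        for i, w in enumerate(kernel):
            s = t + i - r
            if 0 <= s < T:
                for j in range(d):
                    acc[j] += w * seq[s][j]
        out.append(acc)
    return out

def downsample(seq: Seq, stride: int = 2) -> Seq:
    """Reduce temporal resolution by averaging non-overlapping windows."""
    d = len(seq[0])
    return [[sum(v[j] for v in seq[t:t + stride]) / len(seq[t:t + stride]) for j in range(d)]
            for t in range(0, len(seq), stride)]

def temporal_encoder(seq: Seq, num_stages: int = 3) -> list[Seq]:
    """One output per stage, at temporal resolutions T, T/2, T/4, ..."""
    scales, x = [], seq
    for stage in range(num_stages):
        if stage > 0:
            x = downsample(x)
        x = add(x, self_attention(x))  # global temporal relations
        x = add(x, local_conv(x))      # local temporal relations
        scales.append(x)
    return scales

def upsample(seq: Seq, T: int) -> Seq:
    """Nearest-neighbour upsampling to T frames."""
    return [list(seq[t * len(seq) // T]) for t in range(T)]

def scale_mixer(scales: list[Seq]) -> Seq:
    """Fuse scales: upsample each to the finest resolution and sum (assumed fusion rule)."""
    T = len(scales[0])
    fused = upsample(scales[0], T)
    for s in scales[1:]:
        fused = add(fused, upsample(s, T))
    return fused

def classify(fused: Seq, weights: list[list[float]]) -> Seq:
    """Frame-level multi-label scores: linear layer (classes x d) followed by a sigmoid."""
    return [[1 / (1 + math.exp(-sum(w * f for w, f in zip(row, frame)))) for row in weights]
            for frame in fused]

video = [[math.sin(t), math.cos(t)] for t in range(8)]  # 8 frames, 2 channels
scales = temporal_encoder(video)
print([len(s) for s in scales])  # [8, 4, 2]
scores = classify(scale_mixer(scales), [[1.0, -1.0], [0.5, 0.5], [0.0, 1.0]])
print(len(scores), len(scores[0]))  # 8 3
```

The stage lengths show the temporal resolution halving at each encoder stage, while the fused output returns to one score vector per input frame, which is what dense, frame-level detection needs.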

Code

dairui01/MS-TCT

Tasks

Action Detection

Temporal Action Localization

Datasets

Charades

MultiTHUMOS

Toyota Smarthome Dataset

TSU

Results from the Paper

Ranked #2 on Action Detection on TSU

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Action Detection	Charades	MS-TCT (RGB only)	mAP	25.4	#7
Temporal Action Localization	MultiTHUMOS	MS-TCT	Average mAP	16.2	#6
Action Detection	MultiTHUMOS	MS-TCT (RGB only)	mAP	43.1	#5
Action Detection	TSU	MS-TCT	Frame-mAP	33.7	#2