TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Recognition	EPIC-KITCHENS-100	TAdaFormer-L/14	Action@1	51.8	# 3
Action Recognition	EPIC-KITCHENS-100	TAdaFormer-L/14	Verb@1	71.7	# 6
Action Recognition	EPIC-KITCHENS-100	TAdaFormer-L/14	Noun@1	64.1	# 3
Action Recognition	EPIC-KITCHENS-100	TAdaConvNeXtV2-S	Action@1	48.9	# 8
Action Recognition	EPIC-KITCHENS-100	TAdaConvNeXtV2-S	Verb@1	71.0	# 8
Action Recognition	EPIC-KITCHENS-100	TAdaConvNeXtV2-S	Noun@1	60.2	# 10
Action Classification	Kinetics-400	TAdaFormer-L/14	Acc@1	89.9	# 11
Action Classification	Kinetics-400	TAdaConvNeXtV2-B	Acc@1	86.4	# 41
Action Recognition	Something-Something V1	TAdaConvNeXtV2-B	Top 1 Accuracy	60.7	# 9
Action Recognition	Something-Something V1	TAdaFormer-L/14	Top 1 Accuracy	63.7	# 5
Action Recognition	Something-Something V2	TAdaConvNeXtV2-B	Top-1 Accuracy	71.1	# 31
Action Recognition	Something-Something V2	TAdaFormer-L/14	Top-1 Accuracy	73.6	# 19

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporally-adaptive-models-for-efficient/action-recognition-on-epic-kitchens-100)](https://paperswithcode.com/sota/action-recognition-on-epic-kitchens-100?p=temporally-adaptive-models-for-efficient)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporally-adaptive-models-for-efficient/action-recognition-in-videos-on-something-1)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something-1?p=temporally-adaptive-models-for-efficient)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporally-adaptive-models-for-efficient/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=temporally-adaptive-models-for-efficient)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporally-adaptive-models-for-efficient/action-recognition-in-videos-on-something)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something?p=temporally-adaptive-models-for-efficient)`

Temporally-Adaptive Models for Efficient Video Understanding

10 Aug 2023 · Ziyuan Huang, Shiwei Zhang, Liang Pan, Zhiwu Qing, Yingya Zhang, Ziwei Liu, Marcelo H. Ang Jr

Spatial convolutions are used extensively in deep video models. They fundamentally assume spatio-temporal invariance, i.e., shared weights for every location in every frame. This work presents Temporally-Adaptive Convolutions (TAdaConv) for video understanding and shows that adaptive weight calibration along the temporal dimension is an efficient way to model complex temporal dynamics in videos. Specifically, TAdaConv gives spatial convolutions temporal modeling ability by calibrating the convolution weights for each frame according to its local and global temporal context. Compared to existing temporal modeling operations, TAdaConv is more efficient because it operates on the convolution kernels rather than the features, and the kernel dimension is an order of magnitude smaller than the spatial resolution. Kernel calibration also increases model capacity. Building on TAdaConv, a readily pluggable operation, and its extension TAdaConvV2, we construct TAdaBlocks that give ConvNeXt and Vision Transformer strong temporal modeling capabilities. Empirical results show that TAdaConvNeXtV2 and TAdaFormer perform competitively against state-of-the-art convolutional and Transformer-based models on various video understanding benchmarks. Our code and models are released at: https://github.com/alibaba-mmai-research/TAdaConv.
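The per-frame weight calibration described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the released TAdaConv code: the function names (`frame_descriptors`, `calibration_factors`, `tada_pointwise_conv`) are hypothetical, the convolution is reduced to a 1x1 (pointwise) case over flattened spatial positions, and the calibration rule (a fixed `1 + tanh(local - global)` factor per input channel) is an assumed stand-in for whatever learned calibration the authors use. The only idea taken from the abstract is that the same base kernel is rescaled per frame from local and global temporal context, and that the calibration works on small per-frame descriptors and kernels rather than on full feature maps.

```python
import math

def frame_descriptors(video):
    """Global-average-pool each channel of each frame.
    video[t][c][i] (frame, channel, flattened position) -> desc[t][c]."""
    return [[sum(ch) / len(ch) for ch in frame] for frame in video]

def calibration_factors(desc):
    """Per-frame, per-input-channel factors (assumed rule, not the paper's).
    Local context: mean descriptor over frames t-1, t, t+1 (edges clamped).
    Global context: mean descriptor over all frames."""
    T, C = len(desc), len(desc[0])
    global_ctx = [sum(desc[t][c] for t in range(T)) / T for c in range(C)]
    factors = []
    for t in range(T):
        prev, nxt = desc[max(t - 1, 0)], desc[min(t + 1, T - 1)]
        local = [(prev[c] + desc[t][c] + nxt[c]) / 3 for c in range(C)]
        # A factor of exactly 1.0 leaves the base weight unchanged.
        factors.append([1.0 + math.tanh(local[c] - global_ctx[c]) for c in range(C)])
    return factors

def tada_pointwise_conv(video, base_weight):
    """1x1 convolution whose shared base weight is rescaled for each frame.
    base_weight[o][c] maps input channel c to output channel o."""
    alpha = calibration_factors(frame_descriptors(video))
    out = []
    for t, frame in enumerate(video):
        n_in, n_pos = len(frame), len(frame[0])
        # Calibrate the kernel (small), not the feature map (large).
        w_t = [[w * alpha[t][c] for c, w in enumerate(row)] for row in base_weight]
        out.append([[sum(row[c] * frame[c][i] for c in range(n_in))
                     for i in range(n_pos)] for row in w_t])
    return out

# Toy input: 3 frames, 1 channel, 2 spatial positions, 1 output channel.
toy_video = [[[0.0, 0.0]], [[1.0, 1.0]], [[4.0, 4.0]]]
toy_output = tada_pointwise_conv(toy_video, [[1.0]])
print(toy_output)
```

When every frame is identical, local and global context coincide, every factor is 1.0, and the operation reduces to an ordinary shared-weight convolution, which is the spatio-temporal invariance assumption the abstract starts from.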

Code

alibaba-mmai-research/TAdaConv (official, 215 stars)

Tasks

Action Classification

Action Recognition

Video Understanding

Datasets

UCF101

Kinetics

HMDB51

Kinetics 400

Something-Something V2

EPIC-KITCHENS-100

Something-Something V1

HACS

Results from the Paper

Ranked #3 on Action Recognition on EPIC-KITCHENS-100 (using extra training data)

Methods

Absolute Position Encodings • Adam • BPE • ConvNeXt • Convolution • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer • Vision Transformer
