CT-Net: Channel Tensorization Network for Video Classification

ICLR 2021 · Kunchang Li, Xianhang Li, Yali Wang, Jun Wang, Yu Qiao

3D convolution is powerful for video classification but often computationally expensive, so recent studies have mainly focused on decomposing it along the spatial-temporal and/or channel dimensions. Unfortunately, most approaches fail to achieve a preferable balance between convolutional efficiency and feature-interaction sufficiency. For this reason, we propose a concise and novel Channel Tensorization Network (CT-Net), which treats the channel dimension of the input feature as a product of K sub-dimensions. On one hand, this naturally factorizes convolution in a multi-dimensional manner, leading to a light computational burden. On the other hand, it effectively enhances feature interaction across channels and progressively enlarges the 3D receptive field of such interaction to boost classification accuracy. Furthermore, we equip our CT-Module with a Tensor Excitation (TE) mechanism, which learns to exploit spatial, temporal and channel attention in a high-dimensional manner, improving the cooperative power of all the feature dimensions in our CT-Module. Finally, we flexibly adapt ResNet as our CT-Net. Extensive experiments are conducted on several challenging video benchmarks, e.g., Kinetics-400, Something-Something V1 and V2. Our CT-Net outperforms a number of recent SOTA approaches in terms of accuracy and/or efficiency. The code and models are available at https://github.com/Andy1621/CT-Net.
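The channel tensorization idea can be sketched in a few lines of PyTorch. The snippet below is an illustrative toy for the K = 2 case only, not the authors' implementation (see the linked repo for the real code): it views the channel dimension C as C1 × C2 and replaces one full 3D convolution with two grouped 3D convolutions, each mixing channels along a single sub-dimension. This cuts the channel-mixing cost per kernel position from C² to C·(C1 + C2). The class and argument names (`TensorizedConv2Sub`, `c1`, `c2`) are hypothetical.

```python
import torch
import torch.nn as nn


class TensorizedConv2Sub(nn.Module):
    """Toy sketch of channel tensorization for K = 2 (assumption: not the
    official CT-Net module). Channels C = c1 * c2 are viewed as a 2D grid;
    a full 3D convolution is factorized into two grouped 3D convolutions,
    one mixing channels along each sub-dimension."""

    def __init__(self, c1: int, c2: int, kernel_size: int = 3):
        super().__init__()
        c = c1 * c2
        pad = kernel_size // 2
        # groups=c2 -> c2 groups of c1 consecutive channels: with the channel
        # layout viewed as (c2, c1), this mixes only the first sub-dimension.
        self.conv_sub1 = nn.Conv3d(c, c, kernel_size, padding=pad, groups=c2)
        # groups=c1 -> c1 groups of c2 consecutive channels: applied after
        # swapping the layout to (c1, c2), this mixes the second sub-dimension.
        self.conv_sub2 = nn.Conv3d(c, c, kernel_size, padding=pad, groups=c1)
        self.c1, self.c2 = c1, c2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, t, h, w = x.shape
        x = self.conv_sub1(x)                       # interact along sub-dim 1
        x = (x.view(n, self.c2, self.c1, t, h, w)   # (N, C2, C1, T, H, W)
               .transpose(1, 2)                     # (N, C1, C2, T, H, W)
               .reshape(n, c, t, h, w))             # back to a flat C axis
        x = self.conv_sub2(x)                       # interact along sub-dim 2
        return x


if __name__ == "__main__":
    m = TensorizedConv2Sub(c1=16, c2=16)            # C = 256
    x = torch.randn(2, 256, 8, 56, 56)              # (N, C, T, H, W)
    print(m(x).shape)                               # torch.Size([2, 256, 8, 56, 56])
```

For C = 256 with C1 = C2 = 16 and a 3×3×3 kernel, the factorized pair uses roughly 256 · 32 · 27 ≈ 0.22M channel-mixing weights versus 256² · 27 ≈ 1.77M for a full convolution, which illustrates the efficiency gain the abstract describes; the actual CT-Module additionally stacks such factorized stages to enlarge the 3D receptive field.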

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Action Classification | Kinetics-400 | CT-Net Ensemble | Acc@1 | 79.8 | #98 |
| Action Recognition | Something-Something V1 | CT-Net Ensemble (R50, 8+12+16+24) | Top-1 Accuracy | 56.6 | #18 |
| Action Recognition | Something-Something V2 | CT-Net Ensemble (R50, 8+12+16+24) | Top-1 Accuracy | 67.8 | #56 |
| Action Recognition | Something-Something V2 | CT-Net Ensemble (R50, 8+12+16+24) | Top-5 Accuracy | 91.1 | #43 |
| Action Recognition | Something-Something V2 | CT-Net Ensemble (R50, 8+12+16+24) | Parameters (M) | 83.8 | #28 |
| Action Recognition | Something-Something V2 | CT-Net Ensemble (R50, 8+12+16+24) | GFLOPs | 280 | #3 |
