TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Classification	Kinetics-400	MVFNet-ResNet101 (ensemble, ImageNet pretrained, RGB only)	Acc@1	79.1	# 110
Action Classification	Kinetics-400	MVFNet-ResNet101 (ensemble, ImageNet pretrained, RGB only)	Acc@5	93.8	# 83
Action Recognition	Something-Something V1	MVFNet-R50EN	Top 1 Accuracy	54.0	# 33
Action Recognition	Something-Something V2	MVFNet-ResNet50 (center crop, 8+16 ensemble, ImageNet pretrained, RGB only)	Top-1 Accuracy	66.3	# 80

MVFNet: Multi-View Fusion Network for Efficient Video Recognition

13 Dec 2020 · Wenhao Wu, Dongliang He, Tianwei Lin, Fu Li, Chuang Gan, Errui Ding

Spatiotemporal modeling and its computational complexity are the two most studied topics in video action recognition. Existing state-of-the-art methods achieve excellent accuracy regardless of complexity, while efficient spatiotemporal modeling solutions are slightly inferior in performance. In this paper, we attempt to achieve both efficiency and effectiveness. First, besides the traditional treatment of H x W x T video frames as a space-time signal (viewed from the Height-Width spatial plane), we propose to also model video from the other two planes, Height-Time and Width-Time, to capture video dynamics more thoroughly. Second, our model is built on 2D CNN backbones and designed with model complexity in mind. Specifically, we introduce a novel multi-view fusion (MVF) module that exploits video dynamics using separable convolution for efficiency. It is a plug-and-play module that can be inserted into off-the-shelf 2D CNNs to form a simple yet effective model called MVFNet. Moreover, MVFNet can be viewed as a generalized video modeling framework that specializes to existing methods such as C2D, SlowOnly, and TSM under different settings. Extensive experiments on popular benchmarks (Something-Something V1 & V2, Kinetics, UCF-101, and HMDB-51) show its superiority. The proposed MVFNet achieves state-of-the-art performance with the complexity of a 2D CNN.
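
The multi-view idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`depthwise_conv1d`, `mvf_sketch`), the 3-tap kernel size, the channel split ratio `alpha`, the summation of the three branches, and the residual connection are all assumptions made for illustration. The sketch applies one channel-wise (depthwise) 1D convolution along each of the T, H, and W axes of a `(C, T, H, W)` feature map, which is one inexpensive way to mix information along time and both spatial axes without a full 3D convolution.

```python
import numpy as np


def depthwise_conv1d(x, kernels, axis):
    """Channel-wise 3-tap cross-correlation of x (C, T, H, W) along `axis`.

    kernels has shape (C, 3): one 3-tap filter per channel.
    The chosen axis is zero-padded by one element on each side.
    """
    pad = [(0, 0)] * x.ndim
    pad[axis] = (1, 1)
    xp = np.pad(x, pad)
    n = x.shape[axis]
    out = np.zeros(x.shape, dtype=float)
    for k in range(3):
        # Slice of the padded input aligned with tap k.
        window = np.take(xp, np.arange(k, k + n), axis=axis)
        # Broadcast the per-channel tap over (T, H, W).
        out += kernels[:, k].reshape(-1, 1, 1, 1) * window
    return out


def mvf_sketch(x, k_t, k_h, k_w, alpha=0.5):
    """Hypothetical multi-view fusion step (illustrative assumptions only).

    The first int(alpha * C) channels pass through three depthwise 1D
    convolutions (along T, H and W); the branch outputs are summed and
    added back to the input. The remaining channels are left untouched.
    """
    c = int(x.shape[0] * alpha)
    part = x[:c]
    fused = (
        depthwise_conv1d(part, k_t, axis=1)    # along time
        + depthwise_conv1d(part, k_h, axis=2)  # along height
        + depthwise_conv1d(part, k_w, axis=3)  # along width
    )
    out = x.astype(float).copy()
    out[:c] = part + fused  # residual connection (an assumption)
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6, 7))  # (C, T, H, W)
k = rng.standard_normal((2, 3)) * 0.1  # kernels for the first 2 channels
y = mvf_sketch(x, k, k, k, alpha=0.5)
print(y.shape)  # same shape as x
```

In this sketch, setting all kernels to zero reduces the step to the identity, so the surrounding 2D CNN would process each frame independently. A fixed one-hot temporal kernel turns the temporal branch into a one-frame shift. This only suggests how different kernel settings can recover simpler behaviors, in the spirit of the abstract's remark about C2D and TSM.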

Code

whwu95/MVFNet official

140

whwu95/DSANet

txyugood/PaddleMVF

Tasks

Action Classification

Action Recognition

Temporal Action Localization

Video Recognition

Datasets

ImageNet

UCF101

Kinetics

HMDB51

Kinetics 400

Something-Something V2

Something-Something V1

Results from the Paper

Ranked #33 on Action Recognition on Something-Something V1

Methods

Convolution
