TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Recognition	HMDB-51	MSNet-R50 (16 frames, ImageNet pretrained)	Average accuracy of 3 splits	77.4	# 31
Action Classification	Kinetics-400	MSNet-R50 (16 frames, ImageNet pretrained)	Acc@1	76.4	# 141
Action Recognition	Something-Something V1	MSNet-R50 (8 frames, ImageNet pretrained)	Top 1 Accuracy	50.9	# 48
Action Recognition	Something-Something V1	MSNet-R50 (8 frames, ImageNet pretrained)	Top 5 Accuracy	80.3	# 25
Action Recognition	Something-Something V1	MSNet-R50En (ensemble)	Top 1 Accuracy	55.1	# 27
Video Classification	Something-Something V1	MSNet-R50En (ours)	Top-5 Accuracy	84	# 1
Action Recognition	Something-Something V1	MSNet-R50En (8+16 ensemble, ImageNet pretrained)	Top 1 Accuracy	54.4	# 30
Action Recognition	Something-Something V1	MSNet-R50En (8+16 ensemble, ImageNet pretrained)	Top 5 Accuracy	83.8	# 13
Action Recognition	Something-Something V1	MSNet-R50 (16 frames, ImageNet pretrained)	Top 1 Accuracy	52.1	# 43
Action Recognition	Something-Something V1	MSNet-R50 (16 frames, ImageNet pretrained)	Top 5 Accuracy	82.3	# 18
Action Recognition	Something-Something V2	MSNet-R50 (16 frames, ImageNet pretrained)	Top-1 Accuracy	64.7	# 92
Action Recognition	Something-Something V2	MSNet-R50 (16 frames, ImageNet pretrained)	Top-5 Accuracy	89.4	# 70
Action Recognition	Something-Something V2	MSNet-R50En (8+16 ensemble, ImageNet pretrained)	Top-1 Accuracy	66.6	# 75
Action Recognition	Something-Something V2	MSNet-R50En (8+16 ensemble, ImageNet pretrained)	Top-5 Accuracy	90.6	# 51
Video Classification	Something-Something V2	MSNet-R50En (ours)	Top-5 Accuracy	91	# 1
Action Recognition	Something-Something V2	MSNet-R50 (8 frames, ImageNet pretrained)	Top-1 Accuracy	63	# 99
Action Recognition	Something-Something V2	MSNet-R50 (8 frames, ImageNet pretrained)	Top-5 Accuracy	88.4	# 78
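
The Top-1 and Top-5 figures in the table are standard top-k accuracies, and the "8+16 ensemble" rows combine an 8-frame and a 16-frame model. The sketch below shows how such numbers are typically computed. The function names are hypothetical, and averaging per-class scores is an assumed fusion rule, since this page does not say how the ensemble is formed.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        hits += label in ranked[:k]
    return 100.0 * hits / len(labels)


def average_scores(scores_a, scores_b):
    """Fuse two models by averaging per-class scores (assumed rule, not from the page)."""
    return [[(a + b) / 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(scores_a, scores_b)]


# Toy data: three clips, three classes (values are illustrative only).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

With these toy scores only the first clip is correct at k=1, and the second clip becomes correct at k=2, which is why Top-5 numbers in the table are always at least as high as the matching Top-1 numbers.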

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionsqueeze-neural-motion-feature-learning/video-classification-on-something-something)](https://paperswithcode.com/sota/video-classification-on-something-something?p=motionsqueeze-neural-motion-feature-learning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionsqueeze-neural-motion-feature-learning/video-classification-on-something-something-1)](https://paperswithcode.com/sota/video-classification-on-something-something-1?p=motionsqueeze-neural-motion-feature-learning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionsqueeze-neural-motion-feature-learning/action-recognition-in-videos-on-something-1)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something-1?p=motionsqueeze-neural-motion-feature-learning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionsqueeze-neural-motion-feature-learning/action-recognition-in-videos-on-hmdb-51)](https://paperswithcode.com/sota/action-recognition-in-videos-on-hmdb-51?p=motionsqueeze-neural-motion-feature-learning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionsqueeze-neural-motion-feature-learning/action-recognition-in-videos-on-something)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something?p=motionsqueeze-neural-motion-feature-learning)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/motionsqueeze-neural-motion-feature-learning/action-classification-on-kinetics-400)](https://paperswithcode.com/sota/action-classification-on-kinetics-400?p=motionsqueeze-neural-motion-feature-learning)`

MotionSqueeze: Neural Motion Feature Learning for Video Understanding

ECCV 2020 · Heeseung Kwon, Manjin Kim, Suha Kwak, Minsu Cho

Motion plays a crucial role in understanding videos, and most state-of-the-art neural models for video classification incorporate motion information, typically using optical flows extracted by a separate off-the-shelf method. Because frame-by-frame optical flow requires heavy computation, incorporating motion information has remained a major computational bottleneck for video understanding. In this work, we replace the external, heavy computation of optical flow with internal, lightweight learning of motion features. We propose a trainable neural module, dubbed MotionSqueeze, for effective motion feature extraction. Inserted in the middle of any neural network, it learns to establish correspondences across frames and convert them into motion features, which are readily fed to the next downstream layer for better prediction. We demonstrate that the proposed method provides a significant gain on four standard benchmarks for action recognition at only a small additional cost, outperforming the state of the art on the Something-Something V1 and V2 datasets.
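
To make "establish correspondences across frames and convert them into motion features" concrete, here is a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not say how correspondences are computed, so the sketch assumes dot-product correlation inside a small search window followed by a softmax-weighted (soft-argmax) displacement. All function names, the `radius` and `temperature` parameters, and the toy feature maps are hypothetical.

```python
import math


def local_correlation(feat_a, feat_b, y, x, radius):
    """Dot-product similarity between feat_a[y][x] and every feat_b position
    within `radius` of (y, x). Returns {(dy, dx): score}; out-of-bounds
    neighbours are skipped."""
    height, width = len(feat_b), len(feat_b[0])
    scores = {}
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < height and 0 <= nx < width:
                scores[(dy, dx)] = sum(
                    a * b for a, b in zip(feat_a[y][x], feat_b[ny][nx]))
    return scores


def soft_argmax_displacement(scores, temperature=0.1):
    """Softmax-weighted mean displacement, a differentiable stand-in for argmax."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    weights = {d: math.exp((s - peak) / temperature) for d, s in scores.items()}
    total = sum(weights.values())
    dy = sum(d[0] * w for d, w in weights.items()) / total
    dx = sum(d[1] * w for d, w in weights.items()) / total
    return dy, dx


def displacement_field(feat_a, feat_b, radius=1, temperature=0.1):
    """Per-position (dy, dx) estimates from frame A features to frame B features."""
    return [[soft_argmax_displacement(
                local_correlation(feat_a, feat_b, y, x, radius), temperature)
             for x in range(len(feat_a[0]))]
            for y in range(len(feat_a))]


def blank_map(size, channels):
    """All-zero feature map of shape [size][size][channels]."""
    return [[[0.0] * channels for _ in range(size)] for _ in range(size)]


# Toy example: a distinctive feature moves one cell to the right between frames.
frame_a = blank_map(3, 2)
frame_b = blank_map(3, 2)
frame_a[1][1] = [1.0, 0.0]
frame_b[1][2] = [1.0, 0.0]
dy, dx = displacement_field(frame_a, frame_b)[1][1]
```

For the centre position the estimate comes out close to `(0, 1)`, i.e. one cell to the right. In a network, a field like this would be computed on intermediate feature maps and passed on to later layers, so no separate optical-flow stage is needed.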

Code

arunos728/MotionSqueeze

132

arunos728/arunos728.github.io

Tasks

Action Classification

Action Recognition

Video Classification

Video Understanding

Datasets

Kinetics

HMDB51

Kinetics 400

Something-Something V2

Something-Something V1

Results from the Paper

Ranked #1 on Video Classification on Something-Something V2
