More Is Less: Learning Efficient Video Representations by Big-Little Network and Depthwise Temporal Aggregation

Current state-of-the-art models for video action recognition are mostly based on expensive 3D ConvNets, which require large GPU clusters to train and evaluate. To address this problem, we present a lightweight and memory-friendly architecture for action recognition that performs on par with or better than current architectures while using only a fraction of the resources. The proposed architecture combines a deep subnet operating on low-resolution frames with a compact subnet operating on high-resolution frames, allowing for high efficiency and high accuracy at the same time. We demonstrate that our approach achieves a $3\sim4\times$ reduction in FLOPs and a $\sim2\times$ reduction in memory usage compared to the baseline, which enables training deeper models with more input frames under the same computational budget. To further obviate the need for large-scale 3D convolutions, we propose a temporal aggregation module that models temporal dependencies in a video at a very small additional computational cost. Our models achieve strong performance on several action recognition benchmarks including Kinetics, Something-Something and Moments-in-Time. The code and models are available at https://github.com/IBM/bLVNet-TAM.
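The sketch below is a rough, PyTorch-style illustration of the two ideas summarized in the abstract: a big-little block that runs a deep branch on downsampled frames alongside a lightweight branch on full-resolution frames, and a depthwise temporal aggregation layer that mixes information across neighboring frames. All class names, arguments, and the exact fusion scheme (average pooling, bilinear upsampling, summation) are assumptions made for illustration; they are not taken from the released implementation, which is available at the repository linked above.

import torch
import torch.nn as nn
import torch.nn.functional as F


class TemporalAggregation(nn.Module):
    """Depthwise 1D convolution over the frame axis: each channel mixes
    information from neighboring frames at negligible extra cost.
    (Illustrative sketch, not the authors' implementation.)"""
    def __init__(self, channels, n_frames, kernel_size=3):
        super().__init__()
        self.n_frames = n_frames
        self.conv = nn.Conv1d(channels, channels, kernel_size,
                              padding=kernel_size // 2, groups=channels, bias=False)

    def forward(self, x):
        # x: (batch * n_frames, C, H, W) -- frames stacked along the batch dim
        nt, c, h, w = x.shape
        n = nt // self.n_frames
        x = x.view(n, self.n_frames, c, h, w).permute(0, 3, 4, 2, 1)
        x = x.reshape(n * h * w, c, self.n_frames)      # (N*H*W, C, T)
        x = F.relu(self.conv(x))                        # depthwise temporal mixing
        x = x.view(n, h, w, c, self.n_frames).permute(0, 4, 3, 1, 2)
        return x.reshape(nt, c, h, w)


class BigLittleBlock(nn.Module):
    """Deep ('big') branch on downsampled frames plus a compact ('little')
    branch on full-resolution frames, merged by upsampling and summation.
    Assumes both branches output the same number of channels."""
    def __init__(self, big_branch, little_branch):
        super().__init__()
        self.big = big_branch        # e.g. several residual blocks
        self.little = little_branch  # e.g. a single lightweight block

    def forward(self, x):
        low = F.avg_pool2d(x, kernel_size=2)             # low-res input for the big branch
        big_out = self.big(low)
        big_out = F.interpolate(big_out, size=x.shape[-2:],
                                mode='bilinear', align_corners=False)
        return big_out + self.little(x)                  # fuse the two resolutions

Because the temporal convolution is depthwise (groups equal to the channel count), its parameter and FLOP cost grows only with the number of channels and the kernel size, which is why this kind of aggregation adds very little overhead compared with full 3D convolutions.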


Results from the Paper


Ranked #89 on Action Recognition on Something-Something V2 (using extra training data)

Task                   | Dataset                | Model                     | Metric         | Value | Global Rank | Uses Extra Training Data
Action Classification  | Kinetics-400           | bLVNet (Fan et al., 2019) | Acc@1          | 73.5  | #161        | —
Action Classification  | Kinetics-400           | bLVNet (Fan et al., 2019) | Acc@5          | 91.2  | #112        | —
Action Recognition     | Something-Something V2 | bLVNet                    | Top-1 Accuracy | 65.2  | #89         | Yes

Methods


Big-Little Network (bLNet); depthwise Temporal Aggregation Module (TAM), as named in the title and abstract above.