Attention Distillation for Learning Video Representations

5 Apr 2019 · Miao Liu, Xin Chen, Yun Zhang, Yin Li, James M. Rehg

We address the challenging problem of learning motion representations using deep models for video recognition. To this end, we use attention modules that learn to highlight regions in the video and aggregate features for recognition. Specifically, we propose to leverage output attention maps as a vehicle to transfer the learned representation from a motion (flow) network to an RGB network. We systematically study the design of attention modules and develop a novel method for attention distillation. Our method is evaluated on major action recognition benchmarks and consistently improves the performance of the baseline RGB network by a significant margin. Moreover, we demonstrate that by leveraging motion cues during learning, our attention maps can identify the locations of actions in video frames. We believe our method provides a step towards learning motion-aware representations in deep models. Our project page is available at https://aptx4869lm.github.io/AttentionDistillation/
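The abstract does not spell out the distillation objective. The model name "Prob-Distill" in the results table below suggests that attention maps are treated as spatial probability distributions and matched with a divergence loss. The following is a minimal PyTorch sketch under that assumption; the names `rgb_net`, `flow_net`, and `attention_distillation_loss`, and the choice of a KL objective, are illustrative and not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def attention_distillation_loss(student_attn, teacher_attn, eps=1e-8):
    """KL divergence between spatial attention maps treated as
    probability distributions over locations (hypothetical sketch).

    student_attn, teacher_attn: (B, H, W) non-negative attention maps,
    e.g. from an RGB (student) and a flow (teacher) network.
    """
    b = student_attn.size(0)
    # Flatten the spatial dimensions and normalize each map so it
    # sums to 1, i.e. a distribution over frame locations.
    s = student_attn.reshape(b, -1)
    t = teacher_attn.reshape(b, -1)
    s = s / (s.sum(dim=1, keepdim=True) + eps)
    t = t / (t.sum(dim=1, keepdim=True) + eps)
    # KL(teacher || student), averaged over the batch.
    return (t * (torch.log(t + eps) - torch.log(s + eps))).sum(dim=1).mean()

# Usage sketch: combine with the standard classification loss on the
# RGB stream, keeping the flow (teacher) network frozen.
# logits, rgb_attn = rgb_net(frames)        # hypothetical student network
# with torch.no_grad():
#     _, flow_attn = flow_net(flows)        # hypothetical teacher network
# loss = F.cross_entropy(logits, labels) \
#        + lam * attention_distillation_loss(rgb_attn, flow_attn)
```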


Results from the Paper


Task               | Dataset                | Model        | Metric                       | Value | Global Rank
Action Recognition | HMDB-51                | Prob-Distill | Average accuracy of 3 splits | 72.0  | #48
Action Recognition | Something-Something V2 | Prob-Distill | Top-1 Accuracy               | 49.9  | #116
Action Recognition | Something-Something V2 | Prob-Distill | Top-5 Accuracy               | 79.1  | #85
Action Recognition | UCF101                 | Prob-Distill | 3-fold Accuracy              | 95.7  | #39

Methods


No methods listed for this paper.