TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Self-Supervised Action Recognition	HMDB51	RSPNet	Top-1 Accuracy	64.7	# 16
Self-Supervised Action Recognition	HMDB51	RSPNet	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	HMDB51	RSPNet	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	RSPNet	3-fold Accuracy	93.7	# 11
Self-Supervised Action Recognition	UCF101	RSPNet	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	UCF101	RSPNet	Frozen	false	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rspnet-relative-speed-perception-for/self-supervised-action-recognition-on-ucf101)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-ucf101?p=rspnet-relative-speed-perception-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/rspnet-relative-speed-perception-for/self-supervised-action-recognition-on-hmdb51)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-hmdb51?p=rspnet-relative-speed-perception-for)`

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

27 Oct 2020 · Peihao Chen, Deng Huang, Dongliang He, Xiang Long, Runhao Zeng, Shilei Wen, Mingkui Tan, Chuang Gan ·

We study unsupervised video representation learning that seeks to learn both motion and appearance features from unlabeled video only, which can be reused for downstream tasks such as action recognition. This task, however, is extremely challenging due to 1) the highly complex spatial-temporal information in videos; and 2) the lack of labeled data for training. Unlike the representation learning for static images, it is difficult to construct a suitable self-supervised task to well model both motion and appearance features. More recently, several attempts have been made to learn video representation through video playback speed prediction. However, it is non-trivial to obtain precise speed labels for the videos. More critically, the learnt models may tend to focus on motion pattern and thus may not learn appearance features well. In this paper, we observe that the relative playback speed is more consistent with motion pattern, and thus provide more effective and stable supervision for representation learning. Therefore, we propose a new way to perceive the playback speed and exploit the relative speed between two video clips as labels. In this way, we are able to well perceive speed and learn better motion features. Moreover, to ensure the learning of appearance features, we further propose an appearance-focused task, where we enforce the model to perceive the appearance difference between two video clips. We show that optimizing the two tasks jointly consistently improves the performance on two downstream tasks, namely action recognition and video retrieval. Remarkably, for action recognition on UCF101 dataset, we achieve 93.7% accuracy without the use of labeled data for pre-training, which outperforms the ImageNet supervised pre-trained model. Code and pre-trained models can be found at https://github.com/PeihaoChen/RSPNet.

PDF Abstract

Code

Add Remove Mark official

PeihaoChen/RSPNet official

Tasks

Add Remove

Action Recognition

Representation Learning

Retrieval

Self-Supervised Action Recognition

Video Retrieval

Datasets

UCF101

HMDB51

Results from the Paper

Edit

Ranked #11 on Self-Supervised Action Recognition on UCF101

Get a GitHub badge

Task	Dataset	Model	Metric Name	Metric Value	Global Rank	Benchmark
Self-Supervised Action Recognition	HMDB51	RSPNet	Top-1 Accuracy	64.7	# 16	Compare
			Pre-Training Dataset	Kinetics400	# 1	Compare
			Frozen	false	# 1	Compare
Self-Supervised Action Recognition	UCF101	RSPNet	3-fold Accuracy	93.7	# 11	Compare
			Pre-Training Dataset	Kinetics400	# 1	Compare
			Frozen	false	# 1	Compare

Methods

Add Remove

No methods listed for this paper. Add relevant methods here

Edit Social Preview

RSPNet: Relative Speed Perception for Unsupervised Video Representation Learning

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Edit

Methods

Add Remove