TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Self-Supervised Action Recognition	HMDB51	Motion & Appearance (C3D)	Top-1 Accuracy	20.3	# 47
Self-Supervised Action Recognition	HMDB51	Motion & Appearance (C3D)	Pre-Training Dataset	UCF101	# 1
Self-Supervised Action Recognition	HMDB51	Motion & Appearance (C3D)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	Motion & Appearance (C3D)	3-fold Accuracy	58.8	# 49
Self-Supervised Action Recognition	UCF101	Motion & Appearance (C3D)	Pre-Training Dataset	UCF101	# 1
Self-Supervised Action Recognition	UCF101	Motion & Appearance (C3D)	Frozen	false	# 1

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

CVPR 2019 · Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yun-hui Liu, Wei Liu

We address the problem of video representation learning without human-annotated labels. Previous efforts have addressed this problem by designing novel self-supervised tasks on video data, but the learned features operate merely on a frame-by-frame basis, so they are not applicable to the many video analytic tasks in which spatio-temporal features are prevalent. In this paper, we propose a novel self-supervised approach to learn spatio-temporal features for video representation. Inspired by the success of two-stream approaches in video classification, we propose to learn visual features by regressing both motion and appearance statistics along the spatial and temporal dimensions, given only the input video data. Specifically, we extract statistical concepts (the fast-motion region and its corresponding dominant direction, spatio-temporal color diversity, dominant color, etc.) from simple patterns in both the spatial and temporal domains. Unlike prior puzzles that can be hard even for humans to solve, the proposed task is consistent with inherent human visual habits and is therefore easy to answer. We conduct extensive experiments with C3D to validate the effectiveness of the proposed approach. The experiments show that our approach can significantly improve the performance of C3D when applied to video classification tasks. Code is available at https://github.com/laura-wang/video_repres_mas.
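The abstract names the motion and appearance statistics used as self-supervised targets but does not say how they are computed. The following is a minimal sketch, assuming a dense optical-flow field (horizontal component `flow_u`, vertical component `flow_v`) and raw RGB frames are already available. The function names `motion_statistics` and `dominant_color_bin`, the 4x4 block grid, the magnitude-weighted 8-bin angle histogram, and the uniform RGB quantization are all hypothetical choices for illustration. They are not the paper's definitions or the code in laura-wang/video_repres_mas.

```python
import numpy as np


def motion_statistics(flow_u, flow_v, grid=(4, 4), num_bins=8):
    """Hypothetical sketch: find the fastest-moving block and its dominant direction.

    flow_u, flow_v: (H, W) arrays of horizontal / vertical optical flow
    (assumed to be accumulated over a clip). H and W must divide evenly by grid.
    Returns (block_index, direction_bin), both integers.
    """
    h, w = flow_u.shape
    gh, gw = grid
    if h % gh or w % gw:
        raise ValueError("flow size must be divisible by the grid size")
    bh, bw = h // gh, w // gw

    # Mean flow magnitude per block: (H, W) -> (gh, bh, gw, bw) -> (gh, gw).
    magnitude = np.hypot(flow_u, flow_v)
    block_mag = magnitude.reshape(gh, bh, gw, bw).mean(axis=(1, 3))
    row, col = np.unravel_index(np.argmax(block_mag), block_mag.shape)

    # Flow vectors inside the fastest-moving block.
    u = flow_u[row * bh:(row + 1) * bh, col * bw:(col + 1) * bw]
    v = flow_v[row * bh:(row + 1) * bh, col * bw:(col + 1) * bw]

    # Orientation in [0, 2*pi), histogrammed with magnitude weights.
    angles = np.arctan2(v, u) % (2 * np.pi)
    hist, _ = np.histogram(angles, bins=num_bins, range=(0.0, 2 * np.pi),
                           weights=np.hypot(u, v))
    return int(row * gw + col), int(np.argmax(hist))


def dominant_color_bin(frames, bins_per_channel=4):
    """Hypothetical sketch: most frequent quantized RGB color in a clip.

    frames: uint8 array of shape (T, H, W, 3).
    Returns an integer in [0, bins_per_channel ** 3).
    """
    # Map each channel value 0..255 to a bin 0..bins_per_channel-1.
    q = (frames.astype(np.int64) * bins_per_channel) // 256
    codes = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    counts = np.bincount(codes.ravel(), minlength=bins_per_channel ** 3)
    return int(np.argmax(counts))
```

Each function reduces a clip to a small discrete label. Under these assumptions, such labels (or their continuous counterparts) could serve as regression targets for a network such as C3D without any human annotation, which is the general idea the abstract describes.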

Code

laura-wang/video_repres_mas (official)

Tasks

Action Recognition

General Classification

Representation Learning

Self-Supervised Action Recognition

Video Classification

Datasets

UCF101

Kinetics

HMDB51

Kinetics 400

Results from the Paper

Ranked #47 on Self-Supervised Action Recognition on HMDB51
