TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Self-Supervised Action Recognition	HMDB51	DPC (Modified 3D ResNet-18)	Top-1 Accuracy	34.5	# 40
Self-Supervised Action Recognition	HMDB51	DPC (Modified 3D ResNet-18)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	HMDB51	DPC (Modified 3D ResNet-18)	Frozen	false	# 1
Self-Supervised Action Recognition	HMDB51	DPC (Modified 3D ResNet-34)	Top-1 Accuracy	35.7	# 39
Self-Supervised Action Recognition	HMDB51	DPC (Modified 3D ResNet-34)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	HMDB51	DPC (Modified 3D ResNet-34)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	DPC (Modified 3D ResNet-34)	3-fold Accuracy	75.7	# 33
Self-Supervised Action Recognition	UCF101	DPC (Modified 3D ResNet-34)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	UCF101	DPC (Modified 3D ResNet-34)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	DPC (3D ResNet-18, Split 1)	3-fold Accuracy	60.6	# 46
Self-Supervised Action Recognition	UCF101	DPC (3D ResNet-18, Split 1)	Pre-Training Dataset	UCF101	# 1
Self-Supervised Action Recognition	UCF101	DPC (3D ResNet-18, Split 1)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	DPC (3D ResNet-18)	3-fold Accuracy	68.2	# 39
Self-Supervised Action Recognition	UCF101	DPC (3D ResNet-18)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	UCF101	DPC (3D ResNet-18)	Frozen	false	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-representation-learning-by-dense/self-supervised-action-recognition-on-ucf101)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-ucf101?p=video-representation-learning-by-dense)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/video-representation-learning-by-dense/self-supervised-action-recognition-on-hmdb51)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-hmdb51?p=video-representation-learning-by-dense)`
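The two badge snippets above share one URL pattern and differ only in the task slug. A minimal sketch of that pattern, assuming it holds for other tasks as well; the function name `pwc_badge` and its parameter names are hypothetical and not part of any official API:

```python
def pwc_badge(paper_slug: str, task_slug: str) -> str:
    """Build badge Markdown following the pattern of the two snippets above."""
    # Image URL: a shields.io endpoint badge that wraps the per-task badge URL.
    badge = (
        "https://img.shields.io/endpoint.svg?url="
        f"https://paperswithcode.com/badge/{paper_slug}/{task_slug}"
    )
    # Link target: the leaderboard page for the task, with the paper as a query parameter.
    link = f"https://paperswithcode.com/sota/{task_slug}?p={paper_slug}"
    return f"[![PWC]({badge})]({link})"


paper = "video-representation-learning-by-dense"
for task in (
    "self-supervised-action-recognition-on-ucf101",
    "self-supervised-action-recognition-on-hmdb51",
):
    print(pwc_badge(paper, task))
```

Both slugs are copied from the snippets above; whether a given slug resolves to a live badge is not checked here.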

Video Representation Learning by Dense Predictive Coding

10 Sep 2019 · Tengda Han, Weidi Xie, Andrew Zisserman

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition. We make three contributions. First, we introduce the Dense Predictive Coding (DPC) framework for self-supervised representation learning on videos, which learns a dense encoding of spatio-temporal blocks by recurrently predicting future representations. Second, we propose a curriculum training scheme that predicts further into the future with progressively less temporal context; this encourages the model to encode only slowly varying spatio-temporal signals, leading to semantic representations. Third, we evaluate the approach by first training the DPC model on the Kinetics-400 dataset with self-supervised learning, and then fine-tuning the representation on a downstream task, i.e. action recognition. With a single stream (RGB only), DPC pre-trained representations achieve state-of-the-art self-supervised performance on both UCF101 (75.7% top-1 accuracy) and HMDB51 (35.7% top-1 accuracy), outperforming all previous learning methods by a significant margin and approaching the performance of a baseline pre-trained on ImageNet.
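To make the idea of recurrently predicting future representations concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes each video block has already been encoded into a single small vector (the dense spatial feature map is omitted), uses a plain tanh recurrence as the aggregator, and scores predictions with a softmax contrastive loss. The abstract names none of these choices, and the curriculum stages `(5, 3)` and `(4, 4)` are illustrative values only.

```python
import math
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def step(W, U, z, c):
    """One recurrent update of the context: c' = tanh(W z + U c)."""
    return [math.tanh(a + b) for a, b in zip(matvec(W, z), matvec(U, c))]


def predict_future(blocks, n_context, n_pred, W, U, V):
    """Aggregate the first n_context block embeddings, then roll forward n_pred steps."""
    c = [0.0] * len(blocks[0])
    for z in blocks[:n_context]:
        c = step(W, U, z, c)
    preds = []
    for _ in range(n_pred):
        z_hat = matvec(V, c)      # predict the next block embedding from the context
        preds.append(z_hat)
        c = step(W, U, z_hat, c)  # feed the prediction back in instead of the true block
    return preds


def contrastive_loss(preds, targets):
    """Mean cross-entropy: prediction i should score highest against target i."""
    total = 0.0
    for i, p in enumerate(preds):
        scores = [sum(a * b for a, b in zip(p, t)) for t in targets]
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]
    return total / len(preds)


def random_matrix(rng, dim, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(dim)] for _ in range(dim)]


rng = random.Random(0)
dim, n_blocks = 4, 8
# Stand-in block embeddings; a real model would get these from a 3D CNN encoder.
blocks = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_blocks)]
W, U, V = (random_matrix(rng, dim) for _ in range(3))

# Hypothetical curriculum: each stage has less context and a longer prediction horizon.
curriculum = [(5, 3), (4, 4)]
for n_context, n_pred in curriculum:
    preds = predict_future(blocks, n_context, n_pred, W, U, V)
    targets = blocks[n_context:n_context + n_pred]
    loss = contrastive_loss(preds, targets)
    print(f"context={n_context} predict={n_pred} loss={loss:.3f}")
```

The weights here are random and never trained, so the printed losses only show the data flow. In training, the loss would be minimised with respect to the encoder and the matrices `W`, `U`, and `V`.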


Code


TengdaHan/DPC official

251

Tasks


Action Recognition

Representation Learning

Self-Supervised Action Recognition

Self-Supervised Learning

Temporal Action Localization

Datasets

UCF101

Kinetics

HMDB51

Kinetics 400

Results from the Paper


Ranked #33 on Self-Supervised Action Recognition on UCF101



Methods


1x1 Convolution • Average Pooling • Batch Normalization • Bottleneck Residual Block • Convolution • Global Average Pooling • Kaiming Initialization • Max Pooling • ReLU • Residual Block • Residual Connection • ResNet
