TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Self-Supervised Action Recognition	HMDB51	TCE (ResNet-50)	Top-1 Accuracy	36.6	# 38
Self-Supervised Action Recognition	HMDB51	TCE (ResNet-50)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	HMDB51	TCE (ResNet-50)	Frozen	false	# 1
Self-Supervised Action Recognition	HMDB51	TCE (ResNet-18)	Top-1 Accuracy	34.2	# 41
Self-Supervised Action Recognition	HMDB51	TCE (ResNet-18)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	HMDB51	TCE (ResNet-18)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	TCE (ResNet-50)	3-fold Accuracy	71.2	# 37
Self-Supervised Action Recognition	UCF101	TCE (ResNet-50)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	UCF101	TCE (ResNet-50)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	TCE (ResNet18, Split 1)	3-fold Accuracy	68.2	# 39
Self-Supervised Action Recognition	UCF101	TCE (ResNet18, Split 1)	Pre-Training Dataset	UCF101	# 1
Self-Supervised Action Recognition	UCF101	TCE (ResNet18, Split 1)	Frozen	false	# 1
Self-Supervised Action Recognition	UCF101	TCE (ResNet-18, Split 1)	3-fold Accuracy	68.8	# 38
Self-Supervised Action Recognition	UCF101	TCE (ResNet-18, Split 1)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	UCF101	TCE (ResNet-18, Split 1)	Frozen	false	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporally-coherent-embeddings-for-self/self-supervised-action-recognition-on-ucf101)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-ucf101?p=temporally-coherent-embeddings-for-self)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/temporally-coherent-embeddings-for-self/self-supervised-action-recognition-on-hmdb51)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-hmdb51?p=temporally-coherent-embeddings-for-self)`

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

21 Mar 2020 · Joshua Knights, Ben Harwood, Daniel Ward, Anthony Vanderkop, Olivia Mackenzie-Ross, Peyman Moghadam

This paper presents TCE: Temporally Coherent Embeddings for self-supervised video representation learning. The proposed method exploits the inherent structure of unlabeled video data to explicitly enforce temporal coherency in the embedding space, rather than learning it indirectly through ranking or predictive proxy tasks. In the same way that high-level visual information in the world changes smoothly, we believe that nearby frames in learned representations will benefit from demonstrating similar properties. Using this assumption, we train our TCE model to encode videos such that adjacent frames lie close to each other and different videos are separated from one another. Using TCE, we learn robust representations from large quantities of unlabeled video data. We thoroughly analyse and evaluate our self-supervised TCE models on the downstream task of video action recognition using multiple challenging benchmarks (Kinetics400, UCF101, HMDB51). With a simple but effective 2D-CNN backbone and only RGB stream inputs, TCE pre-trained representations outperform all previous self-supervised 2D-CNN and 3D-CNN models pre-trained on UCF101. The code and pre-trained models for this paper can be downloaded at: https://github.com/csiro-robotics/TCE
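The objective described in the abstract (adjacent frames close together, different videos apart) can be illustrated with a small sketch. This is not the authors' loss: the abstract does not state the exact formulation, so the triplet-style hinge, the cosine distance, the `margin` value, and all function names below are assumptions chosen only for illustration.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (an all-zero vector is returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two embedding vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return 1.0 - sum(x * y for x, y in zip(a, b))


def temporal_coherency_loss(anchor, adjacent, other_video, margin=0.5):
    """Hypothetical triplet-style hinge loss (an assumption, not the paper's loss).

    The loss is zero once the adjacent-frame embedding is closer to the
    anchor than the other-video embedding by at least `margin`.
    """
    pos = cosine_distance(anchor, adjacent)      # same video, neighbouring frame
    neg = cosine_distance(anchor, other_video)   # frame from a different video
    return max(0.0, pos - neg + margin)


# Toy 2-D embeddings: the neighbouring frame points almost the same way as
# the anchor, while the frame from another video is orthogonal to it.
coherent = temporal_coherency_loss([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])
incoherent = temporal_coherency_loss([1.0, 0.0], [0.0, 1.0], [0.9, 0.1])
```

In this toy case the first call incurs no penalty because the embedding already respects temporal coherency, while the second call is penalised because the frame from another video sits closer to the anchor than its own neighbour does.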

Code

csiro-robotics/TCE official

Tasks

Action Recognition

Metric Learning

Representation Learning

Self-Supervised Action Recognition

Self-Supervised Learning

Temporal Action Localization

Datasets

UCF101

Kinetics

HMDB51

Kinetics 400

Results from the Paper

Ranked #37 on Self-Supervised Action Recognition on UCF101

Methods

1x1 Convolution • Average Pooling • Batch Normalization • Bottleneck Residual Block • Convolution • Global Average Pooling • Kaiming Initialization • Max Pooling • ReLU • Residual Block • Residual Connection • ResNet
