TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Self-Supervised Action Recognition	HMDB51	TCLR (R3D-18)	Top-1 Accuracy	52.9	# 31
Self-Supervised Action Recognition	HMDB51	TCLR (R3D-18)	Pre-Training Dataset	UCF101	# 1
Self-Supervised Action Recognition	HMDB51	TCLR (R3D-18)	Frozen	false	# 1
Self-supervised Video Retrieval	HMDB51	TCLR (R3D-18)	Top-1	22.8	# 9
Self-supervised Video Retrieval	HMDB51	TCLR (R3D-18)	Pretrain	UCF101	# 1
Self-supervised Video Retrieval	UCF101	TCLR (R3D-18)	Top-1	56.2	# 9
Self-supervised Video Retrieval	UCF101	TCLR (R3D-18)	Pretrain	UCF101	# 1
Self-Supervised Action Recognition	UCF101	TCLR (R3D-18)	3-fold Accuracy	82.4	# 31
Self-Supervised Action Recognition	UCF101	TCLR (R3D-18)	Pre-Training Dataset	UCF101	# 1
Self-Supervised Action Recognition	UCF101	TCLR (R3D-18)	Frozen	false	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tclr-temporal-contrastive-learning-for-video/self-supervised-video-retrieval-on-hmdb51)](https://paperswithcode.com/sota/self-supervised-video-retrieval-on-hmdb51?p=tclr-temporal-contrastive-learning-for-video)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tclr-temporal-contrastive-learning-for-video/self-supervised-video-retrieval-on-ucf101)](https://paperswithcode.com/sota/self-supervised-video-retrieval-on-ucf101?p=tclr-temporal-contrastive-learning-for-video)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tclr-temporal-contrastive-learning-for-video/self-supervised-action-recognition-on-hmdb51)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-hmdb51?p=tclr-temporal-contrastive-learning-for-video)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/tclr-temporal-contrastive-learning-for-video/self-supervised-action-recognition-on-ucf101)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-ucf101?p=tclr-temporal-contrastive-learning-for-video)`

TCLR: Temporal Contrastive Learning for Video Representation

20 Jan 2021 · Ishan Dave, Rohit Gupta, Mamshad Nayeem Rizve, Mubarak Shah

Contrastive learning has nearly closed the gap between supervised and self-supervised learning of image representations, and has also been explored for videos. However, prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension. We develop a new temporal contrastive learning framework consisting of two novel losses to improve upon existing contrastive self-supervised video representation learning methods. The local-local temporal contrastive loss adds the task of discriminating between non-overlapping clips from the same video, whereas the global-local temporal contrastive loss aims to discriminate between timesteps of the feature map of an input clip in order to increase the temporal diversity of the learned features. Our proposed temporal contrastive learning framework achieves significant improvement over the state-of-the-art results in various downstream video understanding tasks such as action recognition, limited-label action classification, and nearest-neighbor video retrieval on multiple video datasets and backbones. We also demonstrate significant improvement in fine-grained action classification for visually similar classes. With the commonly used 3D ResNet-18 architecture with UCF101 pretraining, we achieve 82.4% (+5.1% increase over the previous best) top-1 accuracy on UCF101 and 52.9% (+5.4% increase) on HMDB51 action classification, and 56.2% (+11.7% increase) Top-1 Recall on UCF101 nearest-neighbor video retrieval. Code released at github.com/DAVEISHAN/TCLR.
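The local-local loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (see github.com/DAVEISHAN/TCLR for that); it is a simplified InfoNCE-style reading of the abstract, assuming each video yields several non-overlapping clips and that two augmented views of each clip have already been embedded. The function name `local_local_loss`, the `temperature` value, and the choice to draw negatives only from the other view are assumptions made for illustration.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def local_local_loss(view_a, view_b, temperature=0.1):
    """Simplified InfoNCE-style loss over the clips of ONE video.

    view_a[t] and view_b[t] are embeddings of two augmented views of the
    clip at timestep t. The matching timestep is treated as the positive;
    clips at other timesteps of the same video act as negatives.
    (Hypothetical helper, not taken from the TCLR code base.)
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)
    total = 0.0
    for t in range(n):
        # Similarity of anchor a[t] to every clip in the other view.
        logits = [dot(a[t], b[s]) / temperature for s in range(n)]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the same-timestep clip as the target.
        total += log_denom - logits[t]
    return total / n
```

Under this sketch, embeddings that are identical at every timestep give a loss of log(n) for n clips, while embeddings that differ across timesteps give a lower loss, which is the sense in which such a loss encourages temporally distinct features.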


Code


DAVEISHAN/TCLR official

Tasks


Action Classification

Action Recognition

Contrastive Learning

General Classification

Representation Learning

Retrieval

Self-Supervised Action Recognition

Self-Supervised Learning

Self-supervised Video Retrieval

Video Understanding

Datasets

UCF101

HMDB51

Results from the Paper


Ranked #9 on Self-supervised Video Retrieval on UCF101
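For readers unfamiliar with the retrieval metric, the sketch below shows one common way a Top-1 nearest-neighbor retrieval score is computed: each query clip retrieves its most similar gallery clip, and the retrieval counts as correct when the class labels match. The use of cosine similarity, the query/gallery split, and the function name `top1_retrieval_accuracy` are assumptions for illustration; the page does not specify the exact evaluation protocol.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def top1_retrieval_accuracy(query_feats, query_labels, gallery_feats, gallery_labels):
    """Percentage of queries whose nearest gallery item shares their label.

    Hypothetical helper: features are plain lists of floats, and similarity
    is cosine similarity (an assumption, not stated on the page).
    """
    hits = 0
    for q, q_label in zip(query_feats, query_labels):
        # Index of the gallery feature most similar to this query.
        best = max(range(len(gallery_feats)),
                   key=lambda i: cosine(q, gallery_feats[i]))
        hits += gallery_labels[best] == q_label
    return 100.0 * hits / len(query_feats)
```

In a typical setup the query features would come from test-split clips and the gallery features from training-split clips, both extracted with the pretrained backbone without further fine-tuning.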



Methods


Contrastive Learning
