TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Self-Supervised Action Recognition	HMDB51	SCE (R3D-50)	Top-1 Accuracy	74.7	# 4
Self-Supervised Action Recognition	HMDB51	SCE (R3D-50)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	HMDB51	SCE (R3D-50)	Frozen	false	# 1
Self-supervised Video Retrieval	HMDB51	SCE (R3D-50)	Top-1	45.9	# 1
Self-supervised Video Retrieval	HMDB51	SCE (R3D-50)	Pretrain	Kinetics400	# 1
Self-supervised Video Retrieval	HMDB51	SCE (R3D-18)	Top-1	40.1	# 2
Self-supervised Video Retrieval	HMDB51	SCE (R3D-18)	Pretrain	Kinetics400	# 1
Self-Supervised Action Recognition	UCF101	SCE (R3D-50)	3-fold Accuracy	95.3	# 7
Self-Supervised Action Recognition	UCF101	SCE (R3D-50)	Pre-Training Dataset	Kinetics400	# 1
Self-Supervised Action Recognition	UCF101	SCE (R3D-50)	Frozen	false	# 1
Self-supervised Video Retrieval	UCF101	SCE (R3D-18)	Top-1	74.5	# 2
Self-supervised Video Retrieval	UCF101	SCE (R3D-18)	Pretrain	Kinetics400	# 1
Self-supervised Video Retrieval	UCF101	SCE (R3D-50)	Top-1	83.9	# 1
Self-supervised Video Retrieval	UCF101	SCE (R3D-50)	Pretrain	Kinetics400	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/similarity-contrastive-estimation-for-image/self-supervised-video-retrieval-on-hmdb51)](https://paperswithcode.com/sota/self-supervised-video-retrieval-on-hmdb51?p=similarity-contrastive-estimation-for-image)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/similarity-contrastive-estimation-for-image/self-supervised-video-retrieval-on-ucf101)](https://paperswithcode.com/sota/self-supervised-video-retrieval-on-ucf101?p=similarity-contrastive-estimation-for-image)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/similarity-contrastive-estimation-for-image/self-supervised-action-recognition-on-hmdb51)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-hmdb51?p=similarity-contrastive-estimation-for-image)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/similarity-contrastive-estimation-for-image/self-supervised-action-recognition-on-ucf101)](https://paperswithcode.com/sota/self-supervised-action-recognition-on-ucf101?p=similarity-contrastive-estimation-for-image)`

Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning

21 Dec 2022 · Julien Denize, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault

Contrastive representation learning has proven to be an effective self-supervised learning method for images and videos. Most successful approaches are based on Noise Contrastive Estimation (NCE) and use different views of an instance as positives that should be contrasted with other instances, called negatives, that are considered as noise. However, several instances in a dataset are drawn from the same distribution and share underlying semantic information. A good data representation should contain relations between the instances, that is, semantic similarity and dissimilarity, which contrastive learning harms by considering all negatives as noise. To circumvent this issue, we propose a novel formulation of contrastive learning using semantic similarity between instances, called Similarity Contrastive Estimation (SCE). Our training objective is a soft contrastive one that brings the positives closer and estimates a continuous distribution to push or pull negative instances based on their learned similarities. We empirically validate our approach on both image and video representation learning. We show that SCE performs competitively with the state of the art on the ImageNet linear evaluation protocol with fewer pretraining epochs and that it generalizes to several downstream image tasks. We also show that SCE reaches state-of-the-art results for pretraining video representations and that the learned representation can generalize to video downstream tasks.
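
To make the idea of a soft contrastive objective more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the function name `soft_contrastive_loss`, the mixing weight `lam`, and the temperatures `tau` and `tau_target` are hypothetical names chosen for illustration, and the exact form of the target distribution is an assumption based only on the description in the abstract. The sketch puts weight `lam` on the positive and spreads the remaining `1 - lam` over the negatives in proportion to their similarity to the positive view, then takes the cross-entropy against the predicted distribution.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def softmax(scores, temperature):
    """Numerically stable softmax of scores divided by a temperature."""
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_contrastive_loss(query, positive, negatives,
                          lam=0.5, tau=0.1, tau_target=0.05):
    """Illustrative soft contrastive loss (assumed form, not the paper's code).

    Index 0 of both distributions is the positive; the rest are negatives.
    lam, tau and tau_target are hypothetical hyperparameter names.
    """
    q = normalize(query)
    k = normalize(positive)
    negs = [normalize(n) for n in negatives]

    # Predicted distribution: how similar the query is to each candidate.
    pred = softmax([dot(q, k)] + [dot(q, n) for n in negs], tau)

    # Similarity distribution over the negatives, seen from the positive view.
    sim = softmax([dot(k, n) for n in negs], tau_target)

    # Soft target: lam on the positive, the rest shared among the negatives.
    target = [lam] + [(1.0 - lam) * s for s in sim]

    # Cross-entropy between the soft target and the prediction.
    return -sum(t * math.log(p) for t, p in zip(target, pred))
```

With `lam=1.0` the target collapses to a one-hot vector on the positive, so the sketch reduces to a standard InfoNCE-style loss in which every negative is treated as noise. Lower values of `lam` let negatives that resemble the positive keep some probability mass instead of being pushed away uniformly.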


Code


juliendenize/eztorch official

cea-list/sce

Tasks


Contrastive Learning

Linear evaluation

Representation Learning

Self-Supervised Action Recognition

Self-Supervised Learning

Self-supervised Video Retrieval

Semantic Similarity

Semantic Textual Similarity

Datasets

CIFAR-10

ImageNet

MS COCO

CIFAR-100

UCF101

Kinetics

Oxford 102 Flower

STL-10

HMDB51

Stanford Cars

DTD

Kinetics 400

Food-101

Caltech-101

FGVC-Aircraft

Something-Something V2

PASCAL VOC 2007

AVA

VOC 2012

Results from the Paper


Ranked #1 on Self-supervised Video Retrieval on HMDB51



Methods


Contrastive Learning
