Self-supervised Video Retrieval

9 papers with code • 2 benchmarks • 2 datasets


Most implemented papers

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

BestJuly/Inter-intra-video-contrastive-learning 6 Aug 2020

With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.

Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning

juliendenize/eztorch 21 Dec 2022

A good data representation should contain relations between the instances, or semantic similarity and dissimilarity, that contrastive learning harms by considering all negatives as noise.

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

BestJuly/VCP 2 Jan 2020

As a proxy task, Video Cloze Procedure (VCP) converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning.

Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

yuanyao366/PRP CVPR 2020

The generative perception model acts as a feature decoder to focus on comprehending high temporal resolution and short-term representation by introducing a motion-attention mechanism.

Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Learning

BestJuly/Pretext-Contrastive-Learning 29 Oct 2020

It is convenient to treat PCL as a standard training strategy and apply it to many other works in self-supervised video feature learning.

TCLR: Temporal Contrastive Learning for Video Representation


Prior work on contrastive learning for video data has not explored the effect of explicitly encouraging the features to be distinct across the temporal dimension.

Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting

martinetoering/ViCC 18 Jun 2021

Instance-level contrastive learning techniques, which rely on data augmentation and a contrastive loss function, have found great success in the domain of visual representation learning.
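
For readers unfamiliar with the term, the sketch below illustrates what an instance-level contrastive loss typically looks like: a generic InfoNCE-style objective over two augmented views per instance. It is not taken from the ViCC repository; the function name `info_nce_loss`, the `temperature` default, and the toy embeddings are assumptions made for illustration only.

```python
import math


def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss over a batch of embedding pairs.

    anchors[i] and positives[i] are embeddings of two augmented views of the
    same instance; every positives[j] with j != i is treated as a negative.
    (Hypothetical helper for illustration, not an API of any listed repo.)
    """
    def normalize(v):
        # L2-normalize so the dot product below is cosine similarity.
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled similarity of anchor i to every candidate view.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching view (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy 2-D embeddings (made up): matching views are nearly parallel.
views_a = [[1.0, 0.0], [0.0, 1.0]]
views_b = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce_loss(views_a, views_b)         # correct pairing
shuffled = info_nce_loss(views_a, views_b[::-1])  # deliberately wrong pairing
```

With correctly paired views the loss is close to zero, while the shuffled pairing is penalized heavily, which is the behavior such objectives rely on to pull views of the same instance together.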

Self-Supervised Audio-Visual Representation Learning with Relaxed Cross-Modal Synchronicity

pritamqu/CrissCross 9 Nov 2021

We present CrissCross, a self-supervised framework for learning audio-visual representations.

SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos

rvl-lab-utoronto/video_similarity_search CVPR 2022

Sampling pairs of similar video clips, a required step for many self-supervised contrastive learning methods, is currently done conservatively to avoid false positives.