Self-Supervised Action Recognition
34 papers with code • 6 benchmarks • 5 datasets
Most implemented papers
Video Representation Learning by Dense Predictive Coding
The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.
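The core idea of dense predictive coding is to aggregate embeddings of past frames, predict the embedding of a future segment, and train contrastively so the true future scores above distractors. A minimal toy sketch of that scoring step (all names, shapes, and the mean-pooling aggregator here are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def dpc_scores(past, future_candidates, W):
    """Toy dense-predictive-coding step: aggregate past frame
    embeddings, linearly predict the future embedding, and score
    each candidate by dot product. A contrastive loss would push
    the true future's score above the distractors'."""
    context = past.mean(axis=0)        # aggregate past embeddings
    pred = W @ context                 # predicted future embedding
    return future_candidates @ pred    # similarity to each candidate

d = 8
past = rng.normal(size=(4, d))             # 4 past frame embeddings
candidates = rng.normal(size=(5, d))       # 1 true future + 4 distractors
W = np.eye(d)                              # hypothetical prediction head
scores = dpc_scores(past, candidates, W)   # one score per candidate
```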
Self-Supervised Learning by Cross-Modal Audio-Video Clustering
To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning
As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning.
Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video
We propose a self-supervised visual learning method that predicts the variable playback speed of a video.
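Playback-speed prediction turns raw video into free labels: subsample frames at different rates and ask a network to classify which rate was used, which forces it to attend to motion. A minimal sketch of generating one such training example (the function name and the integer-array stand-in for frames are assumptions for illustration):

```python
import numpy as np

def make_speed_example(frames, speed, clip_len):
    """Take every `speed`-th frame and keep the first `clip_len`;
    the speed itself is the classification label, so solving the
    task requires noticing how fast content changes."""
    clip = frames[::speed][:clip_len]
    return clip, speed

frames = np.arange(64).reshape(64, 1)   # stand-in for 64 video frames
clip, label = make_speed_example(frames, speed=2, clip_len=8)
# clip holds frames 0, 2, 4, ..., 14; label is the speed class 2
```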
Temporally Coherent Embeddings for Self-Supervised Video Representation Learning
The proposed method exploits the inherent structure of unlabeled video data to explicitly enforce temporal coherency in the embedding space, rather than learning it indirectly through ranking or predictive proxy tasks.
SpeedNet: Learning the Speediness in Videos
We demonstrate how those learned features can boost the performance of self-supervised action recognition, and can be used for video retrieval.
Audio-Visual Instance Discrimination with Cross-Modal Agreement
Our method uses contrastive learning for cross-modal discrimination of video from audio and vice versa.
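Cross-modal contrastive objectives of this kind typically treat matching (video, audio) pairs from the same clip as positives and all other pairings in the batch as negatives. A minimal symmetric InfoNCE sketch in numpy (the function name and temperature value are assumptions, not the paper's exact formulation):

```python
import numpy as np

def cross_modal_infonce(v, a, tau=0.1):
    """Symmetric InfoNCE over a batch: row i of `v` (video) should
    match row i of `a` (audio); every other pairing is a negative."""
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    logits = v @ a.T / tau  # (B, B) cosine similarities / temperature
    # video -> audio cross-entropy on the diagonal
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    v2a = -np.mean(np.diag(logp))
    # audio -> video direction
    logp_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    a2v = -np.mean(np.diag(logp_t))
    return 0.5 * (v2a + a2v)

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))
loss_aligned = cross_modal_infonce(v, v.copy())        # perfect pairing
loss_shuffled = cross_modal_infonce(v, v[[1, 2, 3, 0]])  # broken pairing
```

Aligned pairs give a much lower loss than mismatched ones, which is what drives the two modalities toward a shared embedding space.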
Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning
The generative perception model acts as a feature decoder that, by introducing a motion-attention mechanism, focuses on comprehending high-temporal-resolution, short-term representations.
Self-Supervised MultiModal Versatile Networks
In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.
Self-supervised Co-training for Video Representation Learning
The objective of this paper is visual-only self-supervised video representation learning.