Self-Supervised Action Recognition

34 papers with code • 6 benchmarks • 5 datasets

Benchmarks

These leaderboards track progress in self-supervised action recognition.

Dataset	Best Model
UCF101	VideoMAE V2-g
HMDB51	MVD (ViT-B)
UCF101 (finetuned)	CVRL (R3D-152 2x; K600)
HMDB51 (finetuned)	BraVe:V-FA (TSM-50x2)
Kinetics-600	CVRL (R3D-101)
Kinetics-400	CVRL (R3D-101)

Most implemented papers

Contrastive Multiview Coding

HobbitLong/CMC • ECCV 2020

We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.

Spatiotemporal Contrastive Video Representation Learning

tensorflow/models • CVPR 2021

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
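
The pull-together, push-apart objective described above can be illustrated with an InfoNCE-style loss. The following is a minimal sketch in plain Python, not the authors' implementation: the function names, the temperature value, and the toy 2-D embeddings are all assumptions made for illustration. Row i of `anchors` and row i of `positives` stand for two augmented clips of the same video; every other row acts as a negative.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss; anchors[i] and positives[i] come from the same video."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of clip i to every candidate clip, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching clip (index i) as the correct class.
        total += log_denom - logits[i]
    return total / len(a)

# Hypothetical toy embeddings: two videos, two clips each.
clips_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(clips_a, [[0.9, 0.1], [0.1, 0.9]])   # clips match their video
shuffled = info_nce(clips_a, [[0.1, 0.9], [0.9, 0.1]])  # clips paired with the wrong video
```

In this toy setup the loss is small when each clip is closest to the other clip from its own video and large when the pairing is wrong, which is the behaviour the sentence describes.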

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

MCG-NJU/VideoMAE • 23 Mar 2022

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

ruiwang2021/mvd • CVPR 2023

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

BestJuly/Inter-intra-video-contrastive-learning • 6 Aug 2020

With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

facebookresearch/SlowFast • CVPR 2021

We present a large-scale study on unsupervised spatiotemporal representation learning from videos.

Masked Motion Encoding for Self-Supervised Video Representation Learning

xinyusun/mme • CVPR 2023

The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.
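
The appearance-prediction objective this sentence refers to can be sketched as a reconstruction loss computed only on masked patches. The example below is an illustrative assumption, not code from this paper or its repository: the helper names, the 75% mask ratio, and the toy 2-value "patches" are hypothetical.

```python
import random

def random_mask(num_patches, mask_ratio, seed=0):
    # Choose which patch indices are hidden from the encoder.
    rng = random.Random(seed)
    num_masked = int(round(num_patches * mask_ratio))
    masked = set(rng.sample(range(num_patches), num_masked))
    return [i in masked for i in range(num_patches)]

def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only."""
    errors = [
        sum((p - t) ** 2 for p, t in zip(pred[i], target[i])) / len(target[i])
        for i, is_masked in enumerate(mask)
        if is_masked
    ]
    return sum(errors) / len(errors) if errors else 0.0

# Hypothetical toy data: 8 patches, each flattened to 2 pixel values.
target = [[float(i), float(i) + 1.0] for i in range(8)]
mask = random_mask(len(target), mask_ratio=0.75)

# A stand-in "prediction" that outputs zeros for every masked patch.
pred = [[0.0, 0.0] if m else row for m, row in zip(mask, target)]
loss = masked_mse(pred, target, mask)
```

Visible patches contribute nothing to the loss, so the model is scored only on content it could not see.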

Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning

juliendenize/eztorch • 21 Dec 2022

A good data representation should contain relations between the instances, or semantic similarity and dissimilarity, that contrastive learning harms by considering all negatives as noise.
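
One way to keep relations between instances, rather than treating every negative as noise, is to replace the one-hot contrastive target with a soft one. The sketch below is an assumption about how such a target could be built, not the formulation from this paper: the mixing weight `lam`, the temperature, and the function names are hypothetical.

```python
import math

def softmax(scores, temperature):
    # Temperature-scaled softmax with max subtraction for stability.
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def soft_targets(sims, positive_index, lam=0.5, temperature=0.1):
    """Mix a one-hot positive with a similarity distribution over the other instances."""
    others = [j for j in range(len(sims)) if j != positive_index]
    if not others:
        return [1.0]
    relation = softmax([sims[j] for j in others], temperature)
    targets = [0.0] * len(sims)
    targets[positive_index] = lam
    for j, r in zip(others, relation):
        targets[j] = (1.0 - lam) * r
    return targets

def cross_entropy(targets, probs):
    # Soft-label cross-entropy; zero-weight terms are skipped.
    return -sum(t * math.log(p) for t, p in zip(targets, probs) if t > 0.0)

# Hypothetical similarities of one query to its positive (index 0) and two other instances.
sims = [1.0, 0.8, -0.5]
targets = soft_targets(sims, positive_index=0)
one_hot = soft_targets(sims, positive_index=0, lam=1.0)
loss = cross_entropy(targets, softmax(sims, temperature=0.1))
```

With `lam=1.0` the target collapses to the usual one-hot contrastive label; smaller values shift weight toward instances that are semantically close to the query.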
