Self-Supervised Action Recognition

33 papers with code • 6 benchmarks • 4 datasets


Most implemented papers

Contrastive Multiview Coding

HobbitLong/CMC ECCV 2020

We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.
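The excerpt says that learning from more views helps. One way to use more than two views is a "full graph" objective: sum a symmetric two-view contrastive loss over every pair of views. The sketch below does this in plain Python. It is an illustration under stated assumptions, not the HobbitLong/CMC code. It uses a softmax over the batch where the original may use an NCE or memory-bank estimator. The helper names and the default temperature are hypothetical.

```python
import itertools
import math

def _normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def _contrast(queries, keys, temperature):
    """Mean cross-entropy of picking keys[i] for queries[i] among all keys."""
    loss = 0.0
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        loss += m + math.log(sum(math.exp(l - m) for l in logits)) - logits[i]
    return loss / len(queries)

def full_graph_loss(views, temperature=0.07):
    """Hypothetical full-graph objective: views[v][i] is the embedding of
    sample i seen through view v; every unordered pair of views contributes
    a contrastive term in both directions."""
    views = [[_normalize(e) for e in view] for view in views]
    total = 0.0
    for a, b in itertools.combinations(range(len(views)), 2):
        total += _contrast(views[a], views[b], temperature)
        total += _contrast(views[b], views[a], temperature)
    return total
```

With M views there are M(M-1)/2 view pairs, so the number of loss terms grows quadratically with the number of views.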

Spatiotemporal Contrastive Video Representation Learning

tensorflow/models CVPR 2021

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
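The pull/push objective described above is commonly written as an InfoNCE loss. The sketch below is a simplified, one-directional variant in plain Python. It assumes L2-normalized embeddings, cosine similarity and a temperature parameter. The function names and the default temperature are hypothetical and are not taken from the tensorflow/models implementation, which may use additional negatives.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchors, positives, temperature=0.1):
    """Average InfoNCE loss (hypothetical helper).

    anchors[i] and positives[i] embed two augmented clips of video i; for
    anchor i, positives[j] with j != i (clips of other videos) are negatives.
    """
    anchors = [l2_normalize(a) for a in anchors]
    positives = [l2_normalize(p) for p in positives]
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [dot(a, p) / temperature for p in positives]
        m = max(logits)  # max subtraction keeps the log-sum-exp stable
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]  # -log softmax probability of the positive
    return total / len(anchors)
```

Each anchor is scored against every clip in the batch, so clips of the other videos act as negatives without explicit sampling. A batch of N videos gives each anchor N - 1 negatives in this variant.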

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

MCG-NJU/VideoMAE 23 Mar 2022

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

ruiwang2021/mvd CVPR 2023

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

BestJuly/Inter-intra-video-contrastive-learning 6 Aug 2020

With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.
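As we understand the IIC paper, the "intra" part contrasts a clip against negatives built from the same video. These negatives keep the clip's appearance but break its temporal structure, for example by repeating one frame or by shuffling frame order. The sketch below only builds such intra-negatives. Frames are stand-in values such as frame indices, and the function names are hypothetical rather than taken from the BestJuly repository.

```python
import random

def repeat_frame(clip, rng):
    """Intra-negative by frame repeating: every position shows one randomly
    chosen frame, so appearance is kept but motion is removed."""
    frame = rng.choice(clip)
    return [frame] * len(clip)

def shuffle_frames(clip, rng):
    """Intra-negative by temporal shuffling: same frames, broken order.
    Re-shuffles until the order actually differs from the input."""
    if len(set(clip)) < 2:
        raise ValueError("need at least two distinct frames to shuffle")
    shuffled = list(clip)
    while shuffled == list(clip):
        rng.shuffle(shuffled)
    return shuffled
```

Because these negatives share the anchor's appearance, a network can only separate them from the true positive by encoding motion or temporal order.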

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

facebookresearch/SlowFast CVPR 2021

We present a large-scale study on unsupervised spatiotemporal representation learning from videos.

Masked Motion Encoding for Self-Supervised Video Representation Learning

xinyusun/mme CVPR 2023

The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.

Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning

juliendenize/eztorch 21 Dec 2022

A good data representation should contain relations between the instances, or semantic similarity and dissimilarity, that contrastive learning harms by considering all negatives as noise.
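Instead of treating every other instance as pure noise, a soft contrastive objective can use a target distribution that keeps some mass on similar instances. As we read the SCE paper, the target mixes a one-hot term on the instance itself with a softmax over its similarities to the other instances (self excluded). The loss is the cross-entropy between that target and the predicted distribution. The sketch below is a simplified illustration, not the eztorch code. Both similarity rows are assumed to score instance i against the same set of instances, and the names and the defaults for `lam`, `tau_m` and `tau` are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [x / s for x in e]

def soft_targets(sim_target_row, i, lam=0.5, tau_m=0.07):
    """Mix a one-hot target on instance i (weight lam) with a similarity
    distribution over the *other* instances (weight 1 - lam)."""
    n = len(sim_target_row)
    others = [j for j in range(n) if j != i]
    relation = softmax([sim_target_row[j] / tau_m for j in others])
    target = [0.0] * n
    target[i] = lam
    for j, p in zip(others, relation):
        target[j] = (1 - lam) * p
    return target

def soft_contrastive_loss(sim_online_row, target, tau=0.1):
    """Cross-entropy between the soft target and the predicted distribution."""
    pred = softmax([s / tau for s in sim_online_row])
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)
```

Setting `lam=1.0` recovers an ordinary one-hot contrastive target, so the mixing weight controls how much inter-instance relation the target carries.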

Unsupervised Representation Learning by Sorting Sequences

HsinYingLee/OPN ICCV 2017

We present an unsupervised representation learning approach using videos without semantic labels.
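The title's sequence-sorting pretext task needs no semantic labels, because the label is the shuffle itself. As we understand the OPN paper, a few frames (four in its main setting) are shuffled and a network classifies which ordering was applied. A permutation and its reverse count as one class, since many actions look plausible played in either direction. The sketch below only builds the classes and one training example. The function names are hypothetical and not taken from the HsinYingLee/OPN code.

```python
import itertools
import random

def permutation_classes(n_frames=4):
    """Enumerate orderings of n_frames, merging each permutation with its
    reverse so forward and backward playback share a class label."""
    classes = []
    seen = set()
    for perm in itertools.permutations(range(n_frames)):
        if perm in seen:
            continue
        seen.add(perm)
        seen.add(perm[::-1])
        classes.append(perm)
    return classes

def make_sorting_example(frames, classes, rng):
    """Reorder frames by a randomly chosen class permutation.
    Returns (shuffled_frames, class_label)."""
    label = rng.randrange(len(classes))
    perm = classes[label]
    return [frames[k] for k in perm], label
```

For four frames there are 4! = 24 orderings, which merge pairwise into 12 classes.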