Self-Supervised Action Recognition

35 papers with code • 6 benchmarks • 5 datasets

Most implemented papers

Contrastive Multiview Coding

HobbitLong/CMC ECCV 2020

We analyze key properties of the approach that make it work, finding that the contrastive loss outperforms a popular alternative based on cross-view prediction, and that the more views we learn from, the better the resulting representation captures underlying scene semantics.

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

MCG-NJU/VideoMAE 23 Mar 2022

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.

Spatiotemporal Contrastive Video Representation Learning

tensorflow/models CVPR 2021

Our representations are learned using a contrastive loss, where two augmented clips from the same short video are pulled together in the embedding space, while clips from different videos are pushed away.
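
As a rough illustration of the pull-together / push-apart objective described in this excerpt, the following is a minimal NumPy sketch of an InfoNCE-style loss. It is a simplified, one-directional variant written for illustration and is not taken from the paper or from the tensorflow/models code; the function name `info_nce_loss`, the `temperature` default, and the toy embedding shapes are all assumptions.

```python
import numpy as np


def info_nce_loss(z_a, z_b, temperature=0.1):
    """Illustrative InfoNCE-style loss (hypothetical helper, not the paper's code).

    z_a, z_b: arrays of shape (N, D). Row i of each array is the embedding of
    one augmented clip from video i, so (z_a[i], z_b[i]) is a positive pair
    and (z_a[i], z_b[j]) with j != i is treated as a negative pair.
    """
    # L2-normalize so that dot products are cosine similarities.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    # Pairwise similarities scaled by the temperature: shape (N, N).
    logits = (z_a @ z_b.T) / temperature

    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal; average their negative log-likelihood.
    return float(-np.mean(np.diag(log_prob)))


# Toy usage with random stand-in embeddings (assumed shapes: 8 videos, 16 dims).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce_loss(z, z + 0.01 * rng.normal(size=z.shape))
mismatched = info_nce_loss(z, np.roll(z, 1, axis=0))
```

In this toy setup the loss is lower when each clip's second view stays close to the first (`aligned`) than when the second views are paired with the wrong videos (`mismatched`), which is the behavior the objective is meant to reward.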

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

ruiwang2021/mvd CVPR 2023

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework

BestJuly/Inter-intra-video-contrastive-learning 6 Aug 2020

With the proposed Inter-Intra Contrastive (IIC) framework, we can train spatio-temporal convolutional networks to learn video representations.

A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning

facebookresearch/SlowFast CVPR 2021

We present a large-scale study on unsupervised spatiotemporal representation learning from videos.

Masked Motion Encoding for Self-Supervised Video Representation Learning

xinyusun/mme CVPR 2023

The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.

EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens

sunilhoho/everest 19 Nov 2022

Masked Video Autoencoder (MVA) approaches have demonstrated their potential by significantly outperforming previous video representation learning methods.

Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning

juliendenize/eztorch 21 Dec 2022

A good data representation should contain relations between the instances, or semantic similarity and dissimilarity, that contrastive learning harms by considering all negatives as noise.

Cross-Model Cross-Stream Learning for Self-Supervised Human Action Recognition

levigty/acl 15 Jul 2023

Inspired by SkeletonBYOL, this paper further presents a Cross-Model and Cross-Stream (CMCS) framework.