Self-Supervised Action Recognition
34 papers with code • 6 benchmarks • 5 datasets
Latest papers
Joint Adversarial and Collaborative Learning for Self-Supervised Action Recognition
Considering the instance-level discriminative ability, contrastive learning methods, including MoCo and SimCLR, have been adapted from the original image representation learning task to solve the self-supervised skeleton-based action recognition task.
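MoCo- and SimCLR-style methods both optimize an InfoNCE objective: a query embedding is pulled toward its positive (an augmented view of the same skeleton sequence) and pushed from all negatives. A minimal plain-Python sketch of that loss for a single query (embeddings as plain float lists; the temperature value is an illustrative default, not taken from any specific paper):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query: a (1 + num_negatives)-way softmax
    cross-entropy where the positive pair must win."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    return -math.log(exps[0] / sum(exps))
```

The loss shrinks as the query aligns with its positive and decorrelates from the negatives; MoCo draws the negatives from a momentum-encoded queue, SimCLR from the other samples in the batch.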
Part Aware Contrastive Learning for Self-Supervised Action Recognition
This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR, which integrates local similarity and global features for skeleton-based action representations.
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
A good data representation should capture relations between instances, i.e. semantic similarity and dissimilarity, which standard contrastive learning harms by treating all negatives as noise.
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.
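The distillation objective behind this kind of masked feature modeling is simple: the student reconstructs the teacher's features at the masked token positions, and the loss is computed only there. A toy sketch with one scalar feature per token (real models regress high-dimensional feature vectors; this simplification is mine, not from the paper):

```python
def masked_distill_loss(student_feats, teacher_feats, mask):
    """Mean squared error between student predictions and frozen teacher
    features, restricted to masked token positions (mask[i] == True)."""
    per_token = [
        (s - t) ** 2
        for s, t, masked in zip(student_feats, teacher_feats, mask)
        if masked
    ]
    return sum(per_token) / max(len(per_token), 1)
```

Swapping the teacher (image model vs. video model) changes only where `teacher_feats` comes from, which is what lets the paper compare spatial and temporal teachers under one objective.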
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
First, masked data reconstruction is performed to learn modality-specific representations from audio and visual streams.
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Masked Video Autoencoder (MVA) approaches have demonstrated their potential by significantly outperforming previous video representation learning methods.
Masked Motion Encoding for Self-Supervised Video Representation Learning
The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.
SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos
A key reason is that sampling pairs of similar video clips, a step required by many self-supervised contrastive learning methods, is currently done conservatively to avoid false positives.
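The conservative sampling mentioned above typically means taking two non-overlapping clips from the same video as a positive pair, so the pair is similar by construction. A minimal sketch of that sampling step (function name and frame-index representation are illustrative):

```python
import random

def sample_clip_pair(num_frames, clip_len, rng=None):
    """Sample two non-overlapping clips (as frame-index lists) from one
    video, to be used as a positive pair for contrastive learning."""
    rng = rng or random.Random()
    if num_frames < 2 * clip_len:
        raise ValueError("video too short for two disjoint clips")
    start_a = rng.randrange(0, num_frames - 2 * clip_len + 1)
    # Second clip starts strictly after the first one ends.
    start_b = rng.randrange(start_a + clip_len, num_frames - clip_len + 1)
    return (list(range(start_a, start_a + clip_len)),
            list(range(start_b, start_b + clip_len)))
```

Sampling both clips from the same video avoids false positives but also limits the diversity of positives, which is the trade-off SLIC's iterative clustering tries to relax.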
VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.
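VideoMAE's data efficiency rests on tube masking with a very high masking ratio: one spatial mask is sampled and repeated across all frames, so the model cannot cheat by copying a patch from a neighboring frame. A small sketch of that masking scheme (grid sizes and the 90% ratio are the paper's reported defaults; the function itself is my illustration):

```python
import random

def tube_mask(num_frames, grid_h, grid_w, mask_ratio=0.9, seed=0):
    """Tube masking: sample one spatial mask and repeat it over time.
    Returns a per-frame list of booleans, True = token is masked."""
    rng = random.Random(seed)
    num_spatial = grid_h * grid_w
    num_masked = int(num_spatial * mask_ratio)
    masked = set(rng.sample(range(num_spatial), num_masked))
    # Every frame shares the same spatial mask (a "tube" through time).
    frame_mask = [s in masked for s in range(num_spatial)]
    return [list(frame_mask) for _ in range(num_frames)]
```

Because ~90% of tokens are dropped before the encoder, pre-training cost falls sharply, which is what makes training on relatively small datasets practical.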