Self-Supervised Action Recognition

34 papers with code • 6 benchmarks • 5 datasets

Joint Adversarial and Collaborative Learning for Self-Supervised Action Recognition

levigty/acl 15 Jul 2023

Owing to their instance-level discriminative ability, contrastive learning methods such as MoCo and SimCLR have been adapted from image representation learning to the self-supervised skeleton-based action recognition task.

GitHub stars: 3
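MoCo and SimCLR share a contrastive objective that can be sketched as an InfoNCE loss. The version below is a minimal NumPy sketch under simplifying assumptions and is not the ACL method:

- It is one-directional.
- It uses only in-batch negatives. SimCLR treats all 2N augmented samples as anchors and negatives, while MoCo draws negatives from a queue of momentum-encoder keys.
- The function name `info_nce_loss` and the temperature default are illustrative.

```python
import numpy as np

def info_nce_loss(z1: np.ndarray, z2: np.ndarray, temperature: float = 0.1) -> float:
    """One-directional InfoNCE (illustrative sketch).

    Row i of z1 and row i of z2 are embeddings of two augmented views of the
    same instance (the positive pair); every other row of z2 is a negative.
    """
    # L2-normalise so that dot products become cosine similarities
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives lie on the diagonal: instance i should match its own other view
    return float(-np.mean(np.diag(log_probs)))
```

Identical views should yield a much lower loss than unrelated embeddings. The loss is a cross-entropy, so it is never negative.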

Part Aware Contrastive Learning for Self-Supervised Action Recognition

githubofhyl97/skeattnclr 1 May 2023

This paper proposes an attention-based contrastive learning framework for skeleton representation learning, called SkeAttnCLR, which integrates local similarity and global features for skeleton-based action representations.

GitHub stars: 13

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

OpenGVLab/VideoMAEv2 CVPR 2023

Finally, we successfully train a video ViT model with a billion parameters, which achieves new state-of-the-art performance on Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).

GitHub stars: 392 • 29 Mar 2023

Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning

juliendenize/eztorch 21 Dec 2022

A good data representation should capture relations between instances, that is, their semantic similarity and dissimilarity, which contrastive learning harms by treating all negatives as noise.

GitHub stars: 34
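Keeping inter-instance relations can be illustrated with soft contrastive targets. The following is a simplified NumPy sketch in the spirit of SCE, not the eztorch implementation, and it works on a single batch only:

- **Target:** for instance i, a one-hot positive mixed with a softmax over similarities between its weakly augmented view and the other instances.
- **Loss:** the cross-entropy between that target and the predictions from the strongly augmented view.

The names `soft_contrastive_loss`, `lam`, `tau` and `tau_target`, and their defaults, are hypothetical. With `lam=1.0` the target becomes one-hot and the loss reduces to ordinary InfoNCE.

```python
import numpy as np

def _log_softmax(logits: np.ndarray) -> np.ndarray:
    # Row-wise log-softmax; entries equal to -inf end up with probability 0
    logits = logits - logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def soft_contrastive_loss(z_strong, z_weak, lam=0.5, tau=0.1, tau_target=0.05):
    """Soft-target contrastive loss (hypothetical sketch, batch size >= 2).

    Row i of z_strong and z_weak are two views of instance i.
    """
    zs = z_strong / np.linalg.norm(z_strong, axis=1, keepdims=True)
    zw = z_weak / np.linalg.norm(z_weak, axis=1, keepdims=True)
    eye = np.eye(zs.shape[0], dtype=bool)
    # Relations: similarity of weak view i to every other instance j != i
    sim_ww = zw @ zw.T / tau_target
    sim_ww[eye] = -np.inf                         # exclude self before softmax
    relations = np.exp(_log_softmax(sim_ww))      # rows sum to 1, diagonal is 0
    # Soft target: mix the one-hot positive with the inter-instance relations
    target = lam * eye + (1.0 - lam) * relations
    # Prediction: strong view i scored against weak views of all instances
    log_pred = _log_softmax(zs @ zw.T / tau)
    return float(-np.mean(np.sum(target * log_pred, axis=1)))
```

Instead of pushing semantically close negatives fully away, the soft target assigns them a share of probability mass proportional to their similarity.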

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

ruiwang2021/mvd CVPR 2023

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

GitHub stars: 85 • 08 Dec 2022

XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning

pritamqu/XKD 25 Nov 2022

First, masked data reconstruction is performed to learn modality-specific representations from audio and visual streams.

GitHub stars: 6

EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens

sunilhoho/VideoMS 19 Nov 2022

Masked Video Autoencoder (MVA) approaches have demonstrated their potential by significantly outperforming previous video representation learning methods.

GitHub stars: 17

Masked Motion Encoding for Self-Supervised Video Representation Learning

XinyuSun/M3Video CVPR 2023

The latest attempts seek to learn a representation model by predicting the appearance contents in the masked regions.

GitHub stars: 40 • 12 Oct 2022

SLIC: Self-Supervised Learning with Iterative Clustering for Human Action Videos

rvl-lab-utoronto/video_similarity_search CVPR 2022

One of the key reasons for this is that sampling pairs of similar video clips, a required step for many self-supervised contrastive learning methods, is currently done conservatively to avoid false positives.

GitHub stars: 19 • 25 Jun 2022
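The clustering idea in SLIC's title can be illustrated with a generic sketch. It clusters the current clip embeddings and draws each positive from the anchor's own cluster. The usual conservative choice is presumably to take positives only from clips of the same video; this sketch goes beyond that.

This is not SLIC's actual procedure, which, as its title says, iterates clustering during training. Assumptions in the sketch:

- `kmeans` and `sample_positive_pairs` are hypothetical helpers.
- Farthest-first initialisation is chosen only to keep the toy example deterministic.

```python
import numpy as np

def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain k-means returning a cluster label per row of x (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Farthest-first initialisation: each new centroid is the point farthest
    # from all centroids chosen so far
    centroids = [x[rng.integers(len(x))]]
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        # Assign every embedding to its nearest centroid
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[c] = members.mean(axis=0)
    return labels

def sample_positive_pairs(labels: np.ndarray, seed: int = 0) -> list:
    """For each index i, pick a different index from the same cluster as its
    positive; fall back to i itself (e.g. another augmentation) if it is alone."""
    rng = np.random.default_rng(seed)
    idx = np.arange(len(labels))
    pairs = []
    for i, c in enumerate(labels):
        candidates = np.flatnonzero((labels == c) & (idx != i))
        j = int(rng.choice(candidates)) if len(candidates) else i
        pairs.append((i, j))
    return pairs
```

On two well-separated groups of embeddings, every sampled pair stays within one group, so different clips become positives without crossing a cluster boundary.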

VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

huggingface/transformers 23 Mar 2022

Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets.

GitHub stars: 124,527
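VideoMAE pre-trains by hiding most of a clip's space-time tokens and reconstructing them. The paper describes tube masking: the same spatial patches are hidden in every frame, so the model cannot simply copy a masked patch from an adjacent frame.

The sketch below builds such a mask with NumPy. These details are illustrative assumptions, not values given on this page:

- the function name `tube_mask`;
- the 90% default ratio;
- the 8×14×14 token grid used in the check, which corresponds to 16 frames at 224×224 split into 2×16×16 tubelets.

```python
import numpy as np

def tube_mask(num_frames: int, height: int, width: int,
              mask_ratio: float = 0.9, seed: int = 0) -> np.ndarray:
    """Boolean mask over a (num_frames, height, width) token grid; True = masked.

    One random spatial mask is drawn and repeated along time, forming "tubes".
    """
    rng = np.random.default_rng(seed)
    num_patches = height * width
    num_masked = int(round(mask_ratio * num_patches))
    spatial = np.zeros(num_patches, dtype=bool)
    spatial[rng.choice(num_patches, size=num_masked, replace=False)] = True
    # Repeat the same spatial mask in every temporal slot
    return np.tile(spatial.reshape(1, height, width), (num_frames, 1, 1))

mask = tube_mask(8, 14, 14)
```

Only the unmasked tokens (here about 10%) would be passed to the encoder, which is what keeps this kind of pre-training cheap despite the long token sequences of video.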