Self-Supervised Action Recognition

34 papers with code • 6 benchmarks • 5 datasets

Self-supervised action recognition learns video representations for human action recognition without manual annotation, typically by pretraining on pretext tasks such as playback-speed prediction, temporal ordering or coherency, and cross-modal audio-video correspondence, and then transferring the learned features to labelled action recognition benchmarks.

Most implemented papers

Video Representation Learning by Dense Predictive Coding

TengdaHan/DPC 10 Sep 2019

The objective of this paper is self-supervised learning of spatio-temporal embeddings from video, suitable for human action recognition.
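
A minimal sketch of the dense-predictive-coding idea (a toy stand-in, not the authors' DPC code): encode past clips, predict the embedding of the next clip with a recurrent aggregator, and score the prediction against the true future embedding, using the other samples in the batch as negatives.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class TinyDPC(nn.Module):
        def __init__(self, dim=128):
            super().__init__()
            # toy clip encoder standing in for a 3D CNN backbone
            self.encoder = nn.Sequential(
                nn.Conv3d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(64, dim),
            )
            self.aggregator = nn.GRU(dim, dim, batch_first=True)  # summarise the past
            self.predictor = nn.Linear(dim, dim)                  # predict the future embedding

        def forward(self, past_clips, future_clip):
            # past_clips: (B, N, 3, T, H, W), future_clip: (B, 3, T, H, W)
            b, n = past_clips.shape[:2]
            past = self.encoder(past_clips.flatten(0, 1)).view(b, n, -1)
            _, hidden = self.aggregator(past)
            pred = F.normalize(self.predictor(hidden[-1]), dim=1)
            target = F.normalize(self.encoder(future_clip), dim=1)
            logits = pred @ target.t() / 0.07   # other videos in the batch act as negatives
            return F.cross_entropy(logits, torch.arange(b))

    model = TinyDPC()
    loss = model(torch.randn(4, 3, 3, 8, 32, 32), torch.randn(4, 3, 8, 32, 32))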

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

HumamAlwassel/XDC NeurIPS 2020

To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
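
A rough sketch of the cross-modal clustering idea behind XDC (the clustering method, feature sizes and encoder here are assumptions, not the paper's setup): cluster embeddings from one modality and use the cluster assignments as pseudo-labels to supervise the other modality.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from sklearn.cluster import KMeans

    num_clusters = 16
    video_encoder = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 128))
    video_head = nn.Linear(128, num_clusters)

    # pretend these are precomputed audio features for the whole dataset
    audio_features = torch.randn(1000, 128).numpy()
    pseudo_labels = torch.as_tensor(
        KMeans(n_clusters=num_clusters, n_init=10).fit_predict(audio_features)
    ).long()

    # audio-derived cluster labels supervise the video branch; the roles are
    # then alternated so that video clusters supervise the audio branch
    video_features = torch.randn(1000, 512)   # stand-in for raw video descriptors
    logits = video_head(video_encoder(video_features))
    loss = F.cross_entropy(logits, pseudo_labels)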

Video Cloze Procedure for Self-Supervised Spatio-Temporal Learning

BestJuly/VCP 2 Jan 2020

As a proxy task, it converts rich self-supervised representations into video clip operations (options), which enhances the flexibility and reduces the complexity of representation learning.
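
An illustrative sketch of an operation-classification proxy task in the spirit of VCP (the operation set and network are simplifications, not the paper's exact setup): apply one of several spatio-temporal operations to a clip and train a classifier to predict which operation was applied.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def apply_operation(clip, op):
        # clip: (3, T, H, W)
        if op == 0:
            return clip                                    # identity
        if op == 1:
            return clip.flip(dims=[1])                     # temporal reverse
        if op == 2:
            return clip[:, torch.randperm(clip.shape[1])]  # frame shuffle
        return clip.flip(dims=[3])                         # horizontal flip

    classifier = nn.Sequential(
        nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, 4),
    )

    clips = torch.randn(8, 3, 16, 32, 32)
    ops = torch.tensor([random.randrange(4) for _ in range(len(clips))])
    transformed = torch.stack([apply_operation(c, int(o)) for c, o in zip(clips, ops)])
    loss = F.cross_entropy(classifier(transformed), ops)   # predict the applied operation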

Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video

hyeon-jo/PSPNet 5 Mar 2020

We propose a self-supervised visual learning method that predicts the variable playback speeds of a video.
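
A minimal sketch of this playback-speed pretext task (the network and speed set are toy placeholders): re-sample a video at one of several frame strides and train a classifier to predict which speed was used.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    speeds = [1, 2, 4, 8]      # frame-sampling strides act as playback speeds
    clip_len = 8

    def sample_clip(video, speed_idx):
        # video: (3, T, H, W); take clip_len frames at the chosen stride
        stride = speeds[speed_idx]
        return video[:, :clip_len * stride:stride]

    net = nn.Sequential(
        nn.Conv3d(3, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(32, len(speeds)),
    )

    videos = torch.randn(4, 3, 64, 32, 32)      # toy batch of long videos
    labels = torch.randint(len(speeds), (len(videos),))
    clips = torch.stack([sample_clip(v, int(s)) for v, s in zip(videos, labels)])
    loss = F.cross_entropy(net(clips), labels)  # predict the playback speed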

Temporally Coherent Embeddings for Self-Supervised Video Representation Learning

csiro-robotics/TCE 21 Mar 2020

The proposed method exploits inherent structure of unlabeled video data to explicitly enforce temporal coherency in the embedding space, rather than indirectly learning it through ranking or predictive proxy tasks.
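
A hedged sketch of enforcing temporal coherency in the embedding space (the triplet loss and encoder are illustrative choices, not the TCE implementation): frames that are close in time should map to nearby embeddings, while frames from other videos are pushed away.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))

    anchor   = torch.randn(16, 3, 32, 32)                 # frames from a video
    neighbor = anchor + 0.05 * torch.randn_like(anchor)   # stand-in for the next frame
    others   = torch.randn(16, 3, 32, 32)                 # frames from other videos

    za = F.normalize(encoder(anchor), dim=1)
    zp = F.normalize(encoder(neighbor), dim=1)
    zn = F.normalize(encoder(others), dim=1)

    # temporally adjacent frames must be closer than frames from other videos
    loss = F.triplet_margin_loss(za, zp, zn, margin=0.5)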

SpeedNet: Learning the Speediness in Videos

yasar-rehman/fedvssl CVPR 2020

We demonstrate how those learned features can boost the performance of self-supervised action recognition, and can be used for video retrieval.

Audio-Visual Instance Discrimination with Cross-Modal Agreement

facebookresearch/AVID-CMA CVPR 2021

Our method uses contrastive learning for cross-modal discrimination of video from audio and vice versa.
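
A minimal sketch of cross-modal instance discrimination between video and audio (the encoders, feature shapes and temperature are placeholders, not the AVID-CMA code): each clip should match its own audio track rather than the audio of other clips in the batch, and vice versa.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    video_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 32 * 32, 128))
    audio_encoder = nn.Sequential(nn.Flatten(), nn.Linear(1 * 64 * 100, 128))

    video = torch.randn(8, 3, 8, 32, 32)   # RGB clips
    audio = torch.randn(8, 1, 64, 100)     # log-mel spectrograms

    zv = F.normalize(video_encoder(video), dim=1)
    za = F.normalize(audio_encoder(audio), dim=1)

    logits = zv @ za.t() / 0.07            # similarity of every clip to every audio track
    targets = torch.arange(len(video))
    # symmetric InfoNCE: video-to-audio plus audio-to-video
    loss = F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)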

Video Playback Rate Perception for Self-Supervised Spatio-Temporal Representation Learning

yuanyao366/PRP CVPR 2020

The generative perception model acts as a feature decoder that focuses on high temporal resolution and short-term representations by introducing a motion-attention mechanism.

Self-Supervised MultiModal Versatile Networks

deepmind/deepmind-research NeurIPS 2020

In particular, we explore how best to combine the modalities, such that fine-grained representations of the visual and audio modalities can be maintained, whilst also integrating text into a common embedding.
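
An illustrative sketch of one way to realise this (a simplification with placeholder layer sizes, not the paper's exact design): video and audio are contrasted in a fine-grained space, while text joins a coarser space obtained by projecting the fine-grained one.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    dim_fine, dim_coarse = 256, 128
    video_proj = nn.Linear(512, dim_fine)          # on top of a video backbone
    audio_proj = nn.Linear(512, dim_fine)          # on top of an audio backbone
    fine_to_coarse = nn.Linear(dim_fine, dim_coarse)
    text_proj = nn.Linear(300, dim_coarse)         # on top of word embeddings

    video_feat = torch.randn(8, 512)
    audio_feat = torch.randn(8, 512)
    text_feat = torch.randn(8, 300)

    zv_fine = F.normalize(video_proj(video_feat), dim=1)
    za_fine = F.normalize(audio_proj(audio_feat), dim=1)
    zv_coarse = F.normalize(fine_to_coarse(video_proj(video_feat)), dim=1)
    zt_coarse = F.normalize(text_proj(text_feat), dim=1)

    targets = torch.arange(8)
    # video-audio contrast in the fine space, video-text contrast in the coarse space
    loss = (F.cross_entropy(zv_fine @ za_fine.t() / 0.07, targets)
            + F.cross_entropy(zv_coarse @ zt_coarse.t() / 0.07, targets))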

Self-supervised Co-training for Video Representation Learning

TengdaHan/CoCLR NeurIPS 2020

The objective of this paper is visual-only self-supervised video representation learning.