Self-supervised Temporal Learning

1 Jan 2021  ·  Hao Shao, Yu Liu, Hongsheng Li

Self-supervised learning (SSL) has demonstrated a powerful ability to learn discriminative representations for various visual, audio, and video applications. However, most recent work on video representations still focuses on variants of spatial-level SSL. How to learn inherent representations along the temporal dimension in a self-supervised manner remains largely unexplored. In this work, we propose self-supervised temporal learning (SSTL), which aims to learn spatial-temporal invariance. Inspired by spatial contrastive SSL, we show that significant improvements can be achieved with a temporal contrastive learning approach built from three novel and efficient modules: temporal augmentations, a temporal memory bank, and the SSTL loss. The temporal augmentations include three operators -- temporal crop, temporal dropout, and temporal jitter. Beyond the contrastive paradigm, we observe that temporal content varies across the layers of the temporal pyramid. SSTL raises the upper bound of current SSL approaches by $\sim$6% on popular video classification tasks and, surprisingly, improves the current state of the art by $\sim$100% on some popular video retrieval tasks. The code of SSTL is released with this draft in the hope of nourishing the progress of the booming self-supervised learning community.
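The three augmentation operators named in the abstract lend themselves to a short sketch. The snippet below is a minimal illustration, assuming a video clip is a tensor of shape (T, C, H, W); the function names and the exact semantics of each operator are assumptions for illustration only, not the paper's released implementation.

```python
import torch


def temporal_crop(clip: torch.Tensor, crop_len: int) -> torch.Tensor:
    """Keep a random contiguous window of crop_len frames (assumed semantics)."""
    t = clip.shape[0]
    start = torch.randint(0, t - crop_len + 1, (1,)).item()
    return clip[start:start + crop_len]


def temporal_dropout(clip: torch.Tensor, drop_prob: float = 0.1) -> torch.Tensor:
    """Randomly drop frames, repeating the previous frame in place of each
    dropped one so the clip length is preserved (assumed semantics)."""
    out = clip.clone()
    for i in range(1, clip.shape[0]):
        if torch.rand(1).item() < drop_prob:
            out[i] = out[i - 1]
    return out


def temporal_jitter(clip: torch.Tensor, max_shift: int = 2) -> torch.Tensor:
    """Perturb each frame index by a small random offset, clamped to the
    clip boundaries (assumed semantics)."""
    t = clip.shape[0]
    idx = torch.arange(t) + torch.randint(-max_shift, max_shift + 1, (t,))
    return clip[idx.clamp(0, t - 1)]
```

In a contrastive setup like the one the abstract describes, two views of the same clip would be produced by composing these operators with different random draws and then treated as a positive pair, with clips from the memory bank serving as negatives.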
