Emerging Properties in Self-Supervised Vision Transformers

lucidrains/vit-pytorch 29 Apr 2021

In this paper, we question if self-supervised learning provides new properties to Vision Transformer (ViT) that stand out compared to convolutional networks (convnets).

Video Re-localization

fengyang0317/video_reloc ECCV 2018

We first exploit and reorganize the videos in ActivityNet to form a new dataset for video re-localization research, which consists of about 10,000 videos of diverse visual appearances associated with localized boundary information.

