Few-shot learning aims to recognize novel classes from a few examples.
With the advent of large-scale multimodal video datasets, especially sequences with audio or transcribed speech, there has been a growing interest in self-supervised learning of video representations.
To the best of our knowledge, XDC is the first self-supervised learning method that outperforms large-scale fully-supervised pretraining for action recognition on the same architecture.
Ranked #1 on Self-Supervised Action Recognition on UCF101
We demonstrate that the computational cost of action recognition on untrimmed videos can be dramatically reduced by invoking recognition only on the most salient clips.
Ranked #1 on Action Recognition on miniSports
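The idea of invoking recognition only on salient clips can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `classify_salient_clips`, the callables `saliency_fn` and `classifier_fn`, and the choice to average class scores over the selected clips are all assumptions made for illustration. The sketch assumes a cheap scorer is run on every clip, and the expensive classifier only on the top `k`.

```python
from typing import Callable, List, Sequence


def classify_salient_clips(
    clips: Sequence[object],
    saliency_fn: Callable[[object], float],
    classifier_fn: Callable[[object], List[float]],
    k: int,
) -> List[float]:
    """Average class scores over the k highest-saliency clips only.

    All names here are hypothetical; averaging is an assumed aggregation rule.
    """
    if not clips:
        raise ValueError("need at least one clip")
    # Clamp k to the valid range [1, number of clips].
    k = max(1, min(k, len(clips)))
    # Cheap pass: rank every clip by its saliency score, highest first.
    ranked = sorted(clips, key=saliency_fn, reverse=True)
    # Expensive pass: run the classifier on the top-k clips only.
    scores = [classifier_fn(clip) for clip in ranked[:k]]
    num_classes = len(scores[0])
    # Aggregate per-class scores by a simple mean over the selected clips.
    return [sum(s[c] for s in scores) / k for c in range(num_classes)]
```

With six clips and `k=2`, the classifier runs twice instead of six times; the saving depends on the saliency scorer being much cheaper than the classifier.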
There is a natural correlation between the visual and auditory elements of a video.
Ranked #5 on Self-Supervised Audio Classification on ESC-50
In this work, we built an automatic image-understanding method that can accurately classify different types of colorectal polyps in whole-slide histology images, to help pathologists with their histopathological characterization and diagnosis.