Data augmentation is an essential technique for improving the accuracy of deep-learning-based object recognition.
Recent works in self-supervised learning have demonstrated strong performance on scene-level dense prediction tasks by pretraining with object-centric or region-based correspondence objectives.
They are widely used in tasks such as segmentation.
At inference time, instance segmentation results can be obtained from sound images alone.
In this paper, we propose a novel self-supervised learning framework that combines contrastive learning with neural processes.
Our approach results in a computationally and memory-efficient model: CFLOW-AD is 10x faster and smaller than the prior state of the art with the same input setting.
Ranked #19 on Anomaly Detection on MVTec AD (using extra training data)
However, there remains a lack of studies that extend action composition and leverage multiple viewpoints and multiple modalities of data for representation learning.
Ranked #1 on Video Classification on Home Action Genome
To overcome these limitations, we reformulate AutoAugment as a generalized automated dataset optimization (AutoDO) task that minimizes the distribution shift between the test data and the distorted training dataset.