Contrastive Multiview Coding

13 Jun 2019 | Yonglong Tian, Dilip Krishnan, Phillip Isola

Humans view the world through many sensory channels, e.g., the long-wavelength light channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right ear. Each view is noisy and incomplete, but important factors, such as physics, geometry, and semantics, tend to be shared between all views (e.g., a "dog" can be seen, heard, and felt)...

Evaluation Results from the Paper


Task                                  Dataset   Model               Metric Name       Metric Value  Global Rank
------------------------------------  --------  ------------------  ----------------  ------------  -----------
Self-Supervised Image Classification  ImageNet  CMC (ResNet-50)     Top 1 Accuracy    64.1%         #11
Self-Supervised Image Classification  ImageNet  CMC (ResNet-50)     Top 5 Accuracy    85.4%         #7
Self-Supervised Image Classification  ImageNet  CMC (ResNet-50)     Number of Params  24M           #1
Self-Supervised Image Classification  ImageNet  CMC (ResNet-101)    Top 1 Accuracy    65.0%         #10
Self-Supervised Image Classification  ImageNet  CMC (ResNet-101)    Top 5 Accuracy    86.0%         #6
Self-Supervised Image Classification  ImageNet  CMC (ResNet-50 x2)  Top 1 Accuracy    68.4%         #7
Self-Supervised Image Classification  ImageNet  CMC (ResNet-50 x2)  Top 5 Accuracy    88.2%         #5
Self-Supervised Image Classification  ImageNet  CMC (ResNet-50 x2)  Number of Params  94M           #1