Learning Representations by Maximizing Mutual Information Across Views

NeurIPS 2019 · Philip Bachman, R Devon Hjelm, William Buchwalter

We propose an approach to self-supervised representation learning based on maximizing mutual information between features extracted from multiple views of a shared context. For example, one could produce multiple views of a local spatio-temporal context by observing it from different locations (e.g., camera positions within a scene), and via different modalities (e.g., tactile, auditory, or visual).
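The mutual-information objective described above is, in practice, optimized through a contrastive lower bound: features from two views of the same context form a positive pair, and features from other contexts in the batch act as negatives. The sketch below illustrates this with an InfoNCE-style loss on synthetic features; it is a minimal illustration of the general idea, not the paper's AMDIM architecture (which contrasts features across multiple scales of two augmented views), and the function name and noise model are assumptions for this example.

```python
import numpy as np

def infonce_loss(z_a, z_b):
    """InfoNCE-style contrastive loss: a lower bound on the mutual
    information between paired views (illustrative, not AMDIM itself).

    z_a, z_b: (N, D) feature arrays. Row i of each is assumed to come
    from two views of the same context, so (z_a[i], z_b[i]) is the
    positive pair and the other N-1 rows of z_b act as negatives.
    """
    scores = z_a @ z_b.T                          # (N, N) similarity matrix
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    # Negative log-likelihood of identifying the true pairing per row.
    return -np.mean(np.diag(log_softmax))

rng = np.random.default_rng(0)
context = rng.normal(size=(8, 16))
# Two hypothetical "views": the shared context plus independent noise.
view_a = context + 0.1 * rng.normal(size=context.shape)
view_b = context + 0.1 * rng.normal(size=context.shape)

aligned = infonce_loss(view_a, view_b)
shuffled = infonce_loss(view_a, rng.permutation(view_b))
# Correctly paired views yield a lower loss (higher MI bound) than
# views paired with the wrong contexts.
print(aligned < shuffled)
```

Minimizing this loss pushes features of the two views of each context to agree while staying distinguishable from other contexts, which is the sense in which the bound "maximizes mutual information across views."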


Evaluation Results from the Paper


TASK                                  DATASET   MODEL          METRIC NAME       METRIC VALUE  GLOBAL RANK
Self-Supervised Image Classification  ImageNet  AMDIM (large)  Top 1 Accuracy    68.1%         # 8
Self-Supervised Image Classification  ImageNet  AMDIM (large)  Number of Params  626M          # 1
Self-Supervised Image Classification  ImageNet  AMDIM (small)  Top 1 Accuracy    63.5%         # 14