ESCo: Towards Provably Effective and Scalable Contrastive Representation Learning

29 Sep 2021 · Hengrui Zhang, Qitian Wu, Shaofeng Zhang, Junchi Yan, David Wipf, Philip S. Yu

InfoNCE-based contrastive learning models (e.g., MoCo and SimCLR) have shown impressive power in unsupervised representation learning by maximizing a tight lower bound on the mutual information between two views' representations. However, their quadratic complexity makes them hard to scale to larger batch sizes, and recent research suggests that they may exploit superfluous information that is useless for downstream prediction tasks. In this paper, we propose ESCo (Effective and Scalable Contrastive), a new contrastive framework that is essentially an instantiation of the Information Bottleneck principle in the self-supervised learning setting. Specifically, ESCo targets a new objective that maximizes the similarity between the representations of positive pairs and minimizes the pair-wise kernel potential of negative pairs, with a provable guarantee of effective representations that preserve task-relevant information and discard task-irrelevant information. Furthermore, to avoid the quadratic time complexity and memory cost, we leverage Random Features to achieve an accurate approximation with linear scalability. We show that the vanilla InfoNCE objective is a degenerate case of ESCo, which implies that ESCo can potentially boost existing InfoNCE-based models. To verify our method, we conduct extensive experiments on both synthetic and real-world datasets, showing superior performance over InfoNCE-based baselines on (unsupervised) representation learning tasks for images and graphs.
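The abstract describes the objective only at a high level, but its two ingredients (a positive-pair similarity term plus a pair-wise kernel potential over negatives, with Random Features used to make the negative term linear in batch size) can be sketched in a few lines of PyTorch. The sketch below is an illustrative assumption, not the authors' exact formulation: the function names `esco_style_loss` and `rf_negative_potential`, the Gaussian kernel, the cosine-similarity positive term, and the bandwidth `sigma` are all choices made here for concreteness.

```python
import math
import torch
import torch.nn.functional as F

def esco_style_loss(z1, z2, sigma=1.0):
    """Illustrative objective: pull matched (positive) pairs together and
    penalize a pair-wise Gaussian-kernel potential among negatives.
    Kernel choice and weighting are assumptions, not the paper's exact form."""
    z1 = F.normalize(z1, dim=1)  # (N, d) view-1 representations
    z2 = F.normalize(z2, dim=1)  # (N, d) view-2 representations

    # Positive term: mean cosine similarity of matched pairs.
    pos = (z1 * z2).sum(dim=1).mean()

    # Negative term: exact pair-wise Gaussian kernel potential, O(N^2).
    sq_dists = torch.cdist(z1, z1).pow(2)
    off_diag = ~torch.eye(len(z1), dtype=torch.bool, device=z1.device)
    neg = torch.exp(-sq_dists / (2 * sigma ** 2))[off_diag].mean()

    return -pos + neg  # minimize: high positive similarity, low potential

def rf_negative_potential(z, num_features=256, sigma=1.0):
    """Random-Fourier-feature approximation of the mean pair-wise Gaussian
    potential: since E[phi(x)^T phi(y)] ~= k(x, y), the double sum over pairs
    collapses to the squared norm of the mean feature map, costing
    O(N * num_features) instead of O(N^2)."""
    N, d = z.shape
    w = torch.randn(d, num_features, device=z.device) / sigma  # spectral samples
    b = 2 * math.pi * torch.rand(num_features, device=z.device)
    phi = math.sqrt(2.0 / num_features) * torch.cos(z @ w + b)
    return phi.mean(dim=0).pow(2).sum()
```

Swapping `rf_negative_potential(z1)` in for the exact `neg` term is what gives the linear scaling in batch size suggested by the abstract; note that this estimator keeps the i = j self-terms that the exact masked version drops, a small finite-batch bias.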
