Representation Learning with Contrastive Predictive Coding

10 Jul 2018 · Aaron van den Oord, Yazhe Li, Oriol Vinyals

While supervised learning has enabled great progress in many applications, unsupervised learning has not seen such widespread adoption, and remains an important and challenging endeavor for artificial intelligence. In this work, we propose a universal unsupervised learning approach to extract useful representations from high-dimensional data, which we call Contrastive Predictive Coding. The key insight of our model is to learn such representations by predicting the future in latent space by using powerful autoregressive models. We use a probabilistic contrastive loss which induces the latent space to capture information that is maximally useful to predict future samples. It also makes the model tractable by using negative sampling. While most prior work has focused on evaluating representations for a particular modality, we demonstrate that our approach is able to learn useful representations achieving strong performance on four distinct domains: speech, images, text and reinforcement learning in 3D environments.
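The contrastive loss with negative sampling described in the abstract can be illustrated with a small sketch. It assumes a bilinear score between a context vector and each candidate future latent, and it treats the loss as a softmax cross-entropy that picks the true future sample out of a set containing one positive and several negatives. The function name `info_nce_loss`, its parameters, and the toy vectors are hypothetical and chosen for illustration only; this is not the authors' implementation.

```python
import math


def info_nce_loss(context, candidates, positive_index, W):
    """Contrastive loss for a single context vector (illustrative sketch).

    context:        list of floats, length d_c (summary of the past)
    candidates:     list of latent vectors, each of length d_z;
                    exactly one is the true future sample, the rest are negatives
    positive_index: position of the true future sample in `candidates`
    W:              d_z x d_c matrix (list of rows) mapping context to a prediction
    """
    # Predict the future latent from the context: pred = W @ context.
    pred = [sum(w * c for w, c in zip(row, context)) for row in W]

    # Assumed bilinear score for each candidate: z^T (W c).
    scores = [sum(z_i * p_i for z_i, p_i in zip(z, pred)) for z in candidates]

    # Numerically stable log-sum-exp over all candidates.
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))

    # Negative log-probability of selecting the positive candidate.
    return log_norm - scores[positive_index]


# Toy usage: the positive candidate aligns with the prediction, the negatives do not.
identity = [[1.0, 0.0], [0.0, 1.0]]
loss = info_nce_loss(
    context=[1.0, 0.0],
    candidates=[[5.0, 0.0], [-5.0, 0.0], [0.0, 0.0]],
    positive_index=0,
    W=identity,
)
```

When the scores carry no information (for example, an all-zero `W`), the loss equals the log of the number of candidates, which is the chance-level baseline; the loss approaches zero as the positive candidate's score dominates the negatives.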


Results from the Paper

Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Self-Supervised Image Classification | ImageNet | CPC (ResNet-101 V2) | Top 1 Accuracy | 48.7% | # 112
Self-Supervised Image Classification | ImageNet | CPC (ResNet-101 V2) | Top 5 Accuracy | 73.6% | # 37
Semi-Supervised Image Classification | ImageNet - 10% labeled data | CPC | Top 5 Accuracy | 84.88% | # 29
Semi-Supervised Image Classification | ImageNet - 1% labeled data | CPC | Top 5 Accuracy | 64.03% | # 28