# Contrastive Predictive Coding

Introduced by Oord et al. in Representation Learning with Contrastive Predictive Coding

**Contrastive Predictive Coding (CPC)** learns self-supervised representations by predicting the future in latent space with powerful autoregressive models. The model uses a probabilistic contrastive loss that induces the latent space to capture the information that is maximally useful for predicting future samples.

First, a non-linear encoder $g_{enc}$ maps the input sequence of observations $x_{t}$ to a sequence of latent representations $z_{t} = g_{enc}\left(x_{t}\right)$, potentially with a lower temporal resolution. Next, an autoregressive model $g_{ar}$ summarizes all $z_{\leq t}$ in the latent space and produces a context latent representation $c_{t} = g_{ar}\left(z_{\leq t}\right)$.
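
As a rough illustration of these two stages, the sketch below uses plain Python with a toy strided encoder (effectively one strided 1-D convolution followed by a tanh) and a simple running blend standing in for $g_{ar}$. The function names `encode` and `summarize`, the window size, the stride, and the `decay` parameter are assumptions made for illustration and do not come from the paper. The sketch only aims to show that the latent sequence can be shorter than the input and that each $c_{t}$ depends on $z_{\leq t}$ alone.

```python
import math
import random

def encode(x, weights, stride):
    """Toy g_enc: each strided window of x becomes one latent vector z_t."""
    window = len(weights[0])
    latents = []
    for start in range(0, len(x) - window + 1, stride):
        frame = x[start:start + window]
        # One output per weight row: tanh of a linear map of the window.
        latents.append([math.tanh(sum(w * v for w, v in zip(row, frame)))
                        for row in weights])
    return latents

def summarize(latents, decay=0.5):
    """Toy g_ar (assumption: a running blend, not a GRU): c_t uses only z_1..z_t."""
    context = [0.0] * len(latents[0])
    contexts = []
    for z in latents:
        context = [decay * c + (1.0 - decay) * zi for c, zi in zip(context, z)]
        contexts.append(context)
    return contexts

rng = random.Random(0)
x = [rng.uniform(-1, 1) for _ in range(64)]              # raw observations
weights = [[rng.uniform(-0.5, 0.5) for _ in range(8)]    # window of 8 samples
           for _ in range(4)]                            # 4 latent dimensions
z = encode(x, weights, stride=4)
c = summarize(z)
print(len(x), len(z), len(c))  # prints: 64 15 15
```

With a stride of 4 the 64 input samples yield 15 latent vectors, which is the "lower temporal resolution" mentioned above. Truncating the input leaves the earlier contexts unchanged, since the summary never looks ahead.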

A density ratio is modelled which preserves the mutual information between $x_{t+k}$ and $c_{t}$ as follows:

$$ f_{k}\left(x_{t+k}, c_{t}\right) \propto \frac{p\left(x_{t+k}|c_{t}\right)}{p\left(x_{t+k}\right)} $$

where $\propto$ stands for ’proportional to’ (i.e. up to a multiplicative constant). Note that the density ratio $f$ can be unnormalized (does not have to integrate to 1). The authors use a simple log-bilinear model:

$$ f_{k}\left(x_{t+k}, c_{t}\right) = \exp\left(z^{T}_{t+k}W_{k}c_{t}\right) $$
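
The log-bilinear score can be evaluated directly from its definition. The following is a minimal sketch in plain Python; the function name `log_bilinear_score` and the example vectors and matrix are hypothetical values chosen for illustration, with $W_{k}$ stored as a list of rows of shape (latent dimension, context dimension).

```python
import math

def log_bilinear_score(z_future, W_k, c_t):
    """f_k = exp(z_{t+k}^T W_k c_t) for one candidate latent and one context."""
    # W_k c_t: project the context into the latent space.
    projected = [sum(w * c for w, c in zip(row, c_t)) for row in W_k]
    # Inner product with the candidate latent, then exponentiate.
    return math.exp(sum(z * p for z, p in zip(z_future, projected)))

z_future = [1.0, 0.0]                       # hypothetical z_{t+k} (2 dims)
c_t = [0.5, -1.0, 2.0]                      # hypothetical c_t (3 dims)
W_k = [[1.0, 0.0, 0.5],                     # hypothetical W_k (2 x 3)
       [0.0, 1.0, 0.0]]
score = log_bilinear_score(z_future, W_k, c_t)
```

Because of the exponential, the score is always positive but is not normalized, which is consistent with $f$ being an unnormalized density ratio. A separate matrix $W_{k}$ would be needed for each prediction step $k$.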

Any type of encoder and autoregressive model can be used. The authors opt for strided convolutional layers with residual blocks for the encoder and GRUs for the autoregressive model.
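
To make the role of the GRU more concrete, here is a small sketch of a single GRU update in plain Python, where the hidden state plays the role of the context $c_{t}$. It follows one common formulation of the GRU with bias terms omitted; the helper names (`matvec`, `gru_step`, `init_params`), the dimensions, and the random initialization are assumptions for illustration and are not the authors' implementation.

```python
import math
import random

def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def gru_step(z_t, h_prev, p):
    """One GRU update (biases omitted); the returned state acts as c_t."""
    update = [sigmoid(a + b) for a, b in zip(matvec(p["Wu"], z_t), matvec(p["Uu"], h_prev))]
    reset = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], z_t), matvec(p["Ur"], h_prev))]
    gated = [r * h for r, h in zip(reset, h_prev)]
    cand = [math.tanh(a + b) for a, b in zip(matvec(p["Wn"], z_t), matvec(p["Un"], gated))]
    # Blend the candidate state with the previous state.
    return [(1.0 - u) * n + u * h for u, n, h in zip(update, cand, h_prev)]

def init_params(latent_dim, context_dim, rng):
    def mat(rows, cols):
        return [[rng.uniform(-0.3, 0.3) for _ in range(cols)] for _ in range(rows)]
    params = {}
    for gate in ("u", "r", "n"):
        params["W" + gate] = mat(context_dim, latent_dim)    # input weights
        params["U" + gate] = mat(context_dim, context_dim)   # recurrent weights
    return params

rng = random.Random(0)
params = init_params(latent_dim=4, context_dim=3, rng=rng)
latents = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]  # z_1..z_5
h = [0.0] * 3
contexts = []
for z_t in latents:
    h = gru_step(z_t, h, params)
    contexts.append(h)       # contexts[t] summarizes z_1..z_{t+1}
```

Each context is produced from the current latent and the previous context only, so the sequence of contexts is causal by construction.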

The encoder and autoregressive model are trained jointly to minimize an InfoNCE loss (see components).
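
In the CPC paper, InfoNCE is the categorical cross-entropy of picking the one positive sample $x_{t+k}$ out of a set that also contains negative samples, with $f_{k}$ as the score. A minimal sketch for a single context follows, in plain Python. The function name `info_nce` and the example numbers are hypothetical, and the inputs are assumed to be the log-scores $z_{j}^{T}W_{k}c_{t}$ rather than the exponentiated values, which keeps the computation numerically stable.

```python
import math

def info_nce(log_scores, positive_index):
    """-log( f_pos / sum_j f_j ), where f_j = exp(log_scores[j])."""
    m = max(log_scores)
    # Log of the denominator via the log-sum-exp trick.
    log_denominator = m + math.log(sum(math.exp(s - m) for s in log_scores))
    return log_denominator - log_scores[positive_index]

# Hypothetical log-scores: index 0 is the true future, the rest are negatives.
log_scores = [2.0, 0.0, -1.0, 0.5]
loss = info_nce(log_scores, positive_index=0)

# With uninformative (equal) scores the loss equals log(N) for N candidates.
uniform_loss = info_nce([0.0] * 4, positive_index=0)
```

The loss shrinks as the positive sample's score grows relative to the negatives, and it sits at $\log N$ when the model cannot tell the candidates apart. In training, this quantity would be averaged over time steps, prediction offsets $k$, and batch elements.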

Source: Representation Learning with Contrastive Predictive Coding

Task | Papers | Share |
---|---|---|
Self-Supervised Learning | 13 | 8.07% |
Speech Recognition | 10 | 6.21% |
Automatic Speech Recognition (ASR) | 7 | 4.35% |
Acoustic Unit Discovery | 6 | 3.73% |
Language Modelling | 5 | 3.11% |
Anomaly Detection | 5 | 3.11% |
Voice Conversion | 5 | 3.11% |
Quantization | 4 | 2.48% |
General Classification | 4 | 2.48% |