Bootstrap Your Own Latent

Introduced by Grill et al. in Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

BYOL (Bootstrap Your Own Latent) is an approach to self-supervised learning. Its goal is to learn a representation $y_θ$ that can then be used for downstream tasks. BYOL trains two neural networks: an online network and a target network. The online network is defined by a set of weights $θ$ and comprises three stages: an encoder $f_θ$, a projector $g_θ$, and a predictor $q_θ$. The target network has the same architecture as the online network but uses a different set of weights $ξ$. The target network provides the regression targets used to train the online network, and its parameters $ξ$ are an exponential moving average of the online parameters $θ$.
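The exponential moving average relationship between the two weight sets can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the function name `ema_update`, the decay rate `tau`, and the representation of the weights as flat lists of floats are all assumptions made for this example.

```python
def ema_update(target, online, tau=0.99):
    """Return updated target weights: tau * target + (1 - tau) * online.

    `target` plays the role of xi and `online` the role of theta.
    `tau` is a hypothetical decay rate in [0, 1]; values close to 1
    make the target network change slowly.
    """
    if len(target) != len(online):
        raise ValueError("target and online must have the same length")
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]


# Toy weights, chosen only for illustration.
theta = [1.0, 2.0]   # online weights (updated by gradient descent elsewhere)
xi = [0.0, 0.0]      # target weights (never updated by gradients)

# After one update the target has moved 10% of the way toward the online weights.
xi = ema_update(xi, theta, tau=0.9)
```

In a training loop, an update of this kind would run after each optimizer step on $θ$, so that $ξ$ follows $θ$ with a lag.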

Following the paper's architecture diagram, BYOL minimizes a similarity loss between $q_θ(z_θ)$ and $\text{sg}(z'_ξ)$, where $θ$ are the trained weights, $ξ$ are an exponential moving average of $θ$, and $\text{sg}$ denotes stop-gradient, meaning that no gradient is propagated through the target branch. At the end of training, everything but $f_θ$ is discarded, and $y_θ$ is used as the image representation.
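The text above only says "similarity loss". In the BYOL paper this is a mean squared error between the L2-normalized prediction and the L2-normalized target projection, which equals $2 - 2\cos$ of the angle between them. The sketch below assumes that form. The helper names `l2_normalize` and `byol_loss` are hypothetical, and because plain Python has no automatic differentiation, the stop-gradient is only indicated by treating `target` as a constant input.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def byol_loss(prediction, target):
    """Squared distance between the unit-normalized inputs.

    `prediction` stands for q_theta(z_theta); `target` stands for
    sg(z'_xi) and is treated as a constant (no gradient would flow
    through it in an autodiff framework).
    """
    p = l2_normalize(prediction)
    t = l2_normalize(target)
    return sum((a - b) ** 2 for a, b in zip(p, t))


# Toy vectors, chosen only for illustration.
loss_same = byol_loss([1.0, 0.0], [2.0, 0.0])   # same direction -> 0.0
loss_orth = byol_loss([1.0, 0.0], [0.0, 3.0])   # orthogonal     -> 2.0
```

Because both vectors are normalized first, the loss depends only on their directions and ranges from 0 (aligned) to 4 (opposite).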

Source: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

Image credit: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning

Latest Papers

Hyperspherically Regularized Networks for BYOL Improves Feature Uniformity and Separability
Aiden Durrant, Georgios Leontidis
Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples
Mahmoud Assran, Mathilde Caron, Ishan Misra, Piotr Bojanowski, Armand Joulin, Nicolas Ballas, Michael Rabbat
Leveraging background augmentations to encourage semantic focus in self-supervised contrastive learning
Chaitanya K. Ryali, David J. Schwab, Ari S. Morcos
Self-supervised representation learning from 12-lead ECG data
Temesgen Mehari, Nils Strodthoff
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, Noboru Harada, Kunio Kashino
Shift Equivariance for Pixel-based Self-supervised SAR-optical Feature Fusion
Yuxing Chen, Lorenzo Bruzzone
Self-supervised Pretraining of Visual Features in the Wild
Priya Goyal, Mathilde Caron, Benjamin Lefaudeux, Min Xu, Pengchao Wang, Vivek Pai, Mannat Singh, Vitaliy Liptchinsky, Ishan Misra, Armand Joulin, Piotr Bojanowski
Bootstrapped Representation Learning on Graphs
Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Rémi Munos, Petar Veličković, Michal Valko
Understanding self-supervised Learning Dynamics without Contrastive Pairs
Yuandong Tian, Xinlei Chen, Surya Ganguli
Self-Supervised Representation Learning from Flow Equivariance
Yuwen Xiong, Mengye Ren, Wenyuan Zeng, Raquel Urtasun
Self-supervised Adversarial Robustness for the Low-label, High-data Regime
Run Away From your Teacher: a New Self-Supervised Approach Solving the Puzzle of BYOL
ISD: Self-Supervised Learning by Iterative Similarity Distillation
Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Vipin Pillai, Paolo Favaro, Hamed Pirsiavash
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning
Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Remi Munos, Michal Valko
How Well Do Self-Supervised Models Transfer?
Linus Ericsson, Henry Gouk, Timothy M. Hospedales