Seeing voices and hearing voices: learning discriminative embeddings using cross-modal self-supervision

The goal of this work is to train discriminative cross-modal embeddings without access to manually annotated data. Recent advances in self-supervised learning have shown that effective representations can be learnt from natural cross-modal synchrony...
