Self-Supervised Learning by Cross-Modal Audio-Video Clustering

28 Nov 2019 · Humam Alwassel, Dhruv Mahajan, Lorenzo Torresani, Bernard Ghanem, Du Tran

The visual and audio modalities are highly correlated, yet they contain different information. Their strong correlation makes it possible to predict the semantics of one from the other with good accuracy...
