Deep Latent Space Learning for Cross-modal Mapping of Audio and Visual Signals

18 Sep 2019 · Shah Nawaz, Muhammad Kamran Janjua, Ignazio Gallo, Arif Mahmood, Alessandro Calefati

We propose a novel deep training algorithm for joint representation of audio and visual information. It consists of a single stream network (SSNet) coupled with a novel loss function to learn a shared deep latent space representation of multimodal information. The proposed framework characterizes the shared latent space by leveraging class centers, which eliminates the need for pairwise or triplet supervision.
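The center-based objective described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (squared Euclidean distance to a learnable per-class center); the paper's exact loss formulation, encoders, and SSNet architecture are not reproduced here:

```python
import numpy as np

def shared_center_loss(embeddings, labels, centers):
    """Mean squared distance between each embedding and its class center.

    Audio and visual embeddings of the same class are pulled toward the
    same center in the shared latent space, so no pairwise or triplet
    sampling is required. (Sketch only, not the paper's exact loss.)
    """
    diffs = embeddings - centers[labels]          # (N, D)
    return float((diffs ** 2).sum(axis=1).mean())

# Toy usage: two modalities, identical labels, one shared set of centers.
rng = np.random.default_rng(0)
centers = rng.standard_normal((10, 64))           # one center per class
audio_emb = rng.standard_normal((8, 64))          # stand-in audio encoder output
visual_emb = rng.standard_normal((8, 64))         # stand-in visual encoder output
labels = rng.integers(0, 10, size=8)

both = np.concatenate([audio_emb, visual_emb])
loss = shared_center_loss(both, np.concatenate([labels, labels]), centers)
```

In training, the centers would be learned jointly with the encoder weights by gradient descent, so that embeddings of both modalities cluster around their shared class center.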
