7 Oct 2022 • Noam Malali, Yosi Keller
We present a deep learning approach that learns joint semantic embeddings of images and captions in a Euclidean space, such that semantic similarity is approximated by the L2 distance in the embedding space.
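To make the role of the L2 distance concrete, the following is a minimal sketch of how a shared Euclidean embedding space could be used for image-to-caption matching. It is an illustration only, not the authors' implementation: the function names `l2_distance_matrix` and `nearest_caption` are hypothetical, and the embedding vectors are hand-picked toy values rather than outputs of a trained model.

```python
import math

def l2_distance_matrix(image_embs, caption_embs):
    """Return D where D[i][j] is the L2 distance between image i and caption j."""
    return [[math.dist(img, cap) for cap in caption_embs] for img in image_embs]

def nearest_caption(image_embs, caption_embs):
    """For each image, return the index of the caption with the smallest L2 distance."""
    dists = l2_distance_matrix(image_embs, caption_embs)
    return [min(range(len(row)), key=row.__getitem__) for row in dists]

# Toy 2-D embeddings (assumed values; a real model would produce
# higher-dimensional vectors from images and text).
image_embs = [(0.0, 0.0), (3.0, 4.0)]
caption_embs = [(0.1, 0.0), (3.0, 3.0), (10.0, 10.0)]

dists = l2_distance_matrix(image_embs, caption_embs)
matches = nearest_caption(image_embs, caption_embs)
```

Under this reading, a smaller distance stands for a higher semantic similarity, so retrieval reduces to a nearest-neighbour search over the embedded captions.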