no code implementations • 10 Aug 2021 • Masoud Monajatipoor, Mozhdeh Rouhsedaghat, Liunian Harold Li, Aichi Chien, C. -C. Jay Kuo, Fabien Scalzo, Kai-Wei Chang
Vision-and-language(V&L) models take image and text as input and learn to capture the associations between them.