1 code implementation • 29 Jun 2023 • Yasmine Karoui, Rémi Lebret, Negar Foroutan, Karl Aberer
Our evaluation across three distinct tasks (image-text retrieval, visual entailment, and natural language visual reasoning) demonstrates that this approach outperforms the state-of-the-art multilingual vision-language models without requiring large parallel corpora.