no code implementations • RepL4NLP (ACL) 2022 • Romain Bielawski, Benjamin Devillers, Tim Van de Cruys, Rufin VanRullen
We compare CLIP’s visual stream against two visually trained networks and CLIP’s textual stream against two linguistically trained networks, as well as multimodal combinations of these networks.
1 code implementation • CoNLL (EMNLP) 2021 • Benjamin Devillers, Bhavin Choksi, Romain Bielawski, Rufin VanRullen
Vision models trained on multimodal datasets can benefit from the wide availability of large image-caption datasets.