no code implementations • 2 Mar 2024 • Moran Yanuka, Morris Alper, Hadar Averbuch-Elor, Raja Giryes
Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild.
no code implementations • 14 Feb 2024 • Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor
To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information.
1 code implementation • 6 Dec 2023 • Assaf Ben-Kish, Moran Yanuka, Morris Alper, Raja Giryes, Hadar Averbuch-Elor
To this end, we propose a framework for addressing hallucinations in image captioning in the open-vocabulary setting.
no code implementations • NeurIPS 2023 • Morris Alper, Hadar Averbuch-Elor
Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism.
no code implementations • ICCV 2023 • Morris Alper, Hadar Averbuch-Elor
We show that the pseudo-labels produced by this procedure can be used to train a captioning model that effectively understands human-human interactions in images, as measured by a variety of metrics capturing the textual and semantic faithfulness and the factual groundedness of its predictions.
1 code implementation • CVPR 2023 • Morris Alper, Michael Fiman, Hadar Averbuch-Elor
We show that SOTA multimodally trained text encoders outperform unimodally trained text encoders on the VLU tasks while underperforming them on the NLU tasks, lending new context to previously mixed results regarding the NLU capabilities of multimodal models.