Search Results for author: Morris Alper

Found 6 papers, 2 papers with code

ICC: Quantifying Image Caption Concreteness for Multimodal Dataset Curation

no code implementations · 2 Mar 2024 · Moran Yanuka, Morris Alper, Hadar Averbuch-Elor

Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild.

Tasks: Sentence

HaLo-NeRF: Learning Geometry-Guided Semantics for Exploring Unconstrained Photo Collections

no code implementations · 14 Feb 2024 · Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor

To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information.

Mitigating Open-Vocabulary Caption Hallucinations

1 code implementation · 6 Dec 2023 · Assaf Ben-Kish, Moran Yanuka, Morris Alper, Raja Giryes, Hadar Averbuch-Elor

To this end, we propose a framework for addressing hallucinations in image captioning in the open-vocabulary setting.

Tasks: Hallucination, Image Captioning +3

Kiki or Bouba? Sound Symbolism in Vision-and-Language Models

no code implementations · NeurIPS 2023 · Morris Alper, Hadar Averbuch-Elor

Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism.

Tasks: Knowledge Probing

Learning Human-Human Interactions in Images from Weak Textual Supervision

no code implementations · ICCV 2023 · Morris Alper, Hadar Averbuch-Elor

We show that the pseudo-labels produced by this procedure can be used to train a captioning model to effectively understand human-human interactions in images, as measured by a variety of metrics capturing the textual and semantic faithfulness and factual groundedness of our predictions.

Tasks: Image Captioning, Knowledge Distillation +2

Is BERT Blind? Exploring the Effect of Vision-and-Language Pretraining on Visual Language Understanding

1 code implementation · CVPR 2023 · Morris Alper, Michael Fiman, Hadar Averbuch-Elor

We show that SOTA multimodally trained text encoders outperform unimodally trained text encoders on the VLU tasks while being outperformed by them on the NLU tasks, lending new context to previously mixed results regarding the NLU capabilities of multimodal models.

Tasks: Knowledge Probing, Language Modelling +2
