no code implementations • 2 Mar 2024 • Moran Yanuka, Morris Alper, Hadar Averbuch-Elor, Raja Giryes
Web-scale training on paired text-image data is becoming increasingly central to multimodal learning, but is challenged by the highly noisy nature of datasets in the wild.
no code implementations • 14 Feb 2024 • Chen Dudai, Morris Alper, Hana Bezalel, Rana Hanocka, Itai Lang, Hadar Averbuch-Elor
To bolster such models with fine-grained knowledge, we leverage large-scale Internet data containing images of similar landmarks along with weakly-related textual information.
1 code implementation • 6 Dec 2023 • Assaf Ben-Kish, Moran Yanuka, Morris Alper, Raja Giryes, Hadar Averbuch-Elor
To this end, we propose a framework for addressing hallucinations in image captioning in the open-vocabulary setting.
no code implementations • NeurIPS 2023 • Morris Alper, Hadar Averbuch-Elor
Although the mapping between sound and meaning in human language is assumed to be largely arbitrary, research in cognitive science has shown that there are non-trivial correlations between particular sounds and meanings across languages and demographic groups, a phenomenon known as sound symbolism.
no code implementations • ICCV 2023 • Morris Alper, Hadar Averbuch-Elor
We show that the pseudo-labels produced by this procedure can be used to train a captioning model that effectively understands human-human interactions in images, as measured by a variety of metrics capturing the textual and semantic faithfulness and the factual groundedness of its predictions.
1 code implementation • CVPR 2023 • Morris Alper, Michael Fiman, Hadar Averbuch-Elor
We show that SOTA multimodally trained text encoders outperform unimodally trained text encoders on the VLU tasks while underperforming them on the NLU tasks, lending new context to previously mixed results regarding the NLU capabilities of multimodal models.