Search Results for author: Gabriela Ben Melech Stan

Found 5 papers, 2 papers with code

LVLM-Intrepret: An Interpretability Tool for Large Vision-Language Models

no code implementations • 3 Apr 2024 • Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal

In this work, we present a novel interactive application aimed towards understanding the internal mechanisms of large vision-language models.

Language Modelling

Paper
Add Code

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.

Paper
Code

LDM3D-VR: Latent Diffusion Model for 3D VR

no code implementations • 6 Nov 2023 • Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal

Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions.

Paper
Add Code

LDM3D: Latent Diffusion Model for 3D

2 code implementations • 18 May 2023 • Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal

This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts.

4,098

Paper
Code

MuMUR : Multilingual Multimodal Universal Retrieval

no code implementations • 24 Aug 2022 • Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

In this paper, we propose a framework MuMUR, that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval.

Image Retrieval Machine Translation +3

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.