Search Results for author: Gabriela Ben Melech Stan

Found 5 papers, 2 papers with code

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.

LDM3D-VR: Latent Diffusion Model for 3D VR

no code implementations • 6 Nov 2023 • Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal

Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions.

LDM3D: Latent Diffusion Model for 3D

2 code implementations • 18 May 2023 • Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal

This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts.

MuMUR: Multilingual Multimodal Universal Retrieval

no code implementations • 24 Aug 2022 • Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

In this paper, we propose MuMUR, a framework that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval.

Image Retrieval, Machine Translation (+3 more)
