no code implementations • WASSA (ACL) 2022 • Ayal Klein, Oren Pereg, Daniel Korat, Vasudev Lal, Moshe Wasserblat, Ido Dagan
In this paper, we investigate and establish empirically a prior conjecture, which suggests that the linguistic relations connecting opinion terms to their aspects transfer well across domains and therefore can be leveraged for cross-domain aspect term extraction.
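To make the conjecture concrete, here is a minimal sketch (not the paper's implementation) of how dependency relations can link an opinion term to candidate aspect terms, using spaCy; the patterns and example sentence are illustrative:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def aspects_for_opinion(text: str, opinion_term: str):
    """Return candidate aspect nouns syntactically linked to an opinion term."""
    doc = nlp(text)
    candidates = []
    for token in doc:
        if token.text.lower() != opinion_term.lower():
            continue
        # Pattern 1: the opinion adjective directly modifies the aspect noun
        # ("great battery" -> amod(battery, great)).
        if token.dep_ == "amod" and token.head.pos_ in ("NOUN", "PROPN"):
            candidates.append(token.head.text)
        # Pattern 2: copular sentences ("The screen is sharp") -- the adjective
        # is "acomp" of the verb, and the aspect is the verb's "nsubj" child.
        if token.dep_ == "acomp":
            for sibling in token.head.children:
                if sibling.dep_ == "nsubj" and sibling.pos_ in ("NOUN", "PROPN"):
                    candidates.append(sibling.text)
    return candidates

# Expected (parse-dependent): ['life'] -- the same relations fire regardless
# of whether the domain is laptops, restaurants, or hotels.
print(aspects_for_opinion("The battery life is great.", "great"))
```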
no code implementations • 3 Apr 2024 • Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal
In this work, we present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
no code implementations • 29 Mar 2024 • Musashi Hinck, Matthew L. Olson, David Cobbley, Shao-Yen Tseng, Vasudev Lal
We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs).
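As a rough illustration of the LLaVA-style recipe referenced here, the sketch below shows the core architectural idea: a small MLP projector maps vision-encoder patch features into the LLM's token-embedding space. All dimensions and shapes are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Two-layer MLP projector (the LLaVA-1.5 design) from vision to LLM width."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.net(patch_features)

# Usage: project ViT patch features, then concatenate with the embedded text
# tokens before feeding the decoder-only LLM.
projector = VisionToLLMProjector()
patches = torch.randn(1, 576, 1024)      # e.g. 24x24 patches from a ViT
visual_tokens = projector(patches)       # (1, 576, 2048)
text_tokens = torch.randn(1, 32, 2048)   # embedded prompt tokens
llm_input = torch.cat([visual_tokens, text_tokens], dim=1)
print(llm_input.shape)                   # torch.Size([1, 608, 2048])
```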
1 code implementation • 30 Nov 2023 • Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal
Our approach utilizes Stable Diffusion with cross-attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender).
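The paper's cross-attention control is not shown here; as a simplified stand-in, the sketch below fixes the initial latent noise across minimally edited prompts with a standard diffusers pipeline, which already yields image pairs differing mainly in the varied attribute (checkpoint and prompts are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Any Stable Diffusion checkpoint works for this illustration.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

template = "a photo of a {} doctor, hospital background"
attributes = ["male", "female"]  # the social attribute being varied

for attr in attributes:
    # Re-seeding fixes the initial latents, so scene layout and style stay
    # comparable across the pair; only the edited attribute should change.
    generator = torch.Generator(device="cuda").manual_seed(1234)
    image = pipe(template.format(attr), generator=generator).images[0]
    image.save(f"doctor_{attr}.png")
```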
1 code implementation • 20 Nov 2023 • Shachar Rosenman, Vasudev Lal, Phillip Howard
In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models.
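NeuroPrompts itself trains a language model with constrained decoding; the toy sketch below only illustrates the interface such an enhancer exposes, with a hypothetical hand-curated modifier list standing in for the learned model:

```python
import random

# Hypothetical, hand-curated modifiers; the real system learns these.
STYLE_MODIFIERS = [
    "highly detailed", "cinematic lighting", "sharp focus",
    "digital painting", "trending on artstation", "4k",
]

def enhance_prompt(user_prompt: str, k: int = 3, seed: int = 0) -> str:
    """Stand-in for the learned enhancer: append k style modifiers."""
    rng = random.Random(seed)
    return user_prompt + ", " + ", ".join(rng.sample(STYLE_MODIFIERS, k))

print(enhance_prompt("a castle on a hill"))
# e.g. "a castle on a hill, digital painting, 4k, cinematic lighting"
```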
no code implementations • 6 Nov 2023 • Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal
Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions.
1 code implementation • 7 Oct 2023 • Avinash Madasu, Anahita Bhiwandiwalla, Vasudev Lal
We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP).
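A common way to probe an image-text model on video tasks is to sample frames, encode them with the image tower, and mean-pool; the sketch below shows this for text-to-video retrieval with CLIP (an evaluation pattern in the spirit of the study, not its exact protocol):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def video_embedding(frames: list[Image.Image]) -> torch.Tensor:
    """Mean-pooled CLIP embedding over uniformly sampled video frames."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)   # (num_frames, dim)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)

def retrieval_scores(frames, queries: list[str]) -> torch.Tensor:
    """Cosine similarity of each text query against the pooled video embedding."""
    v = video_embedding(frames)
    inputs = processor(text=queries, return_tensors="pt", padding=True)
    with torch.no_grad():
        t = model.get_text_features(**inputs)
    t = t / t.norm(dim=-1, keepdim=True)
    return t @ v

# frames = [Image.open(p) for p in sorted(glob.glob("clip_frames/*.jpg"))]
# print(retrieval_scores(frames, ["a dog catching a frisbee", "cooking pasta"]))
```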
no code implementations • 4 Oct 2023 • Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Vasudev Lal
While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race.
1 code implementation • 28 Jun 2023 • Avinash Madasu, Vasudev Lal
The study is performed on two categories of video retrieval models: (i) models that are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (e.g.
1 code implementation • 31 May 2023 • Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan
With only 4M VLP data, ManagerTower achieves superior performance on various downstream VL tasks, notably 79.15% accuracy on VQAv2 Test-Std, 86.56% IR@1 and 95.64% TR@1 on Flickr30K.
2 code implementations • 18 May 2023 • Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal
This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both an image and a corresponding depth map from a given text prompt, allowing users to create RGBD images directly from text.
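LDM3D has since been integrated into the diffusers library; assuming a recent diffusers version and the Intel/ldm3d-4c checkpoint, usage looks roughly like this:

```python
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
).to("cuda")

output = pipe("A picture of some lemons on a table")
# The pipeline returns both modalities for the same prompt.
output.rgb[0].save("lemons_rgb.jpg")
output.depth[0].save("lemons_depth.png")
```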
1 code implementation • 8 May 2023 • Phillip Howard, Junlin Wang, Vasudev Lal, Gadi Singer, Yejin Choi, Swabha Swayamdipta
We introduce NeuroComparatives, a novel framework for comparative knowledge distillation in which comparative knowledge is overgenerated from language models such as GPT-variants and LLaMA and then subjected to stringent filtering.
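As a toy illustration of the overgenerate-then-filter recipe (the real pipeline uses far larger models and much stricter, learned filters; the prompt template and heuristic below are hypothetical):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def comparative_candidates(a: str, b: str, n: int = 20) -> list[str]:
    """Overgenerate: sample many completions of a comparative prompt."""
    prompt = f"Compared to {b}, {a} is"
    outs = generator(prompt, max_new_tokens=15, num_return_sequences=n,
                     do_sample=True, top_p=0.95, pad_token_id=50256)
    return [o["generated_text"] for o in outs]

def keep(statement: str, a: str, b: str) -> bool:
    # Crude surface filter: must look comparative and mention both entities.
    s = statement.lower()
    return ("than" in s or "more" in s or "less" in s) and a in s and b in s

candidates = comparative_candidates("a laptop", "a desktop")
filtered = [c for c in candidates if keep(c, "laptop", "desktop")]
print(len(candidates), "generated ->", len(filtered), "kept")
```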
no code implementations • 28 Feb 2023 • Gadi Singer, Joscha Bach, Tetiana Grinberg, Nagib Hakim, Phillip Howard, Vasudev Lal, Zev Rivlin
While end-to-end learning systems are rapidly gaining capabilities and popularity, the increasing computational demands for deploying such systems, along with a lack of flexibility, adaptability, explainability, reasoning and verification capabilities, require new types of architectures.
1 code implementation • 10 Feb 2023 • Avinash Madasu, Vasudev Lal
We compare the performance of language representations of stand-alone text encoders of these models to the language representations of text encoders learnt through vision supervision.
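A minimal version of this comparison can be run by embedding the same sentence pair with a stand-alone text encoder (e.g., BERT) and with CLIP's vision-supervised text tower; the models and sentences below are illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer, CLIPModel, CLIPProcessor

sentences = ["a dog runs on the beach", "a puppy sprints along the shore"]

# Stand-alone text encoder: mean-pooled BERT token embeddings.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
enc = bert_tok(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = bert(**enc).last_hidden_state          # (2, seq, 768)
mask = enc["attention_mask"].unsqueeze(-1)
bert_emb = (hidden * mask).sum(1) / mask.sum(1)

# Vision-supervised text encoder: CLIP's text tower.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
cenc = clip_proc(text=sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    clip_emb = clip.get_text_features(**cenc)

cos = torch.nn.functional.cosine_similarity
print("BERT similarity:", cos(bert_emb[0], bert_emb[1], dim=0).item())
print("CLIP similarity:", cos(clip_emb[0], clip_emb[1], dim=0).item())
```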
1 code implementation • 22 Oct 2022 • Phillip Howard, Gadi Singer, Vasudev Lal, Yejin Choi, Swabha Swayamdipta
While counterfactual data augmentation offers a promising step towards robust generalization in natural language processing, producing a set of counterfactuals that offer valuable inductive bias for models remains a challenge.
1 code implementation • 18 Oct 2022 • Phillip Howard, Arden Ma, Vasudev Lal, Ana Paula Simoes, Daniel Korat, Oren Pereg, Moshe Wasserblat, Gadi Singer
The extraction of aspect terms is a critical step in fine-grained sentiment analysis of text.
no code implementations • 24 Aug 2022 • Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal
In this paper, we propose MuMUR, a framework that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval.
1 code implementation • 17 Jun 2022 • Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan
Vision-Language (VL) models with the Two-Tower architecture have dominated vision-language representation learning in recent years.
1 code implementation • CVPR 2022 • Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems.
1 code implementation • Findings (NAACL) 2022 • Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan
Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad range of vision-language tasks after fine-tuning.
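One of the core objectives behind such pretraining is the symmetric image-text contrastive loss; a minimal sketch of this common VLP objective (the paper's full training recipe combines several objectives):

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(logits))            # matched pairs on diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = clip_style_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```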
no code implementations • EACL 2021 • Vasudev Lal, Arden Ma, Estelle Aflalo, Phillip Howard, Ana Simoes, Daniel Korat, Oren Pereg, Gadi Singer, Moshe Wasserblat
With the increasingly widespread use of Transformer-based models for NLU/NLP tasks, there is growing interest in understanding the inner workings of these models, why they are so effective at a wide range of tasks, and how they can be further tuned and improved.
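A minimal example of the kind of inspection such tooling performs is extracting per-head attention maps from a Transformer; the model and layer/head choice below are arbitrary:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of (batch, heads, seq, seq), one entry per layer.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
layer, head = 5, 3   # arbitrary layer/head to inspect
attn = outputs.attentions[layer][0, head]
for i, tok in enumerate(tokens):
    top = attn[i].argmax().item()
    print(f"{tok:>12} attends most to {tokens[top]}")
```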