Search Results for author: Vasudev Lal

Found 22 papers, 14 papers with code

Opinion-based Relational Pivoting for Cross-domain Aspect Term Extraction

no code implementations · WASSA (ACL) 2022 · Ayal Klein, Oren Pereg, Daniel Korat, Vasudev Lal, Moshe Wasserblat, Ido Dagan

In this paper, we investigate and establish empirically a prior conjecture, which suggests that the linguistic relations connecting opinion terms to their aspects transfer well across domains and therefore can be leveraged for cross-domain aspect term extraction.

Domain Adaptation · Term Extraction

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 code implementation · 1 Apr 2024 · Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang

One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.

LLaVA-Gemma: Accelerating Multimodal Foundation Models with a Compact Language Model

no code implementations · 29 Mar 2024 · Musashi Hinck, Matthew L. Olson, David Cobbley, Shao-Yen Tseng, Vasudev Lal

We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs).

Language Modelling

SocialCounterfactuals: Probing and Mitigating Intersectional Social Biases in Vision-Language Models with Counterfactual Examples

1 code implementation · 30 Nov 2023 · Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal

Our approach utilizes Stable Diffusion with cross attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender).

counterfactual

NeuroPrompts: An Adaptive Framework to Optimize Prompts for Text-to-Image Generation

1 code implementation · 20 Nov 2023 · Shachar Rosenman, Vasudev Lal, Phillip Howard

In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models.

Language Modelling · Prompt Engineering +1

LDM3D-VR: Latent Diffusion Model for 3D VR

no code implementations · 6 Nov 2023 · Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal

Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions.

Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks

1 code implementation · 7 Oct 2023 · Avinash Madasu, Anahita Bhiwandiwalla, Vasudev Lal

We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP).

Action Recognition · Multiple-choice +6

Probing Intersectional Biases in Vision-Language Models with Counterfactual Examples

no code implementations · 4 Oct 2023 · Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Vasudev Lal

While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race.

counterfactual

ICSVR: Investigating Compositional and Syntactic Understanding in Video Retrieval Models

1 code implementation · 28 Jun 2023 · Avinash Madasu, Vasudev Lal

The study is performed on two categories of video retrieval models: (i) which are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (e.g.

Retrieval · Video Retrieval +1

ManagerTower: Aggregating the Insights of Uni-Modal Experts for Vision-Language Representation Learning

1 code implementation · 31 May 2023 · Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

With only 4M VLP data, ManagerTower achieves superior performances on various downstream VL tasks, especially 79.15% accuracy on VQAv2 Test-Std, 86.56% IR@1 and 95.64% TR@1 on Flickr30K.

Representation Learning

LDM3D: Latent Diffusion Model for 3D

2 code implementations · 18 May 2023 · Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal

This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both image and depth map data from a given text prompt, allowing users to generate RGBD images from text prompts.

NeuroComparatives: Neuro-Symbolic Distillation of Comparative Knowledge

1 code implementation · 8 May 2023 · Phillip Howard, Junlin Wang, Vasudev Lal, Gadi Singer, Yejin Choi, Swabha Swayamdipta

We introduce NeuroComparatives, a novel framework for comparative knowledge distillation overgenerated from language models such as GPT-variants and LLaMA, followed by stringent filtering of the generated knowledge.

Knowledge Distillation · valid +1

Thrill-K Architecture: Towards a Solution to the Problem of Knowledge Based Understanding

no code implementations · 28 Feb 2023 · Gadi Singer, Joscha Bach, Tetiana Grinberg, Nagib Hakim, Phillip Howard, Vasudev Lal, Zev Rivlin

While end-to-end learning systems are rapidly gaining capabilities and popularity, the increasing computational demands for deploying such systems, along with a lack of flexibility, adaptability, explainability, reasoning and verification capabilities, require new types of architectures.

Is Multimodal Vision Supervision Beneficial to Language?

1 code implementation · 10 Feb 2023 · Avinash Madasu, Vasudev Lal

We compare the performance of language representations of stand-alone text encoders of these models to the language representations of text encoders learnt through vision supervision.

Image Retrieval · Natural Language Understanding +4

NeuroCounterfactuals: Beyond Minimal-Edit Counterfactuals for Richer Data Augmentation

1 code implementation · 22 Oct 2022 · Phillip Howard, Gadi Singer, Vasudev Lal, Yejin Choi, Swabha Swayamdipta

While counterfactual data augmentation offers a promising step towards robust generalization in natural language processing, producing a set of counterfactuals that offer valuable inductive bias for models remains a challenge.

counterfactual · Data Augmentation +4

MuMUR: Multilingual Multimodal Universal Retrieval

no code implementations · 24 Aug 2022 · Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal

In this paper, we propose a framework, MuMUR, that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval.

Image Retrieval · Machine Translation +3

BridgeTower: Building Bridges Between Encoders in Vision-Language Representation Learning

1 code implementation · 17 Jun 2022 · Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan

Vision-Language (VL) models with the Two-Tower architecture have dominated visual-language representation learning in recent years.

Representation Learning

KD-VLP: Improving End-to-End Vision-and-Language Pretraining with Object Knowledge Distillation

1 code implementation · Findings (NAACL) 2022 · Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan

Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performances on a broad scope of vision-language tasks after finetuning.

Knowledge Distillation · Object +1

InterpreT: An Interactive Visualization Tool for Interpreting Transformers

no code implementations · EACL 2021 · Vasudev Lal, Arden Ma, Estelle Aflalo, Phillip Howard, Ana Simoes, Daniel Korat, Oren Pereg, Gadi Singer, Moshe Wasserblat

With the increasingly widespread use of Transformer-based models for NLU/NLP tasks, there is growing interest in understanding the inner workings of these models, why they are so effective at a wide range of tasks, and how they can be further tuned and improved.

Aspect-Based Sentiment Analysis · Aspect-Based Sentiment Analysis (ABSA)
