no code implementations • WASSA (ACL) 2022 • Ayal Klein, Oren Pereg, Daniel Korat, Vasudev Lal, Moshe Wasserblat, Ido Dagan
In this paper, we investigate and establish empirically a prior conjecture, which suggests that the linguistic relations connecting opinion terms to their aspects transfer well across domains and therefore can be leveraged for cross-domain aspect term extraction.
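To make the conjecture concrete, here is a minimal sketch (not the paper's implementation) of how dependency relations can link an opinion term to candidate aspect terms, using spaCy; the patterns and example sentence are illustrative:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def aspects_for_opinion(text: str, opinion_term: str):
    """Return candidate aspect nouns syntactically linked to an opinion term."""
    doc = nlp(text)
    candidates = []
    for token in doc:
        if token.text.lower() != opinion_term.lower():
            continue
        # Pattern 1: the opinion adjective directly modifies the aspect noun
        # ("great battery" -> amod(battery, great)).
        if token.dep_ == "amod" and token.head.pos_ in ("NOUN", "PROPN"):
            candidates.append(token.head.text)
        # Pattern 2: copular sentences ("The screen is sharp") -- the adjective
        # is "acomp" of the verb, and the aspect is the verb's "nsubj" child.
        if token.dep_ == "acomp":
            for sibling in token.head.children:
                if sibling.dep_ == "nsubj" and sibling.pos_ in ("NOUN", "PROPN"):
                    candidates.append(sibling.text)
    return candidates

# Expected (parse-dependent): ['life'] -- the same relations fire regardless
# of whether the domain is laptops, restaurants, or hotels.
print(aspects_for_opinion("The battery life is great.", "great"))
```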
no code implementations • 3 Apr 2024 • Gabriela Ben Melech Stan, Raanan Yehezkel Rohekar, Yaniv Gurwicz, Matthew Lyle Olson, Anahita Bhiwandiwalla, Estelle Aflalo, Chenfei Wu, Nan Duan, Shao-Yen Tseng, Vasudev Lal
In this work, we present a novel interactive application aimed at understanding the internal mechanisms of large vision-language models.
1 code implementation • 1 Apr 2024 • Agneet Chatterjee, Gabriela Ben Melech Stan, Estelle Aflalo, Sayak Paul, Dhruba Ghosh, Tejas Gokhale, Ludwig Schmidt, Hannaneh Hajishirzi, Vasudev Lal, Chitta Baral, Yezhou Yang
One of the key shortcomings in current text-to-image (T2I) models is their inability to consistently generate images which faithfully follow the spatial relationships specified in the text prompt.
no code implementations • 29 Mar 2024 • Musashi Hinck, Matthew L. Olson, David Cobbley, Shao-Yen Tseng, Vasudev Lal
We train a suite of multimodal foundation models (MMFM) using the popular LLaVA framework with the recently released Gemma family of large language models (LLMs).
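As a rough illustration of the LLaVA-style recipe referenced here, the sketch below shows the core architectural idea: a small MLP projector maps vision-encoder patch features into the LLM's token-embedding space. All dimensions and shapes are illustrative, not the paper's configuration:

```python
import torch
import torch.nn as nn

class VisionToLLMProjector(nn.Module):
    """Two-layer MLP projector (the LLaVA-1.5 design) from vision to LLM width."""
    def __init__(self, vision_dim: int = 1024, llm_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # patch_features: (batch, num_patches, vision_dim)
        return self.net(patch_features)

# Usage: project ViT patch features, then concatenate with the embedded text
# tokens before feeding the decoder-only LLM.
projector = VisionToLLMProjector()
patches = torch.randn(1, 576, 1024)      # e.g. 24x24 patches from a ViT
visual_tokens = projector(patches)       # (1, 576, 2048)
text_tokens = torch.randn(1, 32, 2048)   # embedded prompt tokens
llm_input = torch.cat([visual_tokens, text_tokens], dim=1)
print(llm_input.shape)                   # torch.Size([1, 608, 2048])
```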
1 code implementation • 30 Nov 2023 • Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Anahita Bhiwandiwalla, Vasudev Lal
Our approach utilizes Stable Diffusion with cross-attention control to produce sets of counterfactual image-text pairs that are highly similar in their depiction of a subject (e.g., a given occupation) while differing only in their depiction of intersectional social attributes (e.g., race & gender).
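The paper's cross-attention control is not shown here; as a simplified stand-in, the sketch below fixes the initial latent noise across minimally edited prompts with a standard diffusers pipeline, which already yields image pairs differing mainly in the varied attribute (checkpoint and prompts are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

# Any Stable Diffusion checkpoint works for this illustration.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
).to("cuda")

template = "a photo of a {} doctor, hospital background"
attributes = ["male", "female"]  # the social attribute being varied

for attr in attributes:
    # Re-seeding fixes the initial latents, so scene layout and style stay
    # comparable across the pair; only the edited attribute should change.
    generator = torch.Generator(device="cuda").manual_seed(1234)
    image = pipe(template.format(attr), generator=generator).images[0]
    image.save(f"doctor_{attr}.png")
```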
1 code implementation • 20 Nov 2023 • Shachar Rosenman, Vasudev Lal, Phillip Howard
In this work, we present NeuroPrompts, an adaptive framework that automatically enhances a user's prompt to improve the quality of generations produced by text-to-image models.
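NeuroPrompts itself trains a language model with constrained decoding; the toy sketch below only illustrates the interface such an enhancer exposes, with a hypothetical hand-curated modifier list standing in for the learned model:

```python
import random

# Hypothetical, hand-curated modifiers; the real system learns these.
STYLE_MODIFIERS = [
    "highly detailed", "cinematic lighting", "sharp focus",
    "digital painting", "trending on artstation", "4k",
]

def enhance_prompt(user_prompt: str, k: int = 3, seed: int = 0) -> str:
    """Stand-in for the learned enhancer: append k style modifiers."""
    rng = random.Random(seed)
    return user_prompt + ", " + ", ".join(rng.sample(STYLE_MODIFIERS, k))

print(enhance_prompt("a castle on a hill"))
# e.g. "a castle on a hill, digital painting, 4k, cinematic lighting"
```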
no code implementations • 6 Nov 2023 • Gabriela Ben Melech Stan, Diana Wofk, Estelle Aflalo, Shao-Yen Tseng, Zhipeng Cai, Michael Paulitsch, Vasudev Lal
Our models are fine-tuned from existing pretrained models on datasets containing panoramic/high-resolution RGB images, depth maps and captions.
1 code implementation • 7 Oct 2023 • Avinash Madasu, Anahita Bhiwandiwalla, Vasudev Lal
We investigate 9 foundational image-text models on a diverse set of video tasks that include video action recognition (video AR), video retrieval (video RT), video question answering (video QA), video multiple choice (video MC) and video captioning (video CP).
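A common way to probe an image-text model on video tasks is to sample frames, encode them with the image tower, and mean-pool; the sketch below shows this for text-to-video retrieval with CLIP (an evaluation pattern in the spirit of the study, not its exact protocol):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def video_embedding(frames: list[Image.Image]) -> torch.Tensor:
    """Mean-pooled CLIP embedding over uniformly sampled video frames."""
    inputs = processor(images=frames, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)   # (num_frames, dim)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return feats.mean(dim=0)

def retrieval_scores(frames, queries: list[str]) -> torch.Tensor:
    """Cosine similarity of each text query against the pooled video embedding."""
    v = video_embedding(frames)
    inputs = processor(text=queries, return_tensors="pt", padding=True)
    with torch.no_grad():
        t = model.get_text_features(**inputs)
    t = t / t.norm(dim=-1, keepdim=True)
    return t @ v

# frames = [Image.open(p) for p in sorted(glob.glob("clip_frames/*.jpg"))]
# print(retrieval_scores(frames, ["a dog catching a frisbee", "cooking pasta"]))
```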
no code implementations • 4 Oct 2023 • Phillip Howard, Avinash Madasu, Tiep Le, Gustavo Lujan Moreno, Vasudev Lal
While vision-language models (VLMs) have achieved remarkable performance improvements recently, there is growing evidence that these models also possess harmful biases with respect to social attributes such as gender and race.
1 code implementation • 28 Jun 2023 • Avinash Madasu, Vasudev Lal
The study is performed on two categories of video retrieval models: (i) models that are pre-trained on video-text pairs and fine-tuned on downstream video retrieval datasets (e.g.
1 code implementation • 31 May 2023 • Xiao Xu, Bei Li, Chenfei Wu, Shao-Yen Tseng, Anahita Bhiwandiwalla, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan
With only 4M VLP data, ManagerTower achieves superior performance on various downstream VL tasks, notably 79.15% accuracy on VQAv2 Test-Std, 86.56% IR@1 and 95.64% TR@1 on Flickr30K.
2 code implementations • 18 May 2023 • Gabriela Ben Melech Stan, Diana Wofk, Scottie Fox, Alex Redden, Will Saxton, Jean Yu, Estelle Aflalo, Shao-Yen Tseng, Fabio Nonato, Matthias Muller, Vasudev Lal
This research paper proposes a Latent Diffusion Model for 3D (LDM3D) that generates both an image and a corresponding depth map from a given text prompt, allowing users to create RGBD images directly from text.
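LDM3D has since been integrated into the diffusers library; assuming a recent diffusers version and the Intel/ldm3d-4c checkpoint, usage looks roughly like this:

```python
import torch
from diffusers import StableDiffusionLDM3DPipeline

pipe = StableDiffusionLDM3DPipeline.from_pretrained(
    "Intel/ldm3d-4c", torch_dtype=torch.float16
).to("cuda")

output = pipe("A picture of some lemons on a table")
# The pipeline returns both modalities for the same prompt.
output.rgb[0].save("lemons_rgb.jpg")
output.depth[0].save("lemons_depth.png")
```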
1 code implementation • 8 May 2023 • Phillip Howard, Junlin Wang, Vasudev Lal, Gadi Singer, Yejin Choi, Swabha Swayamdipta
We introduce NeuroComparatives, a novel framework for comparative knowledge distillation in which comparative knowledge is overgenerated from language models such as GPT-variants and LLaMA and then subjected to stringent filtering.
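As a toy illustration of the overgenerate-then-filter recipe (the real pipeline uses far larger models and much stricter, learned filters; the prompt template and heuristic below are hypothetical):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

def comparative_candidates(a: str, b: str, n: int = 20) -> list[str]:
    """Overgenerate: sample many completions of a comparative prompt."""
    prompt = f"Compared to {b}, {a} is"
    outs = generator(prompt, max_new_tokens=15, num_return_sequences=n,
                     do_sample=True, top_p=0.95, pad_token_id=50256)
    return [o["generated_text"] for o in outs]

def keep(statement: str, a: str, b: str) -> bool:
    # Crude surface filter: must look comparative and mention both entities.
    s = statement.lower()
    return ("than" in s or "more" in s or "less" in s) and a in s and b in s

candidates = comparative_candidates("a laptop", "a desktop")
filtered = [c for c in candidates if keep(c, "laptop", "desktop")]
print(len(candidates), "generated ->", len(filtered), "kept")
```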
no code implementations • 28 Feb 2023 • Gadi Singer, Joscha Bach, Tetiana Grinberg, Nagib Hakim, Phillip Howard, Vasudev Lal, Zev Rivlin
While end-to-end learning systems are rapidly gaining capabilities and popularity, the increasing computational demands for deploying such systems, along with a lack of flexibility, adaptability, explainability, reasoning and verification capabilities, require new types of architectures.
1 code implementation • 10 Feb 2023 • Avinash Madasu, Vasudev Lal
We compare the performance of language representations of stand-alone text encoders of these models to the language representations of text encoders learnt through vision supervision.
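A minimal version of this comparison can be run by embedding the same sentence pair with a stand-alone text encoder (e.g., BERT) and with CLIP's vision-supervised text tower; the models and sentences below are illustrative:

```python
import torch
from transformers import AutoModel, AutoTokenizer, CLIPModel, CLIPProcessor

sentences = ["a dog runs on the beach", "a puppy sprints along the shore"]

# Stand-alone text encoder: mean-pooled BERT token embeddings.
bert_tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
enc = bert_tok(sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    hidden = bert(**enc).last_hidden_state          # (2, seq, 768)
mask = enc["attention_mask"].unsqueeze(-1)
bert_emb = (hidden * mask).sum(1) / mask.sum(1)

# Vision-supervised text encoder: CLIP's text tower.
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
cenc = clip_proc(text=sentences, return_tensors="pt", padding=True)
with torch.no_grad():
    clip_emb = clip.get_text_features(**cenc)

cos = torch.nn.functional.cosine_similarity
print("BERT similarity:", cos(bert_emb[0], bert_emb[1], dim=0).item())
print("CLIP similarity:", cos(clip_emb[0], clip_emb[1], dim=0).item())
```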
1 code implementation • 22 Oct 2022 • Phillip Howard, Gadi Singer, Vasudev Lal, Yejin Choi, Swabha Swayamdipta
While counterfactual data augmentation offers a promising step towards robust generalization in natural language processing, producing a set of counterfactuals that offer valuable inductive bias for models remains a challenge.
1 code implementation • 18 Oct 2022 • Phillip Howard, Arden Ma, Vasudev Lal, Ana Paula Simoes, Daniel Korat, Oren Pereg, Moshe Wasserblat, Gadi Singer
The extraction of aspect terms is a critical step in fine-grained sentiment analysis of text.
no code implementations • 24 Aug 2022 • Avinash Madasu, Estelle Aflalo, Gabriela Ben Melech Stan, Shachar Rosenman, Shao-Yen Tseng, Gedas Bertasius, Vasudev Lal
In this paper, we propose MuMUR, a framework that utilizes knowledge transfer from a multilingual model to boost the performance of multi-modal (image and video) retrieval.
1 code implementation • 17 Jun 2022 • Xiao Xu, Chenfei Wu, Shachar Rosenman, Vasudev Lal, Wanxiang Che, Nan Duan
Vision-Language (VL) models with the Two-Tower architecture have dominated vision-language representation learning in recent years.
1 code implementation • CVPR 2022 • Estelle Aflalo, Meng Du, Shao-Yen Tseng, Yongfei Liu, Chenfei Wu, Nan Duan, Vasudev Lal
Breakthroughs in transformer-based models have revolutionized not only the NLP field, but also vision and multimodal systems.
1 code implementation • Findings (NAACL) 2022 • Yongfei Liu, Chenfei Wu, Shao-Yen Tseng, Vasudev Lal, Xuming He, Nan Duan
Self-supervised vision-and-language pretraining (VLP) aims to learn transferable multi-modal representations from large-scale image-text data and to achieve strong performance on a broad range of vision-language tasks after fine-tuning.
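One of the core objectives behind such pretraining is the symmetric image-text contrastive loss; a minimal sketch of this common VLP objective (the paper's full training recipe combines several objectives):

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched image-text pairs."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(logits))            # matched pairs on diagonal
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = clip_style_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
print(loss.item())
```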
no code implementations • EACL 2021 • Vasudev Lal, Arden Ma, Estelle Aflalo, Phillip Howard, Ana Simoes, Daniel Korat, Oren Pereg, Gadi Singer, Moshe Wasserblat
With the increasingly widespread use of Transformer-based models for NLU/NLP tasks, there is growing interest in understanding the inner workings of these models, why they are so effective at a wide range of tasks, and how they can be further tuned and improved.
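A minimal example of the kind of inspection such tooling performs is extracting per-head attention maps from a Transformer; the model and layer/head choice below are arbitrary:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The movie was surprisingly good.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: tuple of (batch, heads, seq, seq), one entry per layer.
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
layer, head = 5, 3   # arbitrary layer/head to inspect
attn = outputs.attentions[layer][0, head]
for i, tok in enumerate(tokens):
    top = attn[i].argmax().item()
    print(f"{tok:>12} attends most to {tokens[top]}")
```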