21 code implementations • ICCV 2015 • Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
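For reference, the VQA benchmark scores a candidate answer by consensus against ten human-provided answers: an answer counts as fully correct if at least three annotators gave it. A minimal sketch of this scoring rule (simplified; the official evaluation additionally averages over all choose-9 subsets of the human answers):

```python
def vqa_accuracy(candidate, human_answers):
    """Consensus accuracy from the VQA benchmark (simplified form):
    an answer is 100% correct if at least 3 of the 10 human
    annotators gave the same answer, and scales linearly below that."""
    matches = sum(1 for a in human_answers if a == candidate)
    return min(matches / 3.0, 1.0)


# Example: 3 of 10 annotators agree with the candidate -> full credit.
score = vqa_accuracy("2", ["2", "2", "2", "3", "3", "3", "3", "4", "4", "4"])
```

Answers are typically normalized (lowercased, articles and punctuation stripped) before matching.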
no code implementations • EMNLP 2016 • Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra
Our approach produces a diverse set of plausible hypotheses for both semantic segmentation and prepositional phrase attachment resolution that are then jointly reranked to select the most consistent pair.
1 code implementation • NAACL 2016 • Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell
We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.
1 code implementation • EMNLP 2016 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh
Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA).
no code implementations • 31 Aug 2016 • C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh
As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence.
no code implementations • 26 Apr 2017 • Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.
1 code implementation • CVPR 2018 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively).
no code implementations • NeurIPS 2018 • Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
Further, on standard VQA tasks, our approach shows significantly less drop in accuracy compared to existing bias-reducing VQA models.
no code implementations • 3 Dec 2018 • Aishwarya Agrawal, Mateusz Malinowski, Felix Hill, Ali Eslami, Oriol Vinyals, Tejas Kulkarni
In this work, we study the setting in which an agent must learn to generate programs for diverse scenes conditioned on a given symbolic instruction.
no code implementations • 24 May 2022 • Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh
Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA).
1 code implementation • 13 Oct 2022 • Oscar Mañas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks.
2 code implementations • 12 May 2023 • Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh
While pretraining on large-scale image-text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack "fine-grained" understanding, such as the ability to recognise relationships, verbs, and numbers in images.
Ranked #13 on Visual Reasoning on Winoground
1 code implementation • 24 May 2023 • Saba Ahmadi, Aishwarya Agrawal
Furthermore, we found that all metrics are sensitive to variations in the size of image-relevant objects mentioned in the caption, while CLIPScore and PAC-S are also sensitive to the number of mentions of image-relevant objects in the caption.
2 code implementations • 15 Jun 2023 • Le Zhang, Rabiul Awal, Aishwarya Agrawal
However, the compositional reasoning abilities of existing VLMs remain subpar.
1 code implementation • 16 Jun 2023 • Rabiul Awal, Le Zhang, Aishwarya Agrawal
In summary, our research sheds light on the intricacies of prompting strategies in VLMs for VQA, emphasizing the synergistic use of captions, templates, and pre-processing to enhance model efficacy.
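As an illustration of the caption-plus-template idea, a zero-shot VQA prompt can be assembled by prepending a generated image caption as textual context before the question. The template below is hypothetical and for illustration only; the paper's exact templates and pre-processing may differ:

```python
def build_vqa_prompt(caption, question):
    # Hypothetical template: supply an image caption as textual context,
    # then pose the question and cue the model for a short answer.
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Short answer:"
    )


prompt = build_vqa_prompt(
    "A brown dog sleeping on a red couch.",
    "What color is the couch?",
)
```

The resulting string would then be fed to the vision-language model's text interface alongside (or in place of) visual features, depending on the model.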
no code implementations • 4 Oct 2023 • Oscar Mañas, Benno Krojer, Aishwarya Agrawal
Thus, there is a need to develop more robust automatic VQA metrics that serve as a proxy for human judgment.
1 code implementation • 20 Oct 2023 • Le Zhang, Yihong Wu, Fengran Mo, Jian-Yun Nie, Aishwarya Agrawal
To enable LLMs to tackle the task in a zero-shot manner, we introduce MoqaGPT, a straightforward and flexible framework.
no code implementations • 26 Mar 2024 • Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal
In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.
no code implementations • ACL 2022 • Aishwarya Agrawal, Damien Teney, Aida Nematzadeh
In addition to the larger pretraining datasets, the transformer architecture (Vaswani et al., 2017) and in particular self-attention applied to two modalities are responsible for the impressive performance of the recent pretrained models on downstream tasks (Hendricks et al., 2021).