21 code implementations • ICCV 2015 • Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
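For reference, the VQA benchmark scores a candidate answer by consensus against ten human-provided answers: an answer counts as fully correct if at least three annotators gave it. A minimal sketch of this scoring rule (simplified; the official evaluation additionally averages over all choose-9 subsets of the human answers):

```python
def vqa_accuracy(candidate, human_answers):
    """Consensus accuracy from the VQA benchmark (simplified form):
    an answer is 100% correct if at least 3 of the 10 human
    annotators gave the same answer, and scales linearly below that."""
    matches = sum(1 for a in human_answers if a == candidate)
    return min(matches / 3.0, 1.0)


# Example: 3 of 10 annotators agree with the candidate -> full credit.
score = vqa_accuracy("2", ["2", "2", "2", "3", "3", "3", "3", "4", "4", "4"])
```

Answers are typically normalized (lowercased, articles and punctuation stripped) before matching.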
no code implementations • EMNLP 2016 • Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra
Our approach produces a diverse set of plausible hypotheses for both semantic segmentation and prepositional phrase attachment resolution that are then jointly reranked to select the most consistent pair.
1 code implementation • NAACL 2016 • Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell
We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.
1 code implementation • EMNLP 2016 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh
Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA).
no code implementations • 31 Aug 2016 • C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh
As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence.
no code implementations • 26 Apr 2017 • Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.
1 code implementation • CVPR 2018 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively).
no code implementations • NeurIPS 2018 • Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
Further, on standard VQA tasks, our approach shows significantly less drop in accuracy compared to existing bias-reducing VQA models.
no code implementations • 3 Dec 2018 • Aishwarya Agrawal, Mateusz Malinowski, Felix Hill, Ali Eslami, Oriol Vinyals, Tejas Kulkarni
In this work, we study the setting in which an agent must learn to generate programs for diverse scenes conditioned on a given symbolic instruction.
no code implementations • 24 May 2022 • Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh
Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA).
1 code implementation • 13 Oct 2022 • Oscar Mañas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks.
2 code implementations • 12 May 2023 • Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh
While pretraining on large-scale image-text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack "fine-grained" understanding, such as the ability to recognise relationships, verbs, and numbers in images.
Ranked #13 on Visual Reasoning on Winoground
1 code implementation • 24 May 2023 • Saba Ahmadi, Aishwarya Agrawal
Furthermore, we found that all metrics are sensitive to variations in the size of image-relevant objects mentioned in the caption, while CLIPScore and PAC-S are also sensitive to the number of mentions of image-relevant objects in the caption.
2 code implementations • 15 Jun 2023 • Le Zhang, Rabiul Awal, Aishwarya Agrawal
However, the compositional reasoning abilities of existing VLMs remain subpar.
1 code implementation • 16 Jun 2023 • Rabiul Awal, Le Zhang, Aishwarya Agrawal
In summary, our research sheds light on the intricacies of prompting strategies in VLMs for VQA, emphasizing the synergistic use of captions, templates, and pre-processing to enhance model efficacy.
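As an illustration of the caption-plus-template idea, a zero-shot VQA prompt can be assembled by prepending a generated image caption as textual context before the question. The template below is hypothetical and for illustration only; the paper's exact templates and pre-processing may differ:

```python
def build_vqa_prompt(caption, question):
    # Hypothetical template: supply an image caption as textual context,
    # then pose the question and cue the model for a short answer.
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Short answer:"
    )


prompt = build_vqa_prompt(
    "A brown dog sleeping on a red couch.",
    "What color is the couch?",
)
```

The resulting string would then be fed to the vision-language model's text interface alongside (or in place of) visual features, depending on the model.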
no code implementations • 4 Oct 2023 • Oscar Mañas, Benno Krojer, Aishwarya Agrawal
Thus, there is a need to develop more robust automatic VQA metrics that serve as a proxy for human judgment.
1 code implementation • 20 Oct 2023 • Le Zhang, Yihong Wu, Fengran Mo, Jian-Yun Nie, Aishwarya Agrawal
To enable LLMs to tackle the task in a zero-shot manner, we introduce MoqaGPT, a straightforward and flexible framework.
no code implementations • 26 Mar 2024 • Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal
In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.
no code implementations • ACL 2022 • Aishwarya Agrawal, Damien Teney, Aida Nematzadeh
In addition to the larger pretraining datasets, the transformer architecture (Vaswani et al., 2017) and in particular self-attention applied to two modalities are responsible for the impressive performance of the recent pretrained models on downstream tasks (Hendricks et al., 2021).