no code implementations • ACL 2022 • Aishwarya Agrawal, Damien Teney, Aida Nematzadeh
In addition to larger pretraining datasets, the transformer architecture (Vaswani et al., 2017), and in particular self-attention applied to two modalities, is responsible for the impressive performance of recent pretrained models on downstream tasks (Hendricks et al., 2021).
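To make the "self-attention applied to two modalities" point concrete, here is a minimal sketch, assuming a single-stream design in which image-region features and text-token embeddings are concatenated and passed through one standard self-attention layer. The module names and dimensions are illustrative assumptions, not code from the tutorial.

```python
# Minimal sketch (illustrative, not the tutorial's code): one "single-stream"
# transformer layer that applies self-attention jointly over image-region
# features and text-token embeddings.
import torch
import torch.nn as nn

class JointSelfAttentionLayer(nn.Module):
    def __init__(self, dim=768, heads=12):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_regions):
        # Concatenate the two modalities into one token sequence so every
        # token can attend to tokens from both modalities.
        x = torch.cat([text_tokens, image_regions], dim=1)
        attended, _ = self.attn(x, x, x)
        return self.norm(x + attended)

# Usage: a batch of 2 examples, 16 text tokens and 36 image regions, dim 768.
layer = JointSelfAttentionLayer()
out = layer(torch.randn(2, 16, 768), torch.randn(2, 36, 768))
print(out.shape)  # torch.Size([2, 52, 768])
```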
no code implementations • 13 Oct 2022 • Oscar Mañas, Pau Rodriguez, Saba Ahmadi, Aida Nematzadeh, Yash Goyal, Aishwarya Agrawal
Large pre-trained models have proved to be remarkable zero- and (prompt-based) few-shot learners in unimodal vision and language tasks.
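As a rough illustration of what "prompt-based few-shot" means in practice, the sketch below assembles a few labeled support examples and an unlabeled query into a single text prompt for a frozen pretrained model to complete. The helper name and prompt format are assumptions, not the paper's method.

```python
# Hedged illustration (not from the paper): k labeled examples are
# concatenated into a text prompt and the frozen pretrained model is asked to
# complete the final, unlabeled query.
def build_few_shot_prompt(support_examples, query, task_instruction):
    """support_examples: list of (input_text, label) pairs; names are illustrative."""
    lines = [task_instruction]
    for text, label in support_examples:
        lines.append(f"Input: {text}\nOutput: {label}")
    lines.append(f"Input: {query}\nOutput:")
    return "\n\n".join(lines)

prompt = build_few_shot_prompt(
    [("a photo of a dog on grass", "dog"), ("a photo of a red car", "car")],
    "a photo of a cat on a sofa",
    "Name the main object in each description.",
)
print(prompt)
```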
no code implementations • 24 May 2022 • Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh
Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA).
no code implementations • 3 Dec 2018 • Aishwarya Agrawal, Mateusz Malinowski, Felix Hill, Ali Eslami, Oriol Vinyals, Tejas Kulkarni
In this work, we study the setting in which an agent must learn to generate programs for diverse scenes conditioned on a given symbolic instruction.
no code implementations • NeurIPS 2018 • Sainandan Ramakrishnan, Aishwarya Agrawal, Stefan Lee
Further, on standard VQA tasks, our approach shows significantly less drop in accuracy compared to existing bias-reducing VQA models.
1 code implementation • CVPR 2018 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi
Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively).
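The idea behind a "changing priors" split can be sketched as follows, assuming each example carries a question type and an answer: group question-answer pairs by (question type, answer) and route each group mostly to one split, so that the per-question-type answer distributions differ between train and test. This is a simplification for illustration, not the released VQA-CP construction code, and the field names are assumptions.

```python
# Hedged sketch (a simplification, not the VQA-CP release): re-partition
# question-answer pairs so the answer distribution for each question type
# differs between the train and test splits.
import random
from collections import defaultdict

def changing_priors_split(examples, seed=0):
    """examples: list of dicts with 'question_type' and 'answer' keys (assumed)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[(ex["question_type"], ex["answer"])].append(ex)

    train, test = [], []
    for _, group in sorted(groups.items()):
        # Send each (question type, answer) group mostly to one split, so the
        # per-question-type answer priors end up different across splits.
        (train if rng.random() < 0.5 else test).extend(group)
    return train, test
```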
no code implementations • 26 Apr 2017 • Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh
Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.
no code implementations • 31 Aug 2016 • C. Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, Devi Parikh
As machines have become more intelligent, there has been a renewed interest in methods for measuring their intelligence.
1 code implementation • EMNLP 2016 • Aishwarya Agrawal, Dhruv Batra, Devi Parikh
Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA).
1 code implementation • NAACL 2016 • Ting-Hao Huang, Francis Ferraro, Nasrin Mostafazadeh, Ishan Misra, Aishwarya Agrawal, Jacob Devlin, Ross Girshick, Xiaodong He, Pushmeet Kohli, Dhruv Batra, C. Lawrence Zitnick, Devi Parikh, Lucy Vanderwende, Michel Galley, Margaret Mitchell
We introduce the first dataset for sequential vision-to-language, and explore how this data may be used for the task of visual storytelling.
no code implementations • EMNLP 2016 • Gordon Christie, Ankit Laddha, Aishwarya Agrawal, Stanislaw Antol, Yash Goyal, Kevin Kochersberger, Dhruv Batra
Our approach produces a diverse set of plausible hypotheses for both semantic segmentation and prepositional phrase attachment resolution that are then jointly reranked to select the most consistent pair.
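A hedged sketch of the joint-reranking idea, not the authors' implementation: each module proposes several scored hypotheses, and the pair that maximizes the two unary scores plus a cross-modal consistency term is selected. The consistency function and weighting are placeholder assumptions.

```python
# Illustrative sketch (not the authors' code): jointly rerank pairs of
# (segmentation hypothesis, PP-attachment hypothesis) by unary scores plus a
# consistency term.
from itertools import product

def rerank(seg_hypotheses, pp_hypotheses, consistency, alpha=1.0):
    """seg_hypotheses / pp_hypotheses: lists of (hypothesis, score) tuples."""
    best_pair, best_score = None, float("-inf")
    for (seg, s_seg), (pp, s_pp) in product(seg_hypotheses, pp_hypotheses):
        score = s_seg + s_pp + alpha * consistency(seg, pp)
        if score > best_score:
            best_pair, best_score = (seg, pp), score
    return best_pair, best_score

# Usage with dummy hypotheses and a trivial consistency function.
pair, score = rerank([("seg_a", 0.9), ("seg_b", 0.7)],
                     [("pp_x", 0.6), ("pp_y", 0.8)],
                     consistency=lambda s, p: 0.0)
```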
20 code implementations • ICCV 2015 • Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Dhruv Batra, Devi Parikh
Given an image and a natural language question about the image, the task is to provide an accurate natural language answer.
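As a minimal illustration of this task formulation, the sketch below follows the common "encode the image, encode the question, fuse, classify over a fixed answer vocabulary" recipe. It is an assumption-level baseline for illustration, not the paper's released model, and all dimensions are illustrative.

```python
# Minimal sketch (illustrative assumption, not the released model): encode the
# question with an LSTM, fuse with precomputed image features, and score a
# fixed answer vocabulary.
import torch
import torch.nn as nn

class SimpleVQAModel(nn.Module):
    def __init__(self, vocab_size, num_answers, img_dim=2048, hid=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 300)
        self.lstm = nn.LSTM(300, hid, batch_first=True)
        self.img_proj = nn.Linear(img_dim, hid)
        self.classifier = nn.Linear(hid, num_answers)

    def forward(self, image_features, question_tokens):
        q_emb = self.embed(question_tokens)
        _, (h, _) = self.lstm(q_emb)                 # final hidden state encodes the question
        fused = h[-1] * torch.tanh(self.img_proj(image_features))  # element-wise fusion
        return self.classifier(fused)                # scores over the answer vocabulary

# Usage: batch of 2 images (2048-d features) and 14-token questions.
model = SimpleVQAModel(vocab_size=10000, num_answers=1000)
logits = model(torch.randn(2, 2048), torch.randint(0, 10000, (2, 14)))
print(logits.shape)  # torch.Size([2, 1000])
```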