Search Results for author: Aishwarya Agrawal

Found 21 papers, 10 papers with code

Vision-Language Pretraining: Current Trends and the Future

no code implementations ACL 2022 Aishwarya Agrawal, Damien Teney, Aida Nematzadeh

In addition to the larger pretraining datasets, the transformer architecture (Vaswani et al., 2017) and in particular self-attention applied to two modalities are responsible for the impressive performance of the recent pretrained models on downstream tasks (Hendricks et al., 2021).

Question Answering · Representation Learning +1

Improving Text-to-Image Consistency via Automatic Prompt Optimization

no code implementations 26 Mar 2024 Oscar Mañas, Pietro Astolfi, Melissa Hall, Candace Ross, Jack Urbanek, Adina Williams, Aishwarya Agrawal, Adriana Romero-Soriano, Michal Drozdzal

In this paper, we address these challenges and introduce a T2I optimization-by-prompting framework, OPT2I, which leverages a large language model (LLM) to improve prompt-image consistency in T2I models.
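The optimization-by-prompting idea can be pictured as a simple search loop: an LLM proposes revised prompts conditioned on previously scored attempts, and a consistency scorer ranks the candidates. The sketch below is a hypothetical illustration of that loop, not the paper's implementation; `llm_revise` and `consistency_score` are stand-in stubs for an LLM call and a prompt-image consistency metric.

```python
# Hypothetical sketch of an optimization-by-prompting loop in the spirit
# of OPT2I. `llm_revise` and `consistency_score` are stand-ins, not the
# paper's actual components.
def optimize_prompt(prompt, llm_revise, consistency_score, rounds=3, k=4):
    best, best_score = prompt, consistency_score(prompt)
    history = [(best, best_score)]          # scored attempts so far
    for _ in range(rounds):
        # Ask the LLM for k candidate revisions, conditioned on the
        # scored history of previous prompts.
        candidates = llm_revise(history, k)
        for cand in candidates:
            s = consistency_score(cand)
            history.append((cand, s))
            if s > best_score:
                best, best_score = cand, s
    return best, best_score

# Toy demo with trivial stubs: score = prompt length, revision = append a token.
def toy_score(p):
    return len(p)

def toy_revise(history, k):
    top = max(history, key=lambda t: t[1])[0]
    return [top + " x" for _ in range(k)]

best, score = optimize_prompt("a cat", toy_revise, toy_score, rounds=2, k=2)
print(best, score)
```

The loop is generic: swapping in a real LLM and a real consistency metric (e.g. a VQA- or CLIP-based scorer) recovers the general shape of such frameworks.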

Language Modelling · Large Language Model

MoqaGPT: Zero-Shot Multi-modal Open-domain Question Answering with Large Language Model

1 code implementation 20 Oct 2023 Le Zhang, Yihong Wu, Fengran Mo, Jian-Yun Nie, Aishwarya Agrawal

To enable LLMs to tackle the task in a zero-shot manner, we introduce MoqaGPT, a straightforward and flexible framework.

Language Modelling · Large Language Model +2

Improving Automatic VQA Evaluation Using Large Language Models

no code implementations 4 Oct 2023 Oscar Mañas, Benno Krojer, Aishwarya Agrawal

Thus, there is a need to develop more robust automatic VQA metrics that serve as a proxy for human judgment.
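For context, the metric this line of work seeks to replace is the standard soft VQA accuracy: a candidate answer is scored against roughly ten human annotations and counted fully correct if at least three annotators gave it. A simplified sketch (the official metric additionally normalizes answers and averages over annotator subsets, both omitted here):

```python
def vqa_accuracy(candidate: str, human_answers: list[str]) -> float:
    """Simplified soft VQA accuracy: min(#matching annotators / 3, 1).
    The official metric also applies answer normalization (articles,
    punctuation, number words) and averages over annotator subsets."""
    candidate = candidate.strip().lower()
    matches = sum(1 for a in human_answers if a.strip().lower() == candidate)
    return min(matches / 3.0, 1.0)

# 8 of 10 annotators answered "2": fully correct.
print(vqa_accuracy("2", ["2", "2", "two", "2", "3", "2", "2", "2", "2", "2"]))
# Only 1 annotator answered "dog": partial credit of 1/3.
print(vqa_accuracy("dog", ["cat", "dog"] + ["cat"] * 8))
```

Note how exact string matching gives "two" no credit against "2"; this brittleness to paraphrases and synonyms is precisely the gap that LLM-based judgment aims to close.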

In-Context Learning · Question Answering +1

Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering

1 code implementation 16 Jun 2023 Rabiul Awal, Le Zhang, Aishwarya Agrawal

In summary, our research sheds light on the intricacies of prompting strategies in VLMs for VQA, emphasizing the synergistic use of captions, templates, and pre-processing to enhance model efficacy.

Image Captioning · Question Answering +1

An Examination of the Robustness of Reference-Free Image Captioning Evaluation Metrics

1 code implementation 24 May 2023 Saba Ahmadi, Aishwarya Agrawal

Furthermore, we found that all metrics are sensitive to variations in the size of image-relevant objects mentioned in the caption, while CLIPScore and PAC-S are also sensitive to the number of mentions of image-relevant objects in the caption.

Image Captioning · Negation +2

Measuring Progress in Fine-grained Vision-and-Language Understanding

2 code implementations 12 May 2023 Emanuele Bugliarello, Laurent Sartran, Aishwarya Agrawal, Lisa Anne Hendricks, Aida Nematzadeh

While pretraining on large-scale image-text data from the Web has facilitated rapid progress on many vision-and-language (V&L) tasks, recent work has demonstrated that pretrained models lack "fine-grained" understanding, such as the ability to recognise relationships, verbs, and numbers in images.

Visual Reasoning

Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization

no code implementations 24 May 2022 Aishwarya Agrawal, Ivana Kajić, Emanuele Bugliarello, Elnaz Davoodi, Anita Gergely, Phil Blunsom, Aida Nematzadeh

Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks such as image captioning and visual question answering (VQA).

Image Captioning · Out-of-Distribution Generalization +3

Generating Diverse Programs with Instruction Conditioned Reinforced Adversarial Learning

no code implementations 3 Dec 2018 Aishwarya Agrawal, Mateusz Malinowski, Felix Hill, Ali Eslami, Oriol Vinyals, Tejas Kulkarni

In this work, we study the setting in which an agent must learn to generate programs for diverse scenes conditioned on a given symbolic instruction.

Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering

1 code implementation CVPR 2018 Aishwarya Agrawal, Dhruv Batra, Devi Parikh, Aniruddha Kembhavi

Specifically, we present new splits of the VQA v1 and VQA v2 datasets, which we call Visual Question Answering under Changing Priors (VQA-CP v1 and VQA-CP v2 respectively).

Question Answering · Visual Question Answering

C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset

no code implementations 26 Apr 2017 Aishwarya Agrawal, Aniruddha Kembhavi, Dhruv Batra, Devi Parikh

Finally, we evaluate several existing VQA models under this new setting and show that the performances of these models degrade by a significant amount compared to the original VQA setting.

Question Answering · Visual Question Answering

Analyzing the Behavior of Visual Question Answering Models

1 code implementation EMNLP 2016 Aishwarya Agrawal, Dhruv Batra, Devi Parikh

Recently, a number of deep-learning based models have been proposed for the task of Visual Question Answering (VQA).

Question Answering · Visual Question Answering
