no code implementations • ACL (RepL4NLP) 2021 • Pritish Sahu, Michael Cogswell, Ajay Divakaran, Sara Rutherford-Quach
Current pre-trained language models hold a great deal of knowledge but have a more limited ability to use it.
no code implementations • 20 Dec 2023 • Yunye Gong, Robik Shrestha, Jared Claypoole, Michael Cogswell, Arijit Ray, Christopher Kanan, Ajay Divakaran
We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks.
no code implementations • 30 Nov 2023 • Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran
We use synthetic captions from this pipeline to benchmark a representative set of video language models on long-video datasets, and show that the models struggle on shorter captions.
no code implementations • CVPR 2024 • Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences.
1 code implementation • 8 Sep 2023 • Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran
Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs.
1 code implementation • 7 Apr 2023 • Madeline Schiappa, Raiyaan Abdullah, Shehreen Azad, Jared Claypoole, Michael Cogswell, Ajay Divakaran, Yogesh Rawat
In this work we focus on conceptual understanding of these large V+L models.
no code implementations • 29 Sep 2022 • Pritish Sahu, Michael Cogswell, Yunye Gong, Ajay Divakaran
The success of Large Language Models (LLMs) indicates they are increasingly able to answer queries like these accurately, but that ability does not necessarily imply a general understanding of concepts relevant to the anchor query.
1 code implementation • ICLR 2022 • Xiaoling Hu, Xiao Lin, Michael Cogswell, Yi Yao, Susmit Jha, Chao Chen
Despite their success and popularity, deep neural networks (DNNs) are vulnerable to backdoor attacks.
no code implementations • 13 Oct 2021 • Kamran Alipour, Arijit Ray, Xiao Lin, Michael Cogswell, Jurgen P. Schulze, Yi Yao, Giedrius T. Burachas
In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs.
no code implementations • 8 Jun 2021 • Pritish Sahu, Michael Cogswell, Sara Rutherford-Quach, Ajay Divakaran
Current pre-trained language models hold a great deal of knowledge but have a more limited ability to use it.
no code implementations • 26 Mar 2021 • Arijit Ray, Michael Cogswell, Xiao Lin, Kamran Alipour, Ajay Divakaran, Yi Yao, Giedrius Burachas
Hence, we propose Error Maps that clarify the error by highlighting image regions where the model is prone to err.
1 code implementation • NeurIPS 2020 • Michael Cogswell, Jiasen Lu, Rishabh Jain, Stefan Lee, Devi Parikh, Dhruv Batra
Can we develop visually grounded dialog agents that can efficiently adapt to new tasks without forgetting how to talk to people?
1 code implementation • ICLR 2020 • Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra
In this paper, we introduce these cultural evolutionary dynamics into language emergence by periodically replacing agents in a population to create a knowledge gap, implicitly inducing cultural transmission of language.
2 code implementations • 22 Nov 2016 • Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra
We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations.
126 code implementations • ICCV 2017 • Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra
For captioning and VQA, we show that even non-attention based models can localize inputs.
Ranked #2 on Image Attribution on CUB-200-2011
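The Grad-CAM computation described above is compact: channel weights come from global-average-pooling the class-score gradients, and the heatmap is a ReLU of the weighted sum of feature maps. A minimal illustrative sketch (not the authors' reference implementation), assuming the last convolutional layer's activations and the gradients of the class score with respect to them have already been extracted from a CNN:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations: (K, H, W) feature maps from the last conv layer.
    gradients:   (K, H, W) gradient of the class score w.r.t. those maps.
    Returns an (H, W) heatmap normalized to [0, 1].
    """
    # Channel weights alpha_k: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))                      # (K,)
    # Weighted combination of feature maps, then ReLU.
    cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
    # Normalize for visualization.
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam
```

In practice the activations and gradients would be captured with framework hooks during a forward/backward pass; the sketch isolates only the heatmap arithmetic.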
25 code implementations • 7 Oct 2016 • Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra
We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.
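Diverse Beam Search is a small modification to standard beam search: the beam budget is split into groups, and each later group pays a diversity penalty for reusing tokens that earlier groups already selected at the same timestep (Hamming diversity). A toy sketch over a generic next-token scorer (names and scorer are illustrative, not the paper's code):

```python
import numpy as np

def diverse_beam_search(log_probs, vocab_size, steps, groups=2,
                        beams_per_group=2, diversity=2.0):
    """Toy Diverse Beam Search with a Hamming diversity penalty.

    log_probs(prefix) -> (vocab_size,) array of next-token log-probabilities.
    Groups expand in order; later groups subtract `diversity` from the
    score of any token an earlier group chose at this timestep.
    Returns (sequence, score) pairs, all groups' beams flattened.
    """
    # Each group keeps its own beam of (sequence, score) hypotheses.
    beams = [[((), 0.0)] for _ in range(groups)]
    for _ in range(steps):
        chosen_now = []                      # tokens picked by earlier groups this step
        for g in range(groups):
            candidates = []
            for seq, score in beams[g]:
                lp = log_probs(seq).copy()
                for tok in chosen_now:       # Hamming diversity penalty
                    lp[tok] -= diversity
                for tok in range(vocab_size):
                    candidates.append((seq + (tok,), score + lp[tok]))
            candidates.sort(key=lambda c: c[1], reverse=True)
            beams[g] = candidates[:beams_per_group]
            chosen_now.extend(seq[-1] for seq, _ in beams[g])
    return [hyp for group in beams for hyp in group]
```

With a scorer that strongly favors one token, plain beam search would start every beam identically, while here the second group is pushed onto a different first token.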
no code implementations • NeurIPS 2016 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra
Many practical perception systems exist within larger processes that include interactions with users or additional components capable of evaluating the quality of predicted solutions.
no code implementations • 19 Nov 2015 • Michael Cogswell, Faruk Ahmed, Ross Girshick, Larry Zitnick, Dhruv Batra
One major challenge in training Deep Neural Networks is preventing overfitting.
no code implementations • 19 Nov 2015 • Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, Dhruv Batra
Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks.
no code implementations • 14 Dec 2014 • Michael Cogswell, Xiao Lin, Senthil Purushwalkam, Dhruv Batra
We present a two-module approach to semantic segmentation that incorporates Convolutional Networks (CNNs) and Graphical Models.
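To illustrate how CNN outputs and a graphical model can be coupled in such a two-module design, here is a generic sketch (not the specific model from this paper): per-pixel class scores, standing in for CNN unaries, are smoothed with a Potts-style grid CRF via a few mean-field iterations.

```python
import numpy as np

def meanfield_smooth(unary, pairwise_weight=1.0, iters=5):
    """Smooth per-pixel class scores with a Potts grid CRF via mean-field.

    unary: (H, W, C) log-scores (stand-in for CNN outputs).
    Each pixel's belief is pulled toward its 4-neighbors' beliefs.
    Returns an (H, W) label map.
    """
    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    q = softmax(unary)
    for _ in range(iters):
        # Sum of neighbor marginals (zero-padded at the image borders).
        msg = np.zeros_like(q)
        msg[1:, :] += q[:-1, :]
        msg[:-1, :] += q[1:, :]
        msg[:, 1:] += q[:, :-1]
        msg[:, :-1] += q[:, 1:]
        # Potts model: reward agreeing with neighbors' current beliefs.
        q = softmax(unary + pairwise_weight * msg)
    return q.argmax(axis=-1)
```

The effect is that an isolated pixel whose unary mildly prefers a different class than all its neighbors gets flipped to the consensus label, which is the basic smoothing role the graphical-model module plays.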