Search Results for author: Michael Cogswell

Found 20 papers, 8 papers with code

BloomVQA: Assessing Hierarchical Multi-modal Comprehension

no code implementations · 20 Dec 2023 · Yunye Gong, Robik Shrestha, Jared Claypoole, Michael Cogswell, Arijit Ray, Christopher Kanan, Ajay Divakaran

We propose a novel VQA dataset, BloomVQA, to facilitate comprehensive evaluation of large vision-language models on comprehension tasks.

Data Augmentation · Memorization · +2

A Video is Worth 10,000 Words: Training and Benchmarking with Diverse Captions for Better Long Video Retrieval

no code implementations · 30 Nov 2023 · Matthew Gwilliam, Michael Cogswell, Meng Ye, Karan Sikka, Abhinav Shrivastava, Ajay Divakaran

To provide a more thorough evaluation of the capabilities of long video retrieval systems, we propose a pipeline that leverages state-of-the-art large language models to carefully generate a diverse set of synthetic captions for long videos.

Benchmarking · Retrieval · +2

DRESS: Instructing Large Vision-Language Models to Align and Interact with Humans via Natural Language Feedback

no code implementations · 16 Nov 2023 · Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

The critique NLF identifies the strengths and weaknesses of the responses and is used to align the LVLMs with human preferences.

Language Modelling

Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models

1 code implementation · 8 Sep 2023 · Yangyi Chen, Karan Sikka, Michael Cogswell, Heng Ji, Ajay Divakaran

Based on this pipeline and the existing coarse-grained annotated dataset, we build the CURE benchmark to measure both the zero-shot reasoning performance and consistency of VLMs.

Visual Reasoning

Probing Conceptual Understanding of Large Visual-Language Models

1 code implementation · 7 Apr 2023 · Madeline Chantry Schiappa, Michael Cogswell, Ajay Divakaran, Yogesh Singh Rawat

In recent years, large visual-language (V+L) models have achieved great success in various downstream tasks.

Benchmarking

Unpacking Large Language Models with Conceptual Consistency

no code implementations · 29 Sep 2022 · Pritish Sahu, Michael Cogswell, Yunye Gong, Ajay Divakaran

The success of Large Language Models (LLMs) indicates they are increasingly able to answer queries like these accurately, but that ability does not necessarily imply a general understanding of concepts relevant to the anchor query.

Language Modelling · Large Language Model

Trigger Hunting with a Topological Prior for Trojan Detection

1 code implementation · ICLR 2022 · Xiaoling Hu, Xiao Lin, Michael Cogswell, Yi Yao, Susmit Jha, Chao Chen

Despite their success and popularity, deep neural networks (DNNs) are vulnerable when facing backdoor attacks.

Improving Users' Mental Model with Attention-directed Counterfactual Edits

no code implementations · 13 Oct 2021 · Kamran Alipour, Arijit Ray, Xiao Lin, Michael Cogswell, Jurgen P. Schulze, Yi Yao, Giedrius T. Burachas

In the domain of Visual Question Answering (VQA), studies have shown improvement in users' mental model of the VQA system when they are exposed to examples of how these systems answer certain Image-Question (IQ) pairs.

Counterfactual · Question Answering · +2

Emergence of Compositional Language with Deep Generational Transmission

1 code implementation · ICLR 2020 · Michael Cogswell, Jiasen Lu, Stefan Lee, Devi Parikh, Dhruv Batra

In this paper, we introduce these cultural evolutionary dynamics into language emergence by periodically replacing agents in a population to create a knowledge gap, implicitly inducing cultural transmission of language.

Reinforcement Learning (RL)
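
The population dynamic described in the abstract — periodically replacing agents so that the survivors must re-teach newcomers — can be illustrated with a toy loop. This is a hedged sketch only: in the paper the agents are neural learners trained with RL, whereas here they are simple counters standing in for "amount learned":

```python
import random

def train_population(n_agents=4, steps=100, replace_every=25, seed=0):
    """Toy version of generational transmission: every `replace_every`
    steps, one randomly chosen agent is re-initialized, creating a
    knowledge gap the rest of the population must fill by re-teaching."""
    rng = random.Random(seed)
    agents = [0] * n_agents          # "skill" of each agent (toy stand-in)
    replacements = 0
    for t in range(1, steps + 1):
        agents = [a + 1 for a in agents]   # one learning step for everyone
        if t % replace_every == 0:
            idx = rng.randrange(n_agents)
            agents[idx] = 0                # fresh agent: knowledge gap
            replacements += 1
    return agents, replacements

agents, n_replaced = train_population()
```

With these defaults, replacement fires at steps 25, 50, 75, and 100, so the run ends with one freshly reset agent.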

Grad-CAM: Why did you say that?

2 code implementations · 22 Nov 2016 · Ramprasaath R. Selvaraju, Abhishek Das, Ramakrishna Vedantam, Michael Cogswell, Devi Parikh, Dhruv Batra

We propose a technique for making Convolutional Neural Network (CNN)-based models more transparent by visualizing input regions that are 'important' for predictions -- or visual explanations.

Image Captioning · Visual Question Answering
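
The heatmap Grad-CAM produces is a gradient-weighted combination of convolutional feature maps followed by a ReLU. A minimal NumPy sketch of that combination step (the network, target class, and choice of layer are assumed to be handled elsewhere; the arrays here are random stand-ins):

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Minimal Grad-CAM combination step.

    feature_maps: (K, H, W) conv activations A^k for the chosen layer.
    gradients:    (K, H, W) gradients of the class score w.r.t. A^k.
    Returns an (H, W) localization heatmap.
    """
    # Channel importance: global-average-pool the gradients.
    weights = gradients.mean(axis=(1, 2))              # (K,)
    # Weighted sum of feature maps, then ReLU to keep only
    # features with a positive influence on the class.
    cam = np.tensordot(weights, feature_maps, axes=1)  # (H, W)
    return np.maximum(cam, 0.0)

# Toy activations and gradients in place of a real network.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 7, 7))
dA = rng.standard_normal((8, 7, 7))
heatmap = grad_cam(A, dA)
```

The ReLU is what makes the map class-discriminative: regions whose features argue *against* the class are zeroed out rather than shown as negative evidence.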

Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models

25 code implementations · 7 Oct 2016 · Ashwin K. Vijayakumar, Michael Cogswell, Ramprasath R. Selvaraju, Qing Sun, Stefan Lee, David Crandall, Dhruv Batra

We observe that our method consistently outperforms BS and previously proposed techniques for diverse decoding from neural sequence models.

Image Captioning · Machine Translation · +4
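
Diverse Beam Search splits the beam budget into groups and penalizes later groups for re-using tokens that earlier groups already chose at the same step. A toy sketch of one decoding step with a Hamming diversity penalty and one beam per group — an illustration of the idea, not the paper's implementation:

```python
import numpy as np

def diverse_beam_step(logprobs, groups, lam=1.0):
    """One decoding step of diverse beam search over a toy vocabulary.

    logprobs: (G, V) per-group token log-probabilities for this step.
    groups:   number of beam groups G (one beam per group here).
    lam:      diversity penalty strength.
    Returns the token index picked by each group.
    """
    V = logprobs.shape[1]
    token_counts = np.zeros(V)   # how often earlier groups picked each token
    picks = []
    for g in range(groups):
        # Hamming diversity: penalize tokens already taken by earlier groups.
        scores = logprobs[g] - lam * token_counts
        tok = int(np.argmax(scores))
        picks.append(tok)
        token_counts[tok] += 1
    return picks

# Two groups with identical distributions: the penalty pushes the
# second group off the greedy token and onto the runner-up.
logprobs = np.log(np.array([[0.6, 0.3, 0.1],
                            [0.6, 0.3, 0.1]]))
picks = diverse_beam_step(logprobs, groups=2)
```

With `lam = 1.0`, the first group takes the argmax token while the second is deflected to a different one, which is exactly the diversity the method trades a little likelihood for.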

Stochastic Multiple Choice Learning for Training Diverse Deep Ensembles

no code implementations · NeurIPS 2016 · Stefan Lee, Senthil Purushwalkam, Michael Cogswell, Viresh Ranjan, David Crandall, Dhruv Batra

Many practical perception systems exist within larger processes that include interactions with users or additional components capable of evaluating the quality of predicted solutions.

Multiple-choice
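
The "oracle" loss behind this training scheme charges the ensemble only for its best member's error on each example, which lets members specialize. A toy scalar-regression sketch of that assignment step (not the paper's deep-network setup):

```python
import numpy as np

def oracle_loss(predictions, target):
    """Oracle set loss for an ensemble.

    predictions: (M, N) scalar predictions of M members on N examples.
    target:      (N,) ground truth.
    Returns the oracle loss and, per example, which member 'won'
    (in sMCL, only the winner receives gradient for that example).
    """
    errors = (predictions - target[None, :]) ** 2   # (M, N) per-member error
    winners = errors.argmin(axis=0)                 # best member per example
    loss = errors.min(axis=0).mean()                # charge only the best
    return loss, winners

# Two members that each nail a different example: the oracle loss is
# zero even though neither member is good everywhere.
preds = np.array([[0.0, 1.0],
                  [1.0, 0.0]])
target = np.array([0.0, 0.0])
loss, winners = oracle_loss(preds, target)
```

The winner-take-gradient assignment is the source of the diversity: each member only needs to cover the examples it is already best at.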

Why M Heads are Better than One: Training a Diverse Ensemble of Deep Networks

no code implementations · 19 Nov 2015 · Stefan Lee, Senthil Purushwalkam, Michael Cogswell, David Crandall, Dhruv Batra

Convolutional Neural Networks have achieved state-of-the-art performance on a wide range of tasks.

Combining the Best of Graphical Models and ConvNets for Semantic Segmentation

no code implementations · 14 Dec 2014 · Michael Cogswell, Xiao Lin, Senthil Purushwalkam, Dhruv Batra

We present a two-module approach to semantic segmentation that incorporates Convolutional Networks (CNNs) and Graphical Models.

Segmentation · Semantic Segmentation
