Search Results for author: Kushal Kafle

Found 25 papers, 12 papers with code

Revisiting Multi-Modal LLM Evaluation

no code implementations • 9 Aug 2024 • Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan

With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence.

Chart Understanding, Optical Character Recognition, +4

They're All Doctors: Synthesizing Diverse Counterfactuals to Mitigate Associative Bias

no code implementations • 17 Jun 2024 • Salma Abdel Magid, Jui-Hsien Wang, Kushal Kafle, Hanspeter Pfister

Vision Language Models (VLMs) such as CLIP are powerful models; however, they can exhibit unwanted biases, making them less safe when deployed directly in applications such as text-to-image retrieval, text-to-video retrieval, reverse search, or classification tasks.

counterfactual, Fairness, +1

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

no code implementations • CVPR 2024 • Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle

Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset.

Fairness

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

no code implementations • 23 Apr 2024 • Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.

Hallucination, In-Context Learning, +2

Building Vision-Language Models on Solid Foundations with Masked Distillation

no code implementations • CVPR 2024 • Sepehr Sameni, Kushal Kafle, Hao Tan, Simon Jenni

Recent advancements in Vision-Language Models (VLMs) have marked a significant leap in bridging the gap between computer vision and natural language processing.

Contrastive Learning, Knowledge Distillation, +4

SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data

1 code implementation • 24 Aug 2023 • Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez

To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens.

Object Relation
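The token-sequence casting described in the abstract can be illustrated with a small sketch. The token vocabulary, bin count, and function name below are illustrative assumptions, not the paper's actual scheme:

```python
def serialize_relations(subject, triples, num_bins=100):
    """Hedged sketch of casting relation detection as token generation:
    each (relation, object, box) triple is flattened into a token
    sequence, with normalized box coordinates quantized into
    `num_bins` discrete location tokens."""
    tokens = [f"<subj:{subject}>"]
    for rel, obj, box in triples:
        tokens += [rel, obj]
        # map each coordinate in [0, 1] to one of num_bins location tokens
        tokens += [f"<loc_{int(c * (num_bins - 1))}>" for c in box]
    return tokens
```

An auto-regressive decoder trained on such sequences can then emit relations, objects, and locations one token at a time.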

Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations

1 code implementation • CVPR 2023 • Ziyan Yang, Kushal Kafle, Franck Dernoncourt, Vicente Ordonez

We propose a margin-based loss for tuning joint vision-language models so that their gradient-based explanations are consistent with region-level annotations provided by humans for relatively smaller grounding datasets.

Language Modelling, Referring Expression, +2
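A minimal sketch of what a margin-based consistency loss of this kind could look like, assuming (for illustration only) a per-image saliency map and a binary region mask; the paper's actual formulation may differ:

```python
import numpy as np

def margin_consistency_loss(saliency, region_mask, margin=0.5):
    """Hedged sketch: encourage mean gradient-based saliency inside the
    human-annotated region to exceed mean saliency outside it by at
    least `margin`. `saliency` and `region_mask` are HxW arrays; the
    mask is binary (1 inside the annotated region)."""
    inside = saliency[region_mask > 0].mean()
    outside = saliency[region_mask == 0].mean()
    # hinge: zero loss once the inside-outside gap reaches the margin
    return max(0.0, margin - (inside - outside))
```

Added to the task loss, a term like this penalizes explanations that attend outside the annotated region.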

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses

1 code implementation • 5 Apr 2022 • Robik Shrestha, Kushal Kafle, Christopher Kanan

We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias.

Action Recognition

Beyond Pixels: A Sample Based Method for understanding the decisions of Neural Networks

no code implementations • 29 Sep 2021 • Ohi Dibua, Mackenzie Austin, Kushal Kafle

This can manifest itself in the use of spurious correlations in the data to make decisions.

Learning to Predict Visual Attributes in the Wild

no code implementations • CVPR 2021 • Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava

In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.

Attribute, Contrastive Learning, +2

Are Bias Mitigation Techniques for Deep Learning Effective?

1 code implementation • 1 Apr 2021 • Robik Shrestha, Kushal Kafle, Christopher Kanan

We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources.

Deep Learning, Question Answering, +1

AESOP: Abstract Encoding of Stories, Objects, and Pictures

2 code implementations • ICCV 2021 • Hareesh Ravi, Kushal Kafle, Scott Cohen, Jonathan Brandt, Mubbasir Kapadia

Visual storytelling and story comprehension are uniquely human skills that play a central role in how we learn about and experience the world.

Story Completion, Visual Storytelling

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

no code implementations • NeurIPS 2020 • Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton van den Hengel

Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set.

Model Selection, Question Answering, +1

Do We Need Fully Connected Output Layers in Convolutional Networks?

no code implementations • 28 Apr 2020 • Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan

Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification.

General Classification
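One widely used FC-free alternative to the traditional design described above is a global-average-pooling classifier over per-class feature maps. The sketch below is only an illustration of that general idea, not the paper's exact architecture:

```python
import numpy as np

def gap_classify(feature_maps):
    """Hedged sketch: instead of flattening conv features into fully
    connected layers, keep one feature map per class and classify by
    global average pooling. `feature_maps` has shape
    (num_classes, H, W); the prediction is the class whose map has the
    highest spatial mean."""
    scores = feature_maps.mean(axis=(1, 2))  # global average pool per class
    return int(scores.argmax())
```

This removes the FC parameters entirely, at the cost of tying the number of final feature maps to the number of classes.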

Visual Grounding Methods for VQA are Working for the Wrong Reasons!

1 code implementation • ACL 2020 • Robik Shrestha, Kushal Kafle, Christopher Kanan

Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons.

Question Answering, Visual Grounding, +1

REMIND Your Neural Network to Prevent Catastrophic Forgetting

1 code implementation • ECCV 2020 • Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, Christopher Kanan

While there is neuroscientific evidence that the brain replays compressed memories, existing methods for convolutional networks replay raw images.

Quantization, Question Answering, +1
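The compressed-replay idea in the abstract can be sketched with a toy vector-quantization step: store each feature as a codebook index rather than a raw image, then replay the reconstruction. REMIND itself quantizes intermediate network features; this single-codebook lookup is only an illustrative simplification:

```python
import numpy as np

def quantized_replay(features, codebook):
    """Hedged sketch of compressed-feature replay: map each feature
    vector to its nearest codebook entry (squared Euclidean distance),
    store only the integer indices, and replay the reconstructions.
    `features`: (N, D) array, `codebook`: (K, D) array."""
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)  # compact storage: N small integers
    return codebook[indices]        # approximate features for replay
```

Storing indices instead of raw inputs is what makes replaying a large memory buffer cheap.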

Answering Questions about Data Visualizations using Efficient Bimodal Fusion

1 code implementation • 5 Aug 2019 • Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs.

Chart Question Answering, Optical Character Recognition, +3

Challenges and Prospects in Vision and Language Research

no code implementations • 19 Apr 2019 • Kushal Kafle, Robik Shrestha, Christopher Kanan

Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence.

Natural Language Understanding

Answer Them All! Toward Universal Visual Question Answering Models

2 code implementations • CVPR 2019 • Robik Shrestha, Kushal Kafle, Christopher Kanan

Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning.

Question Answering, Visual Question Answering

TallyQA: Answering Complex Counting Questions

1 code implementation • 29 Oct 2018 • Manoj Acharya, Kushal Kafle, Christopher Kanan

Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection.

Attribute, Object Counting, +5
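For the simple questions the abstract mentions, counting reduces to tallying detector outputs for the queried class. A hedged sketch of that baseline (the detection record format here is an assumption):

```python
def count_by_detection(detections, target_class, min_score=0.5):
    """Hedged sketch of the detection-only counting baseline: answer a
    simple 'how many X?' question by tallying confident detections of
    the queried class. Each detection is assumed to be a dict with
    'class' and 'score' keys."""
    return sum(1 for d in detections
               if d["class"] == target_class and d["score"] >= min_score)
```

Complex counting questions (e.g., involving attributes or relations) are exactly the cases where this baseline breaks down, which is the gap TallyQA targets.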

An Analysis of Visual Question Answering Algorithms

no code implementations • ICCV 2017 • Kushal Kafle, Christopher Kanan

As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different methods.

Question Answering, Visual Question Answering

Answer-Type Prediction for Visual Question Answering

no code implementations • CVPR 2016 • Kushal Kafle, Christopher Kanan

Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued.

Object Recognition, Question Answering, +3
