Search Results for author: Kushal Kafle

Found 22 papers, 12 papers with code

FairDeDup: Detecting and Mitigating Vision-Language Fairness Disparities in Semantic Dataset Deduplication

no code implementations • 24 Apr 2024 • Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle

Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset.
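
For context, the content-aware pruning referred to above typically works by embedding every image-text pair and discarding near-duplicates in embedding space. The sketch below shows only that generic idea; the similarity threshold, greedy keep rule, and use of CLIP-style features are assumptions, and FairDeDup's fairness-aware choice of which duplicate to keep is not modeled here.

```python
# Generic sketch of content-aware (semantic) deduplication: embed each sample,
# then greedily drop samples whose embedding is nearly identical to one already
# kept. Illustrative only; not the FairDeDup selection rule.
import numpy as np

def semantic_dedup(embeddings, threshold=0.95):
    """embeddings: (N, D) array, e.g. CLIP features. Returns indices of kept samples."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, vec in enumerate(normed):
        # Keep the sample unless it is a near-duplicate of something already kept.
        if not kept or np.max(normed[kept] @ vec) < threshold:
            kept.append(i)
    return kept

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 64)).astype(np.float32)
feats[500:] = feats[:500] + 0.01 * rng.normal(size=(500, 64))  # inject near-duplicates
print(len(semantic_dedup(feats)))  # roughly half survive
```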

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

no code implementations • 23 Apr 2024 • Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo

To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.

Hallucination, In-Context Learning, +2 more

SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data

1 code implementation • 24 Aug 2023 • Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez

To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting the output as a sequence of tokens.

Object Relation
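
As a rough illustration of casting subject-conditional relation detection as sequence generation, the sketch below serializes a subject and its (relation, object, box) triples into a flat token stream that an auto-regressive model could be trained to emit. The special tokens, location binning, and ordering are illustrative assumptions, not SCoRD's actual vocabulary.

```python
# Illustrative sketch (not the SCoRD implementation): serializing a subject's
# relations, objects, and object boxes as a flat token sequence for an
# auto-regressive model to predict.

def box_to_tokens(box, num_bins=100):
    """Discretize a normalized box (x1, y1, x2, y2) into location tokens."""
    return [f"<loc_{int(round(v * (num_bins - 1)))}>" for v in box]

def serialize_subject(subject, triples):
    """triples: list of (relation, object, normalized_box) for this subject."""
    tokens = ["<subject>", subject]
    for relation, obj, box in triples:
        tokens += ["<rel>", relation, "<obj>", obj] + box_to_tokens(box)
    tokens.append("<eos>")
    return tokens

# Example: a person riding a horse and holding reins (boxes are made up).
seq = serialize_subject("person", [
    ("riding",  "horse", (0.20, 0.35, 0.80, 0.95)),
    ("holding", "reins", (0.40, 0.45, 0.55, 0.60)),
])
print(" ".join(seq))
```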

Improving Visual Grounding by Encouraging Consistent Gradient-based Explanations

1 code implementation • CVPR 2023 • Ziyan Yang, Kushal Kafle, Franck Dernoncourt, Vicente Ordonez

We propose a margin-based loss for tuning joint vision-language models so that their gradient-based explanations are consistent with region-level annotations provided by humans for relatively smaller grounding datasets.

Language Modelling, Referring Expression, +2 more
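
A minimal sketch of what a margin-based consistency term between gradient-based explanations and human region annotations could look like is given below; the specific hinge formulation over attribution mass inside versus outside the annotated region is an assumption for illustration, not necessarily the paper's exact loss.

```python
# Hedged sketch of a margin-style consistency loss between a gradient-based
# attribution map and a human-annotated region mask.
import torch

def explanation_margin_loss(attribution, region_mask, margin=0.1):
    """
    attribution: (B, H, W) non-negative saliency per image (e.g., |grad * input|).
    region_mask: (B, H, W) binary mask of the human-annotated region.
    Penalizes cases where attribution inside the region does not exceed
    attribution outside the region by at least `margin`.
    """
    eps = 1e-8
    total = attribution.flatten(1).sum(dim=1) + eps
    inside = (attribution * region_mask).flatten(1).sum(dim=1) / total
    outside = 1.0 - inside
    return torch.clamp(margin - (inside - outside), min=0.0).mean()

# Toy usage with random tensors.
attr = torch.rand(4, 14, 14)
mask = (torch.rand(4, 14, 14) > 0.5).float()
loss = explanation_margin_loss(attr, mask)
```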

OccamNets: Mitigating Dataset Bias by Favoring Simpler Hypotheses

1 code implementation • 5 Apr 2022 • Robik Shrestha, Kushal Kafle, Christopher Kanan

We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias.

Action Recognition
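
One architectural inductive bias in this spirit is an early-exit design, where an example leaves the network as soon as a shallow classifier is confident, so simple examples get simple hypotheses. The sketch below is a generic toy version under that assumption; the layer sizes, exit rule, and threshold are not the OccamNet architecture itself.

```python
# Hedged sketch of an early-exit CNN: confident examples stop at a shallow head.
import torch
import torch.nn as nn

class EarlyExitCNN(nn.Module):
    def __init__(self, num_classes=10, threshold=0.9):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(8))
        self.exit1 = nn.Linear(32 * 8 * 8, num_classes)
        self.stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4))
        self.exit2 = nn.Linear(64 * 4 * 4, num_classes)
        self.threshold = threshold

    def forward(self, x):
        h1 = self.stage1(x)
        logits1 = self.exit1(h1.flatten(1))
        # At inference time, confident examples stop here (per-batch check kept simple).
        if not self.training and logits1.softmax(dim=1).max(dim=1).values.min() > self.threshold:
            return logits1
        h2 = self.stage2(h1)
        return self.exit2(h2.flatten(1))

model = EarlyExitCNN().eval()
print(model(torch.randn(2, 3, 32, 32)).shape)  # (2, 10)
```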

Beyond Pixels: A Sample Based Method for understanding the decisions of Neural Networks

no code implementations • 29 Sep 2021 • Ohi Dibua, Mackenzie Austin, Kushal Kafle

This can manifest itself in the use of spurious correlations in the data to make decisions.

Learning to Predict Visual Attributes in the Wild

no code implementations • CVPR 2021 • Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava

In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.

Attribute, Contrastive Learning, +2 more

Are Bias Mitigation Techniques for Deep Learning Effective?

1 code implementation • 1 Apr 2021 • Robik Shrestha, Kushal Kafle, Christopher Kanan

We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources.

Question Answering, Visual Question Answering
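
To make the idea of a controllable bias source concrete, the sketch below injects a background color that agrees with the digit label with probability p_bias, giving a model a shortcut it can exploit. This is an illustrative construction, not the released Biased MNIST generation code, which combines several bias sources.

```python
# Illustrative sketch of injecting one controllable bias source into an image
# dataset: a background color correlated with the label with probability p_bias.
import numpy as np

PALETTE = np.array([[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 0], [255, 0, 255],
                    [0, 255, 255], [128, 0, 0], [0, 128, 0], [0, 0, 128], [128, 128, 0]])

def add_background_bias(gray_images, labels, p_bias=0.9, rng=None):
    """gray_images: (N, 28, 28) uint8; labels: (N,) ints in [0, 10). Returns RGB images."""
    rng = rng or np.random.default_rng(0)
    n = len(labels)
    # With probability p_bias use the label-aligned color, otherwise a random one.
    colors = np.where(rng.random(n) < p_bias, labels, rng.integers(0, 10, size=n))
    rgb = np.repeat(gray_images[..., None], 3, axis=-1).astype(np.float32)
    background = PALETTE[colors][:, None, None, :].astype(np.float32)  # (N, 1, 1, 3)
    mask = (gray_images > 0)[..., None]  # keep digit pixels, color the rest
    return np.where(mask, rgb, background).astype(np.uint8)

imgs = (np.random.rand(16, 28, 28) > 0.8).astype(np.uint8) * 255
labels = np.random.randint(0, 10, size=16)
biased = add_background_bias(imgs, labels, p_bias=0.9)
print(biased.shape)  # (16, 28, 28, 3)
```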

AESOP: Abstract Encoding of Stories, Objects, and Pictures

2 code implementations • ICCV 2021 • Hareesh Ravi, Kushal Kafle, Scott Cohen, Jonathan Brandt, Mubbasir Kapadia

Visual storytelling and story comprehension are uniquely human skills that play a central role in how we learn about and experience the world.

Story Completion, Visual Storytelling

On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law

no code implementations • NeurIPS 2020 • Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton Van Den Hengel

Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set.

Model Selection, Question Answering, +1 more

Do We Need Fully Connected Output Layers in Convolutional Networks?

no code implementations • 28 Apr 2020 • Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan

Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification.

General Classification
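
The conventional design described above, and one common FC-free alternative (a 1x1 convolution to class channels followed by global average pooling), can be contrasted in a few lines. The alternative head is shown for illustration and is not necessarily the exact construction the paper studies.

```python
# Sketch contrasting a conv/pool trunk with a fully connected output layer
# versus a 1x1-conv head with global average pooling. Illustrative only.
import torch
import torch.nn as nn

def conv_trunk():
    return nn.Sequential(
        nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    )

class FCHead(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.trunk = conv_trunk()
        self.fc = nn.Linear(64 * 8 * 8, num_classes)  # assumes 32x32 inputs
    def forward(self, x):
        return self.fc(self.trunk(x).flatten(1))

class ConvHead(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.trunk = conv_trunk()
        self.head = nn.Conv2d(64, num_classes, 1)
    def forward(self, x):
        return self.head(self.trunk(x)).mean(dim=(2, 3))  # global average pool

x = torch.randn(2, 3, 32, 32)
print(FCHead()(x).shape, ConvHead()(x).shape)  # both (2, 10)
```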

Visual Grounding Methods for VQA are Working for the Wrong Reasons!

1 code implementation • ACL 2020 • Robik Shrestha, Kushal Kafle, Christopher Kanan

Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons.

Question Answering, Visual Grounding, +1 more

REMIND Your Neural Network to Prevent Catastrophic Forgetting

1 code implementation • ECCV 2020 • Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, Christopher Kanan

While there is neuroscientific evidence that the brain replays compressed memories, existing methods for convolutional networks replay raw images.

Quantization, Question Answering, +1 more
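
The core idea, rehearsing compressed feature codes rather than raw images, can be sketched as below. REMIND itself uses product quantization of mid-network CNN features; the uniform 8-bit quantizer, FIFO eviction, and buffer API here are simplifying assumptions.

```python
# Simplified sketch of the "replay compressed features, not raw images" idea.
import numpy as np

class CompressedReplayBuffer:
    def __init__(self, max_items=10000):
        self.codes, self.scales, self.labels = [], [], []
        self.max_items = max_items

    def add(self, feature, label):
        """Quantize a float feature vector to 8-bit codes before storing."""
        scale = max(np.abs(feature).max(), 1e-8) / 127.0
        code = np.clip(np.round(feature / scale), -127, 127).astype(np.int8)
        if len(self.codes) >= self.max_items:  # simple FIFO eviction
            for buf in (self.codes, self.scales, self.labels):
                buf.pop(0)
        self.codes.append(code)
        self.scales.append(scale)
        self.labels.append(label)

    def sample(self, k):
        """Reconstruct k stored features for rehearsal alongside new data."""
        idx = np.random.choice(len(self.codes), size=min(k, len(self.codes)), replace=False)
        feats = np.stack([self.codes[i].astype(np.float32) * self.scales[i] for i in idx])
        labels = np.array([self.labels[i] for i in idx])
        return feats, labels

buf = CompressedReplayBuffer()
for _ in range(32):
    buf.add(np.random.randn(512).astype(np.float32), label=np.random.randint(10))
replay_feats, replay_labels = buf.sample(8)
```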

Answering Questions about Data Visualizations using Efficient Bimodal Fusion

1 code implementation • 5 Aug 2019 • Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan

Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs.

Chart Question Answering, Optical Character Recognition, +3 more

Challenges and Prospects in Vision and Language Research

no code implementations • 19 Apr 2019 • Kushal Kafle, Robik Shrestha, Christopher Kanan

Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence.

Natural Language Understanding

Answer Them All! Toward Universal Visual Question Answering Models

2 code implementations • CVPR 2019 • Robik Shrestha, Kushal Kafle, Christopher Kanan

Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning.

Question Answering, Visual Question Answering

TallyQA: Answering Complex Counting Questions

1 code implementation • 29 Oct 2018 • Manoj Acharya, Kushal Kafle, Christopher Kanan

Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection.

Attribute, Object Counting, +5 more
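
As an illustration of why such simple counting questions reduce to object detection, the hypothetical helper below just tallies detections of the queried category; the complex questions TallyQA targets (e.g. relational or attribute-conditioned counts) are exactly the ones this baseline cannot handle.

```python
# Toy illustration of answering a simple counting question by tallying
# detections of the queried category. The detection format is hypothetical.
def count_by_detection(detections, category, score_threshold=0.5):
    """detections: list of dicts like {"label": "giraffe", "score": 0.92, "box": [...]}."""
    return sum(1 for d in detections if d["label"] == category and d["score"] >= score_threshold)

detections = [
    {"label": "giraffe", "score": 0.92, "box": [10, 20, 110, 220]},
    {"label": "giraffe", "score": 0.81, "box": [150, 30, 240, 230]},
    {"label": "tree",    "score": 0.77, "box": [0, 0, 60, 200]},
]
print(count_by_detection(detections, "giraffe"))  # 2  ("How many giraffes are there?")
```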

An Analysis of Visual Question Answering Algorithms

no code implementations • ICCV 2017 • Kushal Kafle, Christopher Kanan

As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different methods.

Question Answering, Visual Question Answering

Answer-Type Prediction for Visual Question Answering

no code implementations • CVPR 2016 • Kushal Kafle, Christopher Kanan

Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued.

Object Recognition, Question Answering, +3 more
