no code implementations • 9 Aug 2024 • Jian Lu, Shikhar Srivastava, Junyu Chen, Robik Shrestha, Manoj Acharya, Kushal Kafle, Christopher Kanan
With the advent of multi-modal large language models (MLLMs), datasets used for visual question answering (VQA) and referring expression comprehension have seen a resurgence.
no code implementations • 17 Jun 2024 • Salma Abdel Magid, Jui-Hsien Wang, Kushal Kafle, Hanspeter Pfister
Vision Language Models (VLMs) such as CLIP are powerful models; however, they can exhibit unwanted biases, making them less safe when deployed directly in applications such as text-to-image or text-to-video retrieval, reverse search, or classification tasks.
no code implementations • CVPR 2024 • Eric Slyman, Stefan Lee, Scott Cohen, Kushal Kafle
Recent dataset deduplication techniques have demonstrated that content-aware dataset pruning can dramatically reduce the cost of training Vision-Language Pretrained (VLP) models without significant performance losses compared to training on the original dataset.
no code implementations • 23 Apr 2024 • Hang Hua, Jing Shi, Kushal Kafle, Simon Jenni, Daoan Zhang, John Collomosse, Scott Cohen, Jiebo Luo
To address this, we propose FineMatch, a new aspect-based fine-grained text and image matching benchmark, focusing on text and image mismatch detection and correction.
no code implementations • CVPR 2024 • Sepehr Sameni, Kushal Kafle, Hao Tan, Simon Jenni
Recent advancements in Vision-Language Models (VLMs) have marked a significant leap in bridging the gap between computer vision and natural language processing.
1 code implementation • 24 Aug 2023 • Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez
To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens.
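The idea of casting relations and locations as a token sequence can be illustrated with a minimal sketch. The format below (subject token, then relation/object words, then quantized box-coordinate tokens) is a hypothetical linearization for illustration, not the paper's exact vocabulary:

```python
def linearize_relations(subject, relations, num_bins=100):
    """Cast a subject's relations into a flat token sequence.

    Hypothetical format: [subj] rel obj <loc_x1> <loc_y1> <loc_x2> <loc_y2> ... [end]
    Box coordinates normalized to [0, 1] are quantized into discrete
    location tokens so an auto-regressive decoder can emit them.
    """
    tokens = [f"[{subject}]"]
    for rel, obj, box in relations:
        tokens += [rel, obj]
        # quantize each normalized coordinate into one of num_bins tokens
        tokens += [f"<loc_{min(round(c * num_bins), num_bins - 1)}>" for c in box]
    tokens.append("[end]")
    return tokens

seq = linearize_relations("dog", [("on", "couch", (0.1, 0.5, 0.9, 0.95))])
# e.g. ['[dog]', 'on', 'couch', '<loc_10>', '<loc_50>', '<loc_90>', '<loc_95>', '[end]']
```

A decoder trained on such sequences predicts each token conditioned on the previous ones, so relations, objects, and locations all share one output space.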
1 code implementation • CVPR 2023 • Ziyan Yang, Kushal Kafle, Franck Dernoncourt, Vicente Ordonez
We propose a margin-based loss for tuning joint vision-language models so that their gradient-based explanations are consistent with region-level annotations provided by humans for relatively smaller grounding datasets.
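A margin-based grounding loss of this flavor can be sketched as a hinge penalty that pushes saliency (e.g., gradient-based explanation) mass inside a human-annotated region above the mass outside it. This is an illustrative reconstruction under that assumption, not the paper's exact formulation:

```python
import numpy as np

def margin_grounding_loss(saliency, region_mask, margin=0.1):
    """Hinge-style margin loss on a saliency map.

    Penalizes the model unless the mean saliency inside the annotated
    region exceeds the mean saliency outside it by at least `margin`.
    """
    inside = saliency[region_mask].mean()
    outside = saliency[~region_mask].mean()
    return max(0.0, margin - (inside - outside))

# Saliency concentrated on the annotated (left) column: no penalty.
sal = np.array([[0.9, 0.1], [0.8, 0.05]])
mask = np.array([[True, False], [True, False]])
loss = margin_grounding_loss(sal, mask)  # 0.0, margin satisfied
```

Because the loss is zero once the margin is met, it only corrects explanations that disagree with the human region annotations, leaving already-consistent predictions untouched.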
1 code implementation • 5 Apr 2022 • Robik Shrestha, Kushal Kafle, Christopher Kanan
We propose a new direction: modifying the network architecture to impose inductive biases that make the network robust to dataset bias.
Ranked #3 on Action Recognition on BAR
no code implementations • 29 Sep 2021 • Ohi Dibua, Mackenzie Austin, Kushal Kafle
This can manifest itself in the use of spurious correlations in the data to make decisions.
no code implementations • CVPR 2021 • Khoi Pham, Kushal Kafle, Zhe Lin, Zhihong Ding, Scott Cohen, Quan Tran, Abhinav Shrivastava
In this paper, we introduce a large-scale in-the-wild visual attribute prediction dataset consisting of over 927K attribute annotations for over 260K object instances.
1 code implementation • 1 Apr 2021 • Robik Shrestha, Kushal Kafle, Christopher Kanan
We introduce a new dataset called Biased MNIST that enables assessment of robustness to multiple bias sources.
2 code implementations • ICCV 2021 • Hareesh Ravi, Kushal Kafle, Scott Cohen, Jonathan Brandt, Mubbasir Kapadia
Visual storytelling and story comprehension are uniquely human skills that play a central role in how we learn about and experience the world.
no code implementations • NeurIPS 2020 • Damien Teney, Kushal Kafle, Robik Shrestha, Ehsan Abbasnejad, Christopher Kanan, Anton Van Den Hengel
Out-of-distribution (OOD) testing is increasingly popular for evaluating a machine learning system's ability to generalize beyond the biases of a training set.
no code implementations • 28 Apr 2020 • Zhongchao Qian, Tyler L. Hayes, Kushal Kafle, Christopher Kanan
Traditionally, deep convolutional neural networks consist of a series of convolutional and pooling layers followed by one or more fully connected (FC) layers to perform the final classification.
1 code implementation • ACL 2020 • Robik Shrestha, Kushal Kafle, Christopher Kanan
Existing Visual Question Answering (VQA) methods tend to exploit dataset biases and spurious statistical correlations, instead of producing right answers for the right reasons.
1 code implementation • ECCV 2020 • Tyler L. Hayes, Kushal Kafle, Robik Shrestha, Manoj Acharya, Christopher Kanan
While there is neuroscientific evidence that the brain replays compressed memories, existing methods for convolutional networks replay raw images.
1 code implementation • 5 Aug 2019 • Kushal Kafle, Robik Shrestha, Brian Price, Scott Cohen, Christopher Kanan
Chart question answering (CQA) is a newly proposed visual question answering (VQA) task where an algorithm must answer questions about data visualizations, e.g., bar charts, pie charts, and line graphs.
no code implementations • 19 Apr 2019 • Kushal Kafle, Robik Shrestha, Christopher Kanan
Language grounded image understanding tasks have often been proposed as a method for evaluating progress in artificial intelligence.
2 code implementations • CVPR 2019 • Robik Shrestha, Kushal Kafle, Christopher Kanan
Visual Question Answering (VQA) research is split into two camps: the first focuses on VQA datasets that require natural image understanding and the second focuses on synthetic datasets that test reasoning.
1 code implementation • 29 Oct 2018 • Manoj Acharya, Kushal Kafle, Christopher Kanan
Most counting questions in visual question answering (VQA) datasets are simple and require no more than object detection.
Ranked #3 on Object Counting on HowMany-QA
1 code implementation • CVPR 2018 • Kushal Kafle, Brian Price, Scott Cohen, Christopher Kanan
Bar charts are an effective way to convey numeric information, but today's algorithms cannot parse them.
no code implementations • WS 2017 • Kushal Kafle, Mohammed Yousefhussien, Christopher Kanan
Data augmentation is widely used to train deep neural networks for image classification tasks.
no code implementations • ICCV 2017 • Kushal Kafle, Christopher Kanan
As a result, evaluation scores are inflated and predominantly determined by answering easier questions, making it difficult to compare different methods.
1 code implementation • 5 Oct 2016 • Kushal Kafle, Christopher Kanan
We then exhaustively review existing algorithms for VQA.
no code implementations • CVPR 2016 • Kushal Kafle, Christopher Kanan
Recently, algorithms for object recognition and related tasks have become sufficiently proficient that new vision tasks can now be pursued.