1 code implementation • 26 Sep 2024 • Amita Kamath, Cheng-Yu Hsieh, Kai-Wei Chang, Ranjay Krishna
By training with both, we see improvements on existing benchmarks while simultaneously improving performance on hard positives, indicating a more robust improvement in compositionality.
1 code implementation • 29 May 2024 • WenBo Hu, Zi-Yi Dou, Liunian Harold Li, Amita Kamath, Nanyun Peng, Kai-Wei Chang
This raises the question: can we achieve flexibility in the number of visual tokens to suit different tasks and computational resources?
1 code implementation • 30 Oct 2023 • Amita Kamath, Jack Hessel, Kai-Wei Chang
Recent vision-language (VL) models are powerful, but can they reliably distinguish "right" from "left"?
1 code implementation • 24 May 2023 • Amita Kamath, Jack Hessel, Kai-Wei Chang
We first curate CompPrompts, a set of increasingly compositional image captions that VL models should be able to capture (e.g., from single object, to object+property, to multiple interacting objects).
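The compositionality levels described above can be illustrated with a few toy captions; these exact strings are made up for illustration and are not drawn from the CompPrompts benchmark itself:

```python
# Illustrative captions at increasing levels of compositionality,
# in the spirit of CompPrompts (example strings are hypothetical).
comp_levels = {
    "single object": "a dog",
    "object + property": "a brown dog",
    "multiple interacting objects": "a brown dog chasing a white cat",
}

for level, caption in comp_levels.items():
    print(f"{level}: {caption}")
```

Each level adds structure the model must bind correctly: first attributes to objects, then relations between objects.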
1 code implementation • 28 Mar 2023 • Adyasha Maharana, Amita Kamath, Christopher Clark, Mohit Bansal, Aniruddha Kembhavi
As general purpose vision models get increasingly effective at a wide set of tasks, it is imperative that they be consistent across the tasks they support.
no code implementations • 4 Feb 2022 • Amita Kamath, Christopher Clark, Tanmay Gupta, Eric Kolve, Derek Hoiem, Aniruddha Kembhavi
This work presents an effective and inexpensive alternative: learn skills from supervised datasets, learn concepts from web image search, and leverage a key characteristic of GPVs: the ability to transfer visual knowledge across skills.
Ranked #2 on Visual Question Answering (VQA) on GRIT
no code implementations • CVPR 2022 • Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem
To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.
2 code implementations • 1 Apr 2021 • Tanmay Gupta, Amita Kamath, Aniruddha Kembhavi, Derek Hoiem
To reduce the time and expertise required to develop new applications, we would like to create general purpose vision systems that can learn and perform a range of tasks without any modification to the architecture or learning process.
2 code implementations • ACL 2020 • Amita Kamath, Robin Jia, Percy Liang
In this work, we propose the setting of selective question answering under domain shift, in which a QA model is tested on a mixture of in-domain and out-of-domain data, and must answer (i.e., not abstain on) as many questions as possible while maintaining high accuracy.
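The selective-answering setting can be sketched with a simple confidence threshold: the model answers only when its confidence clears the threshold and abstains otherwise, trading coverage for accuracy. This is a minimal illustrative sketch, not the calibration method from the paper; the function name and toy data are hypothetical:

```python
# Sketch of selective question answering: answer only when confidence
# is at or above `threshold`, otherwise abstain. (Illustrative only.)

def selective_answer(predictions, threshold):
    """predictions: list of (answer, confidence, is_correct) triples.
    Returns (coverage, accuracy) when abstaining below `threshold`."""
    answered = [(ans, ok) for ans, conf, ok in predictions if conf >= threshold]
    if not answered:
        return 0.0, 1.0  # nothing answered; vacuously accurate
    coverage = len(answered) / len(predictions)
    accuracy = sum(ok for _, ok in answered) / len(answered)
    return coverage, accuracy

# Toy mixture: in-domain items tend to be high-confidence and correct,
# out-of-domain items low-confidence and wrong.
preds = [
    ("Paris", 0.95, True), ("1912", 0.90, True), ("blue", 0.85, True),
    ("oxygen", 0.80, True), ("cat", 0.40, False), ("seven", 0.35, False),
]
cov, acc = selective_answer(preds, threshold=0.5)
```

Raising the threshold lowers coverage but filters out the low-confidence out-of-domain errors, which is exactly the trade-off the setting evaluates.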