1 code implementation • 30 Oct 2024 • Apoorv Khandelwal, Tian Yun, Nihal V. Nayak, Jack Merullo, Stephen H. Bach, Chen Sun, Ellie Pavlick
We introduce a benchmark that measures the time to pre-train models on given GPUs and identifies the ideal settings for maximizing training speed.
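As a rough illustration of the kind of measurement such a benchmark performs (not the paper's own code), the sketch below times optimizer steps for a toy PyTorch model on one GPU and sweeps batch sizes; the model, dimensions, and sweep values are all illustrative assumptions.

```python
# Minimal sketch (not the paper's benchmark): estimate training throughput
# for a toy model on one GPU, sweeping batch size as a crude settings search.
import time
import torch
import torch.nn as nn

def steps_per_second(model, batch_size, dim, n_steps=50, warmup=10):
    device = next(model.parameters()).device
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(batch_size, dim, device=device)
    y = torch.randint(0, 10, (batch_size,), device=device)
    loss_fn = nn.CrossEntropyLoss()
    for step in range(warmup + n_steps):
        if step == warmup:                       # exclude warmup from timing
            if device.type == "cuda":
                torch.cuda.synchronize()
            start = time.perf_counter()
        opt.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    if device.type == "cuda":
        torch.cuda.synchronize()                 # wait for queued GPU work
    return n_steps / (time.perf_counter() - start)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(1024, 4096), nn.GELU(), nn.Linear(4096, 10)).to(device)
for bs in (32, 64, 128):                         # hypothetical batch-size sweep
    sps = steps_per_second(model, bs, 1024)
    print(f"batch={bs}: {sps:.1f} steps/s, {sps * bs:.0f} samples/s")
```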
1 code implementation • 10 Nov 2023 • Apoorv Khandelwal, Ellie Pavlick, Chen Sun
Modular neural networks, which require no additional training, have recently been shown to surpass end-to-end neural networks on challenging vision-language tasks.
1 code implementation • 13 Oct 2022 • Eric Ming Chen, Jin Sun, Apoorv Khandelwal, Dani Lischinski, Noah Snavely, Hadar Averbuch-Elor
How can one visually characterize people in a decade?
1 code implementation • 3 Jun 2022 • Dustin Schwenk, Apoorv Khandelwal, Christopher Clark, Kenneth Marino, Roozbeh Mottaghi
In contrast to existing knowledge-based VQA datasets, these questions generally cannot be answered by simply querying a knowledge base; instead, they require some form of commonsense reasoning about the scene depicted in the image.
2 code implementations • CVPR 2022 • Apoorv Khandelwal, Luca Weihs, Roozbeh Mottaghi, Aniruddha Kembhavi
Contrastive Language-Image Pretraining (CLIP) encoders have been shown to be beneficial for a range of visual tasks, from classification and detection to captioning and image manipulation.
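As a loose illustration (not the paper's code), the sketch below extracts features from a frozen CLIP image encoder using the Hugging Face transformers CLIP classes; the checkpoint name and the dummy input image are assumptions made for the example.

```python
# Minimal sketch (not the paper's code): a frozen CLIP image encoder
# used as a visual feature backbone via Hugging Face `transformers`.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224))             # stand-in for a real frame
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():                            # frozen backbone: no gradients
    features = model.get_image_features(**inputs)
print(features.shape)                            # e.g. torch.Size([1, 512])
```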
1 code implementation • ICCV 2021 • Claire Yuqing Cui, Apoorv Khandelwal, Yoav Artzi, Noah Snavely, Hadar Averbuch-Elor
We present a task and benchmark dataset for person-centric visual grounding, the problem of linking people named in a caption to people pictured in an image.
Ranked #1 on Person-centric Visual Grounding on Who’s Waldo (using extra training data)
no code implementations • 27 Nov 2020 • Margot Hanley, Apoorv Khandelwal, Hadar Averbuch-Elor, Noah Snavely, Helen Nissenbaum
Important ethical concerns arising from computer vision datasets of people have been receiving significant attention, and a number of datasets have been withdrawn as a result.