no code implementations • 25 Mar 2024 • Zhuowan Li, Bhavan Jasani, Peng Tang, Shabnam Ghadar
In particular, our approach improves the accuracy of the previous state-of-the-art approach from 38% to 54% on the human-written questions in the ChartQA dataset, which require strong reasoning.
no code implementations • 9 Dec 2023 • Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang
We propose Causal Context Generation (Causal-CoG), a prompting strategy that leverages contextual information to enhance precise VQA during inference.
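As a rough illustration of a context-then-answer prompting pipeline in the spirit of Causal-CoG, the sketch below first asks a multimodal model to generate candidate contexts for the image and then answers the question conditioned on each context; the function names, prompts, and the simple majority-vote aggregation are assumptions for illustration, not the authors' implementation.

```python
def generate_contexts(vqa_model, image, n_contexts=3):
    """Ask the multimodal model to describe the image, producing candidate contexts."""
    prompt = "Describe this image in detail."
    return [vqa_model(image, prompt) for _ in range(n_contexts)]

def answer_with_context(vqa_model, image, question, context):
    """Answer the question with the generated context prepended to the prompt."""
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    return vqa_model(image, prompt)

def causal_cog_answer(vqa_model, image, question):
    # Direct answer without any generated context serves as the baseline prediction.
    direct = answer_with_context(vqa_model, image, question, context="")
    # Context-conditioned answers; the paper aggregates these by estimating whether
    # the context causally helps the prediction. A majority vote stands in for
    # that aggregation here.
    contextual = [
        answer_with_context(vqa_model, image, question, c)
        for c in generate_contexts(vqa_model, image)
    ]
    candidates = contextual + [direct]
    return max(set(candidates), key=candidates.count)
```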
2 code implementations • NeurIPS 2023 • Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille
In this work, we introduce the task of 3D-aware VQA, which focuses on challenging questions that require compositional reasoning over the 3D structure of visual scenes.
no code implementations • 1 Dec 2022 • Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille
Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm improves understanding of each individual modality.
2 code implementations • CVPR 2023 • Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille
Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle with domain generalization.
1 code implementation • NAACL 2022 • Chenyu Zhang, Benjamin Van Durme, Zhuowan Li, Elias Stengel-Eskin
Our commonsense knowledge about objects includes their typical visual attributes; we know that bananas are typically yellow or green, and not purple.
Ranked #1 on Visual Commonsense Tests on ViComTe-color
1 code implementation • CVPR 2022 • Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan Yuille
By swapping the context object features, the model's reliance on context can be effectively suppressed.
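A minimal sketch of this context-feature swapping idea is shown below; the tensor shapes, the relevance mask, and the feature bank are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def swap_context_features(obj_feats, relevant_mask, feat_bank):
    """
    obj_feats:     (num_objects, dim) region features for one image
    relevant_mask: (num_objects,) bool, True for objects relevant to the question
    feat_bank:     (bank_size, dim) object features drawn from other images

    Returns a copy of obj_feats in which irrelevant (context) object features are
    replaced by randomly drawn features, so the model cannot rely on them.
    """
    swapped = obj_feats.clone()
    irrelevant = (~relevant_mask).nonzero(as_tuple=True)[0]
    if len(irrelevant) > 0:
        idx = torch.randint(0, feat_bank.size(0), (len(irrelevant),))
        swapped[irrelevant] = feat_bank[idx]
    return swapped
```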
1 code implementation • ICCV 2021 • Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille
Our experiments show that CCO substantially boosts the performance of neural symbolic methods on real images.
no code implementations • CVPR 2020 • Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille
In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images.
2 code implementations • NeurIPS 2018 • Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature distilling capability of the proposed FD-GAN.
Ranked #3 on Person Re-Identification on CUHK03