1 code implementation • 20 Feb 2024 • Jianrui Zhang, Mu Cai, Tengyang Xie, Yong Jae Lee
We first spotlight the near-chance performance of multimodal models like CLIP and LLaVA in physically grounded compositional reasoning.
counterfactual Data Augmentation