no code implementations • 25 Mar 2024 • Zhuowan Li, Bhavan Jasani, Peng Tang, Shabnam Ghadar
In particular, our approach improves the accuracy of the previous state-of-the-art approach from 38% to 54% on the human-written questions in the ChartQA dataset, which require strong reasoning.
no code implementations • 9 Dec 2023 • Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang
We propose Causal Context Generation (Causal-CoG), a prompting strategy that leverages contextual information to enhance precise VQA during inference.
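As a rough illustration of a context-then-answer prompting pipeline in the spirit of Causal-CoG, the sketch below first asks a multimodal model to generate candidate contexts for the image and then answers the question conditioned on each context; the function names, prompts, and the simple majority-vote aggregation are assumptions for illustration, not the authors' implementation.

```python
def generate_contexts(vqa_model, image, n_contexts=3):
    """Ask the multimodal model to describe the image, producing candidate contexts."""
    prompt = "Describe this image in detail."
    return [vqa_model(image, prompt) for _ in range(n_contexts)]

def answer_with_context(vqa_model, image, question, context):
    """Answer the question with the generated context prepended to the prompt."""
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer:"
    return vqa_model(image, prompt)

def causal_cog_answer(vqa_model, image, question):
    # Direct answer without any generated context serves as the baseline prediction.
    direct = answer_with_context(vqa_model, image, question, context="")
    # Context-conditioned answers; the paper aggregates these by estimating whether
    # the context causally helps the prediction. A majority vote stands in for
    # that aggregation here.
    contextual = [
        answer_with_context(vqa_model, image, question, c)
        for c in generate_contexts(vqa_model, image)
    ]
    candidates = contextual + [direct]
    return max(set(candidates), key=candidates.count)
```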
2 code implementations • NeurIPS 2023 • Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille
In this work, we introduce the task of 3D-aware VQA, which focuses on challenging questions that require compositional reasoning over the 3D structure of visual scenes.
no code implementations • 1 Dec 2022 • Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille
Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm improves understanding of each individual modality.
2 code implementations • CVPR 2023 • Zhuowan Li, Xingrui Wang, Elias Stengel-Eskin, Adam Kortylewski, Wufei Ma, Benjamin Van Durme, Alan Yuille
Visual Question Answering (VQA) models often perform poorly on out-of-distribution data and struggle with domain generalization.
1 code implementation • NAACL 2022 • Chenyu Zhang, Benjamin Van Durme, Zhuowan Li, Elias Stengel-Eskin
Our commonsense knowledge about objects includes their typical visual attributes; we know that bananas are typically yellow or green, and not purple.
Ranked #1 on Visual Commonsense Tests on ViComTe-color
1 code implementation • CVPR 2022 • Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan Yuille
By swapping the context object features, the model's reliance on context can be effectively suppressed.
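A minimal sketch of this context-feature swapping idea is shown below; the tensor shapes, the relevance mask, and the feature bank are illustrative assumptions rather than the paper's exact procedure.

```python
import torch

def swap_context_features(obj_feats, relevant_mask, feat_bank):
    """
    obj_feats:     (num_objects, dim) region features for one image
    relevant_mask: (num_objects,) bool, True for objects relevant to the question
    feat_bank:     (bank_size, dim) object features drawn from other images

    Returns a copy of obj_feats in which irrelevant (context) object features are
    replaced by randomly drawn features, so the model cannot rely on them.
    """
    swapped = obj_feats.clone()
    irrelevant = (~relevant_mask).nonzero(as_tuple=True)[0]
    if len(irrelevant) > 0:
        idx = torch.randint(0, feat_bank.size(0), (len(irrelevant),))
        swapped[irrelevant] = feat_bank[idx]
    return swapped
```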
1 code implementation • ICCV 2021 • Zhuowan Li, Elias Stengel-Eskin, Yixiao Zhang, Cihang Xie, Quan Tran, Benjamin Van Durme, Alan Yuille
Our experiments show that CCO substantially boosts the performance of neural symbolic methods on real images.
no code implementations • CVPR 2020 • Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille
In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images.
2 code implementations • NeurIPS 2018 • Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li
Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature distilling capability of the proposed FD-GAN.
Ranked #3 on Person Re-Identification on CUHK03