Search Results for author: Zhuowan Li

Found 10 papers, 6 papers with code

Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA

no code implementations • 25 Mar 2024 • Zhuowan Li, Bhavan Jasani, Peng Tang, Shabnam Ghadar

In particular, our approach improves the accuracy of the previous state-of-the-art approach from 38% to 54% on the human-written questions in the ChartQA dataset, which require strong reasoning.

Data Augmentation • Question Answering • +1

Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models

no code implementations • 9 Dec 2023 • Shitian Zhao, Zhuowan Li, Yadong Lu, Alan Yuille, Yan Wang

We propose Causal Context Generation (Causal-CoG), a prompting strategy that leverages contextual information to improve VQA accuracy during inference.

Question Answering • Visual Question Answering

3D-Aware Visual Question Answering about Parts, Poses and Occlusions

2 code implementations • NeurIPS 2023 • Xingrui Wang, Wufei Ma, Zhuowan Li, Adam Kortylewski, Alan Yuille

In this work, we introduce the task of 3D-aware VQA, which focuses on challenging questions that require compositional reasoning over the 3D structure of visual scenes.

Question Answering • Visual Question Answering

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models

no code implementations • 1 Dec 2022 • Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille

Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality.

Attribute • Representation Learning

Visual Commonsense in Pretrained Unimodal and Multimodal Models

1 code implementation • NAACL 2022 • Chenyu Zhang, Benjamin Van Durme, Zhuowan Li, Elias Stengel-Eskin

Our commonsense knowledge about objects includes their typical visual attributes; we know that bananas are typically yellow or green, and not purple.

Attribute • Visual Commonsense Tests • +1

Context-Aware Group Captioning via Self-Attention and Contrastive Features

no code implementations • CVPR 2020 • Zhuowan Li, Quan Tran, Long Mai, Zhe Lin, Alan Yuille

In this paper, we introduce a new task, context-aware group captioning, which aims to describe a group of target images in the context of another group of related reference images.

Image Captioning

FD-GAN: Pose-guided Feature Distilling GAN for Robust Person Re-identification

2 code implementations • NeurIPS 2018 • Yixiao Ge, Zhuowan Li, Haiyu Zhao, Guojun Yin, Shuai Yi, Xiaogang Wang, Hongsheng Li

Our proposed FD-GAN achieves state-of-the-art performance on three person reID datasets, which demonstrates the effectiveness and robust feature-distilling capability of the proposed FD-GAN.

Generative Adversarial Network • Person Re-Identification
