Search Results for author: Wei Chow

Found 9 papers, 5 papers with code

An Empirical Study of GPT-4o Image Generation Capabilities

1 code implementation | 8 Apr 2025 | Sixiang Chen, Jinbin Bai, Zhuoran Zhao, Tian Ye, Qingyu Shi, Donghao Zhou, Wenhao Chai, Xin Lin, Jianzong Wu, Chao Tang, Shilin Xu, Tao Zhang, Haobo Yuan, Yikang Zhou, Wei Chow, Linfeng Li, Xiangtai Li, Lei Zhu, Lu Qi

The landscape of image generation has rapidly evolved, from early GAN-based approaches to diffusion models and, most recently, to unified generative architectures that seek to bridge understanding and generation tasks.

Benchmarking, Image Generation, +3 more

PhysBench: Benchmarking and Enhancing Vision-Language Models for Physical World Understanding

no code implementations | 27 Jan 2025 | Wei Chow, Jiageng Mao, Boyi Li, Daniel Seita, Vitor Guizilini, Yue Wang

While Vision-Language Models (VLMs) have shown great promise in reasoning and task planning for embodied agents, their ability to comprehend physical phenomena remains extremely limited.

Benchmarking, Common Sense Reasoning, +2 more

KAA: Kolmogorov-Arnold Attention for Enhancing Attentive Graph Neural Networks

1 code implementation | 23 Jan 2025 | Taoran Fang, Tianhong Gao, Chunping Wang, Yihao Shang, Wei Chow, Lei Chen, Yang Yang

Graph neural networks (GNNs) with attention mechanisms, often referred to as attentive GNNs, have emerged as a prominent paradigm in advanced GNN models in recent years.

HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing

1 code implementation | 5 Dec 2024 | Jinbin Bai, Wei Chow, Ling Yang, Xiangtai Li, Juncheng Li, Hanwang Zhang, Shuicheng Yan

HumanEdit bridges this gap by employing human annotators to construct data pairs and administrators to provide feedback.

Unified Generative and Discriminative Training for Multi-modal Large Language Models

no code implementations | 1 Nov 2024 | Wei Chow, Juncheng Li, Qifan Yu, Kaihang Pan, Hao Fei, Zhiqi Ge, Shuai Yang, Siliang Tang, Hanwang Zhang, Qianru Sun

Discriminative training, exemplified by models like CLIP, excels in zero-shot image-text classification and retrieval, yet struggles with complex scenarios requiring fine-grained semantic differentiation.

Dynamic Time Warping, Image-text Classification, +5 more

Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis

1 code implementation | 10 Oct 2024 | Jinbin Bai, Tian Ye, Wei Chow, Enxin Song, Qing-Guo Chen, Xiangtai Li, Zhen Dong, Lei Zhu, Shuicheng Yan

We present Meissonic, which elevates non-autoregressive masked image modeling (MIM) text-to-image to a level comparable with state-of-the-art diffusion models like SDXL.

Feature Compression, Image Generation

Auto-Encoding Morph-Tokens for Multimodal LLM

1 code implementation | 3 May 2024 | Kaihang Pan, Siliang Tang, Juncheng Li, Zhaoyu Fan, Wei Chow, Shuicheng Yan, Tat-Seng Chua, Yueting Zhuang, Hanwang Zhang

For multimodal LLMs, the synergy of visual comprehension (textual output) and generation (visual output) presents an ongoing challenge.

Image Reconstruction, MORPH

Enhancing Cross-domain Link Prediction via Evolution Process Modeling

no code implementations | 3 Feb 2024 | Xuanwen Huang, Wei Chow, Yize Zhu, Yang Wang, Ziwei Chai, Chunping Wang, Lei Chen, Yang Yang

Extensive experiments on eight untrained graphs demonstrate that DyExpert achieves state-of-the-art performance in cross-domain link prediction.

Dynamic Link Prediction, Prediction
