1 code implementation • 15 Apr 2025 • Ziqi Pang, Xin Xu, Yu-Xiong Wang
With the success of image generation, generative diffusion models are increasingly adopted for discriminative tasks, as pixel generation provides a unified perception interface.
no code implementations • 14 Apr 2025 • Aruna Gauba, Irene Pi, Yunze Man, Ziqi Pang, Vikram S. Adve, Yu-Xiong Wang
We curate AgMMU, a dataset for evaluating and developing vision-language models (VLMs) that produce factually accurate answers in knowledge-intensive expert domains.
1 code implementation • CVPR 2025 • Lang Lin, Xueyang Yu, Ziqi Pang, Yu-Xiong Wang
This paper proposes a novel framework utilizing multi-modal large language models (MLLMs) for referring video object segmentation (RefVOS).
Ranked #3 on Referring Video Object Segmentation on MeViS
no code implementations • CVPR 2025 • Ziqi Pang, Tianyuan Zhang, Fujun Luan, Yunze Man, Hao Tan, Kai Zhang, William T. Freeman, Yu-Xiong Wang
We introduce RandAR, a decoder-only visual autoregressive (AR) model capable of generating images in arbitrary token orders.
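As a rough illustration of arbitrary-order decoding, the sketch below samples tokens in a random permutation of positions, announcing each target position to the model before predicting it. The model interface, `pos_instruction` embeddings, and `token_embed` helper are illustrative assumptions, not RandAR's actual API.

```python
# Minimal sketch of random-order autoregressive image decoding, assuming a
# decoder-only transformer that accepts a sequence of embeddings and returns
# per-position logits. All names here are illustrative, not RandAR's API.
import torch

def random_order_decode(model, pos_instruction, num_tokens=256, device="cuda"):
    order = torch.randperm(num_tokens, device=device)   # arbitrary generation order
    tokens = torch.zeros(num_tokens, dtype=torch.long, device=device)
    seq = []  # interleaved context: [pos_instruction_i, token_i, ...]
    for pos in order:
        seq.append(pos_instruction[pos])                # tell the model where to decode next
        logits = model(torch.stack(seq).unsqueeze(0))[:, -1]  # next-token logits
        tok = torch.multinomial(logits.softmax(-1), 1).item()
        tokens[pos] = tok
        seq.append(model.token_embed(torch.tensor(tok, device=device)))
    return tokens  # raster-order token grid, ready for a VQ decoder
```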
1 code implementation • 9 Oct 2024 • Bowen Jin, Ziqi Pang, Bingjun Guo, Yu-Xiong Wang, Jiaxuan You, Jiawei Han
In this paper, we approach an overlooked yet critical task, Graph2Image: generating images from multimodal attributed graphs (MMAGs).
no code implementations • CVPR 2024 • Junbao Zhou, Ziqi Pang, Yu-Xiong Wang
With recent video object segmentation (VOS) benchmarks evolving to challenging scenarios, we revisit a simple but overlooked strategy: restricting the size of memory banks.
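A minimal sketch of the size-restriction idea follows; the eviction policy here is plain FIFO with a pinned reference frame, which only illustrates "bounded memory" and may differ from the paper's actual rule.

```python
# Sketch of a size-restricted memory bank for video object segmentation.
# FIFO eviction with a pinned first frame is an illustrative choice.
from collections import deque

class BoundedMemoryBank:
    def __init__(self, max_size=8):
        self.first = None                          # always keep the reference frame
        self.recent = deque(maxlen=max_size - 1)   # sliding window of recent frames

    def add(self, key_feat, value_feat):
        if self.first is None:
            self.first = (key_feat, value_feat)
        else:
            self.recent.append((key_feat, value_feat))  # oldest entry drops out

    def readout(self):
        entries = ([self.first] if self.first is not None else []) + list(self.recent)
        return entries  # keys/values consumed by memory attention
```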
2 code implementations • 19 Oct 2023 • Ziqi Pang, Ziyang Xie, Yunze Man, Yu-Xiong Wang
This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language.
Ranked #4 on Question Answering on SQA3D
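The core architectural idea can be sketched as follows: a frozen text-pretrained transformer block is inserted after a visual backbone, with trainable linear adapters matching feature widths. The stand-in `nn.TransformerEncoderLayer` and the dimensions are assumptions for illustration; the paper uses a frozen LLM block.

```python
# Hedged sketch of the "frozen LLM layer as visual encoder" idea. A
# text-pretrained transformer block (stand-in layer here) processes visual
# tokens; only the two linear adapters are trained.
import torch.nn as nn

class FrozenLLMVisualHead(nn.Module):
    def __init__(self, vis_dim=768, llm_dim=4096, llm_block=None):
        super().__init__()
        self.proj_in = nn.Linear(vis_dim, llm_dim)     # trainable adapter in
        self.llm_block = llm_block or nn.TransformerEncoderLayer(
            d_model=llm_dim, nhead=32, batch_first=True)
        for p in self.llm_block.parameters():
            p.requires_grad = False                    # LLM weights stay frozen
        self.proj_out = nn.Linear(llm_dim, vis_dim)    # trainable adapter out

    def forward(self, visual_tokens):                  # (B, N, vis_dim)
        x = self.proj_in(visual_tokens)
        x = self.llm_block(x)                          # frozen text-pretrained layer
        return self.proj_out(x)
```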
1 code implementation • 2 Oct 2023 • Ziqi Pang, Deva Ramanan, Mengtian Li, Yu-Xiong Wang
Our benchmark inherently captures the disappearance and re-appearance of agents, presenting the emergent challenge of forecasting for occluded agents, which is a safety-critical problem yet overlooked by snapshot-based benchmarks.
1 code implementation • ICCV 2023 • Ziyang Xie, Ziqi Pang, Yu-Xiong Wang
To further enhance multi-view consistency, we augment the uncertainty network with the global 3D structure optimized by a voxelized neural radiance field (Voxel-NeRF).
1 code implementation • CVPR 2023 • Ziqi Pang, Jie Li, Pavel Tokmakov, Dian Chen, Sergey Zagoruyko, Yu-Xiong Wang
It emphasizes spatio-temporal continuity and integrates both past and future reasoning for tracked objects.
2 code implementations • CVPR 2022 • Lue Fan, Ziqi Pang, Tianyuan Zhang, Yu-Xiong Wang, Hang Zhao, Feng Wang, Naiyan Wang, Zhaoxiang Zhang
In LiDAR-based 3D object detection for autonomous driving, the ratio of object size to input scene size is significantly smaller than in 2D detection.
Ranked #3 on 3D Object Detection on Waymo Cyclist
1 code implementation • 26 Nov 2021 • Qitai Wang, Yuntao Chen, Ziqi Pang, Naiyan Wang, Zhaoxiang Zhang
We employ a simple Kalman filter for trajectory prediction and maintain the tracklet through prediction when the target is not visible.
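A minimal constant-velocity Kalman filter of this kind is sketched below; the state layout and noise values are illustrative defaults, not the paper's tuned settings.

```python
# Constant-velocity Kalman filter sketch for tracklet prediction, used to
# keep a track alive while its target is occluded. State is (x, y, vx, vy).
import numpy as np

class KalmanTracklet:
    def __init__(self, x, y):
        self.s = np.array([x, y, 0., 0.])          # state: position + velocity
        self.P = np.eye(4)                          # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0  # motion model
        self.H = np.eye(2, 4)                       # observe position only
        self.Q = np.eye(4) * 0.01                   # process noise (illustrative)
        self.R = np.eye(2) * 0.1                    # measurement noise (illustrative)

    def predict(self):                              # called every frame, even when
        self.s = self.F @ self.s                    # unmatched: prediction alone
        self.P = self.F @ self.P @ self.F.T + self.Q  # preserves the tracklet
        return self.s[:2]

    def update(self, z):                            # z = matched detection (x, y)
        y = z - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.s = self.s + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```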
2 code implementations • 18 Nov 2021 • Ziqi Pang, Zhichao Li, Naiyan Wang
3D multi-object tracking (MOT) has witnessed numerous novel benchmarks and approaches in recent years, especially those under the "tracking-by-detection" paradigm.
Ranked #34 on 3D Multi-Object Tracking on nuScenes
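One association step under the tracking-by-detection paradigm can be sketched as a cost matrix between predicted tracks and new detections solved by Hungarian matching. The center-distance cost is a simplification for illustration; practical 3D trackers often use (G)IoU instead.

```python
# Sketch of one tracking-by-detection association step: cost matrix between
# predicted track centers and detection centers, solved by the Hungarian
# algorithm. Thresholds and the distance cost are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_centers, det_centers, max_dist=2.0):
    cost = np.linalg.norm(
        track_centers[:, None, :] - det_centers[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)        # Hungarian matching
    matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
    unmatched_tracks = set(range(len(track_centers))) - {r for r, _ in matches}
    unmatched_dets = set(range(len(det_centers))) - {c for _, c in matches}
    return matches, unmatched_tracks, unmatched_dets
```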
4 code implementations • 10 Mar 2021 • Ziqi Pang, Zhichao Li, Naiyan Wang
The code and protocols for our benchmark and algorithm are available at https://github.com/TuSimple/LiDAR_SOT/.
1 code implementation • 29 Nov 2019 • Ziqi Pang, Zhiyuan Hu, Pavel Tokmakov, Yu-Xiong Wang, Martial Hebert
Indeed, even the majority of few-shot learning methods rely on a large set of "base classes" for pretraining.