no code implementations • 26 May 2025 • Peng Liu, Xiaoming Ren, Fengkai Liu, Qingsong Xie, Quanlong Zheng, Yanhao Zhang, Haonan Lu, Yujiu Yang
Recent advancements in image-to-video (I2V) generation have shown promising performance in conventional scenarios.
no code implementations • 22 May 2025 • Shuhao Han, Haotian Fan, Fangyuan Kong, Wenjie Liao, Chunle Guo, Chongyi Li, Radu Timofte, Liang Li, Tao Li, Junhui Cui, Yunqiu Wang, Yang Tai, Jingwei Sun, Jianhui Sun, Xinli Yue, Tianyi Wang, Huan Hou, Junda Lu, Xinyang Huang, Zitang Zhou, Zijian Zhang, Xuhui Zheng, Xuecheng Wu, Chong Peng, Xuezhi Cao, Trong-Hieu Nguyen-Mau, Minh-Hoang Le, Minh-Khoa Le-Phan, Duy-Nam Ly, Hai-Dang Nguyen, Minh-Triet Tran, Yukang Lin, Yan Hong, Chuanbiao Song, Siyuan Li, Jun Lan, Zhichao Zhang, Xinyue Li, Wei Sun, ZiCheng Zhang, Yunhao Li, Xiaohong Liu, Guangtao Zhai, Zitong Xu, Huiyu Duan, Jiarui Wang, Guangji Ma, Liu Yang, Lu Liu, Qiang Hu, Xiongkuo Min, Zichuan Wang, Zhenchen Tang, Bo Peng, Jing Dong, Fengbin Guan, Zihao Yu, Yiting Lu, Wei Luo, Xin Li, Minhao Lin, Haofeng Chen, Xuanxuan He, Kele Xu, Qisheng Xu, Zijian Gao, Tianjiao Wan, Bo-Cheng Qiu, Chih-Chung Hsu, Chia-Ming Lee, Yu-Fan Lin, Bo Yu, Zehao Wang, Da Mu, Mingxiu Chen, Junkang Fang, Huamei Sun, Wending Zhao, Zhiyu Wang, Wang Liu, Weikang Yu, Puhong Duan, Bin Sun, Xudong Kang, Shutao Li, Shuai He, Lingzhi Fu, Heng Cong, Rongyu Zhang, Jiarong He, Zhishan Qiao, Yongqing Huang, Zewen Chen, Zhe Pang, Juan Wang, Jian Guo, Zhizhuo Shao, Ziyu Feng, Bing Li, Weiming Hu, Hesong Li, Dehua Liu, Zeming Liu, Qingsong Xie, Ruichen Wang, Zhihao LI, Yuqi Liang, Jianqi Bi, Jun Luo, Junfeng Yang, Can Li, Jing Fu, Hongwei Xu, Mingrui Long, Lulin Tang
A total of 211 participants registered for the structure track.
1 code implementation • 1 Apr 2025 • Zhenyi Liao, Qingsong Xie, Yanhao Zhang, Zijian Kong, Haonan Lu, Zhenyu Yang, Zhijie Deng
Increasing attention has been placed on improving the reasoning capacities of multi-modal large language models (MLLMs).
1 code implementation • CVPR 2025 • Siyuan Li, Luyuan Zhang, Zedong Wang, Juanxi Tian, Cheng Tan, Zicheng Liu, Chang Yu, Qingsong Xie, Haonan Lu, Haoqian Wang, Zhen Lei
Masked Image Modeling (MIM) with Vector Quantization (VQ) has achieved great success in both self-supervised pre-training and image generation.
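The vector-quantization step at the core of such MIM/VQ models maps each latent vector to its nearest entry in a learned codebook. A minimal numpy sketch of that lookup (the random codebook and latents here are illustrative stand-ins, not the paper's model):

```python
import numpy as np

def vector_quantize(z, codebook):
    """Assign each latent vector to its nearest codebook entry (the VQ step)."""
    # z: (n, d) latents; codebook: (k, d) learned codes
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (n, k) squared distances
    idx = d2.argmin(-1)                                         # nearest code per latent
    return codebook[idx], idx

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
z = codebook[[2, 5]] + 0.01 * rng.normal(size=(2, 4))  # latents near codes 2 and 5
quantized, idx = vector_quantize(z, codebook)
```

In training, the quantized output replaces the continuous latent, with gradients typically passed through via a straight-through estimator.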
no code implementations • 31 Mar 2025 • Qi Wu, Quanlong Zheng, Yanhao Zhang, Junlin Xie, Jinguo Luo, Kuo Wang, Peng Liu, Qingsong Xie, Ru Zhen, Haonan Lu, Zhenyu Yang
To tackle this challenge, we propose a hierarchical and holistic video understanding (H2VU) benchmark designed to evaluate both general video and online streaming video comprehension.
no code implementations • 11 Mar 2025 • Qingsong Xie, Zhao Zhang, Zhe Huang, Yanhao Zhang, Haonan Lu, Zhenyu Yang
Experiments demonstrate Layton's superiority in high-fidelity reconstruction, achieving a reconstruction Fréchet Inception Distance (rFID) of 10.8 on the MSCOCO-2017 5K benchmark for 1024x1024 image reconstruction.
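The rFID score above is the Fréchet distance between Gaussian fits of Inception features for real versus reconstructed images. A minimal numpy sketch of that computation, using random features as stand-ins for actual Inception activations:

```python
import numpy as np

def fid(feats_a, feats_b):
    """Frechet distance between Gaussian fits of two feature sets, shape (n, d)."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a @ cov_b)^{1/2}) equals the sum of square roots of the
    # eigenvalues of cov_a @ cov_b (real and non-negative for PSD factors).
    eigvals = np.linalg.eigvals(cov_a @ cov_b).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    return float(((mu_a - mu_b) ** 2).sum() + np.trace(cov_a + cov_b) - 2.0 * tr_sqrt)

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, (500, 16))   # stand-in for features of real images
recon = rng.normal(0.1, 1.0, (500, 16))  # stand-in for features of reconstructions
score = fid(real, recon)
```

Identical feature sets give a distance of (numerically) zero; any mean or covariance mismatch raises the score.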
no code implementations • 2 Dec 2024 • Ruichen Wang, Junliang Zhang, Qingsong Xie, Chen Chen, Haonan Lu
Recently, diffusion models have exhibited superior performance in the area of image inpainting.
1 code implementation • 24 Jun 2024 • Zhenyi Liao, Qingsong Xie, Chen Chen, Hannan Lu, Zhijie Deng
To address this issue, we first assess the face quality of images generated by popular pre-trained DMs with the aid of human annotators, and then evaluate the alignment between existing metrics and human judgments.
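Alignment between an automatic quality metric and human judgments is commonly quantified with a rank correlation. A minimal sketch using Spearman's rho on hypothetical human and metric scores (the numbers below are illustrative, not from the paper):

```python
import numpy as np

def spearman(a, b):
    """Spearman rank correlation (assumes no ties, for simplicity)."""
    ra = np.argsort(np.argsort(a)).astype(float)  # ranks of a
    rb = np.argsort(np.argsort(b)).astype(float)  # ranks of b
    ra -= ra.mean()
    rb -= rb.mean()
    return float((ra @ rb) / np.sqrt((ra @ ra) * (rb @ rb)))

human = np.array([4.5, 3.0, 1.5, 5.0, 2.0])    # hypothetical human ratings of 5 faces
metric = np.array([0.9, 0.6, 0.2, 0.95, 0.3])  # hypothetical candidate-metric scores
rho = spearman(human, metric)
```

A rho near 1 means the metric orders images the same way humans do; a low or negative rho signals poor alignment.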
1 code implementation • 9 Jun 2024 • Qingsong Xie, Zhenyi Liao, Zhijie Deng, Chen Chen, Haonan Lu
Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest.
1 code implementation • 28 May 2024 • Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang
In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks.
no code implementations • 17 Apr 2024 • Dingkun Zhang, Sijia Li, Chen Chen, Qingsong Xie, Haonan Lu
To this end, we propose layer pruning and normalized distillation for compressing diffusion models (LAPTOP-Diff).
no code implementations • 3 Mar 2024 • Hongjian Liu, Qingsong Xie, Zhijie Deng, Chen Chen, Shixiang Tang, Fueyang Fu, Zheng-Jun Zha, Haonan Lu
In contrast to vanilla consistency distillation (CD), which distills the ordinary differential equation (ODE) solver-based sampling process of a pretrained teacher model into a student, SCott explores the possibility, and validates the efficacy, of integrating stochastic differential equation (SDE) solvers into CD to fully unleash the potential of the teacher.
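The ODE-versus-SDE contrast can be illustrated with a single Euler step of each solver type. This is a toy sketch, not the SCott training procedure; the drift and diffusion functions are hypothetical stand-ins for the model-defined terms:

```python
import math
import random

def ode_euler_step(x, t, dt, drift):
    """One deterministic Euler step of a probability-flow ODE (vanilla CD setting)."""
    return x + drift(x, t) * dt

def sde_euler_maruyama_step(x, t, dt, drift, diffusion, rng):
    """One Euler-Maruyama step: the same drift plus stochastic noise injection."""
    noise = rng.gauss(0.0, 1.0)
    return x + drift(x, t) * dt + diffusion(t) * math.sqrt(dt) * noise

drift = lambda x, t: -x  # toy score-like drift pulling x toward 0
rng = random.Random(0)
x_ode = ode_euler_step(1.0, 0.0, 0.01, drift)
x_sde = sde_euler_maruyama_step(1.0, 0.0, 0.01, drift, lambda t: 0.5, rng)
```

With the diffusion term set to zero the SDE step reduces exactly to the ODE step; the stochastic term is what distinguishes SDE-based sampling.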
1 code implementation • 28 Nov 2023 • Jian Ma, Chen Chen, Qingsong Xie, Haonan Lu
In this paper, we propose a simple plug-and-play language transfer method based on knowledge distillation.
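As a rough illustration of distillation-based language transfer, the sketch below fits a hypothetical linear student projection so its embeddings match a teacher text encoder's embeddings on parallel captions. All data is synthetic and the linear student is an assumption for illustration, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-ins: features of non-English captions (X) and the teacher
# (English) text encoder's embeddings for the parallel English captions.
X = rng.normal(size=(256, 32))
teacher = X @ rng.normal(size=(32, 8))

# Distill a linear student projection by matching the teacher's embeddings.
W = np.zeros((32, 8))
lr = 0.01
for _ in range(1000):
    pred = X @ W
    W -= lr * 2.0 * X.T @ (pred - teacher) / len(X)  # gradient of mean squared error

mse = float(np.mean((X @ W - teacher) ** 2))
```

After training, the student maps non-English captions into the teacher's embedding space, so the frozen generator can be driven by either language.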
Tasks: Cross-lingual Text-to-Image Generation, Knowledge Distillation, +2
1 code implementation • CVPR 2024 • Weizhen He, Yiheng Deng, Shixiang Tang, Qihao Chen, Qingsong Xie, Yizhou Wang, Lei Bai, Feng Zhu, Rui Zhao, Wanli Ouyang, Donglian Qi, Yunfeng Yan
This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve images according to the given image or language instructions.
1 code implementation • CVPR 2023 • Shixiang Tang, Cheng Chen, Qingsong Xie, Meilin Chen, Yizhou Wang, Yuanzheng Ci, Lei Bai, Feng Zhu, Haiyang Yang, Li Yi, Rui Zhao, Wanli Ouyang
Specifically, we propose HumanBench, built on existing datasets, to comprehensively evaluate on common ground the generalization abilities of different pretraining methods across 19 datasets from 6 diverse downstream tasks, including person ReID, pose estimation, human parsing, pedestrian attribute recognition, pedestrian detection, and crowd counting.
Ranked #1 on Pedestrian Attribute Recognition on PA-100K (using extra training data)