no code implementations • 26 Mar 2025 • Jinnan Chen, Lingting Zhu, Zeyu Hu, Shengju Qian, Yugang Chen, Xin Wang, Gim Hee Lee
Recent advances in auto-regressive transformers have revolutionized generative modeling across different domains, from language processing to visual generation, demonstrating remarkable capabilities.
no code implementations • 24 Mar 2025 • Lingting Zhu, Jingrui Ye, Runze Zhang, Zeyu Hu, Yingda Yin, Lanjiong Li, Jinnan Chen, Shengju Qian, Xin Wang, Qingmin Liao, Lequan Yu
Current methods for 3D generation still fall short in physically based rendering (PBR) texturing, primarily due to limited data and challenges in modeling multi-channel materials.
no code implementations • 25 Jun 2024 • Lin Liu, Quande Liu, Shengju Qian, Yuan Zhou, Wengang Zhou, Houqiang Li, Lingxi Xie, Qi Tian
Video generation is a challenging yet pivotal task in various industries, such as gaming, e-commerce, and advertising.
1 code implementation • 23 Apr 2024 • Xuanhua He, Quande Liu, Shengju Qian, Xin Wang, Tao Hu, Ke Cao, Keyu Yan, Jie Zhang
In this study, we present \textbf{ID-Animator}, a zero-shot human-video generation approach that can perform personalized video generation given a single reference facial image without further training.
1 code implementation • 25 Mar 2024 • Hao Shao, Shengju Qian, Han Xiao, Guanglu Song, Zhuofan Zong, Letian Wang, Yu Liu, Hongsheng Li
To address these challenges, we collect and introduce the large-scale Visual CoT dataset comprising 438k question-answer pairs, annotated with intermediate bounding boxes highlighting key regions essential for answering the questions.
1 code implementation • CVPR 2024 • Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia
Without tuning on LLaVA-v1.5, our method secured 70.7 on the MMBench test and 1552.5 on MME-perception.
4 code implementations • 21 Sep 2023 • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
For example, training on a context length of 8192 incurs 16x the computational cost in self-attention layers compared to a context length of 2048.
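The 16x figure follows from the quadratic cost of self-attention: the score matrix is n x n, so quadrupling the context length from 2048 to 8192 multiplies the attention FLOPs by 4^2. A minimal sketch (the `attention_flops` helper and head dimension are illustrative, not from the paper):

```python
def attention_flops(n, d):
    """Approximate FLOPs for the two n x n x d matrix products
    in self-attention (Q K^T and softmax(A) V)."""
    return 2 * (n * n * d)

d = 128  # head dimension, chosen for illustration
ratio = attention_flops(8192, d) / attention_flops(2048, d)
print(ratio)  # 16.0: 4x the context length, 16x the attention cost
```

The head dimension cancels in the ratio, so the 16x factor holds regardless of model width.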
1 code implementation • 15 Apr 2023 • Jingyao Li, Pengguang Chen, Shengju Qian, Shu Liu, Jiaya Jia
Contrastive Language-Image Pre-training (CLIP) has recently shown great promise in pixel-level zero-shot learning tasks.
no code implementations • 1 Mar 2023 • Shengju Qian, Huiwen Chang, Yuanzhen Li, Zizhao Zhang, Jiaya Jia, Han Zhang
We propose Stratified Image Transformer (StraIT), a pure non-autoregressive (NAR) generative model that demonstrates superiority in high-quality image synthesis over existing autoregressive (AR) and diffusion models (DMs).
no code implementations • 21 Dec 2022 • Shengju Qian, Yi Zhu, Wenbo Li, Mu Li, Jiaya Jia
The architecture of transformers, which have recently seen booming applications in vision tasks, departs from the widespread convolutional paradigm.
1 code implementation • 19 Dec 2021 • Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia
Pre-training has set numerous states of the art in high-level computer vision, yet few attempts have been made to investigate how pre-training acts in image processing systems.
Ranked #11 on Image Super-Resolution on Set5 - 2x upscaling (using extra training data)
no code implementations • NeurIPS 2021 • Shengju Qian, Hao Shao, Yi Zhu, Mu Li, Jiaya Jia
In this work, we analyze the uncharted problem of aliasing in vision transformers and explore how to incorporate anti-aliasing properties.
4 code implementations • 17 Jan 2020 • Hao Shao, Shengju Qian, Yu Liu
In this way, a heavy temporal model is replaced by a simple interlacing operator.
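The interlacing idea can be illustrated with a channel shift along the temporal axis: a fraction of channels is moved one step backward or forward in time, mixing temporal information with no extra parameters. The fixed-offset version below is a simplified assumption for illustration (the paper's operator learns its offsets); the function name and `fold` parameter are hypothetical:

```python
import numpy as np

def temporal_interlace(x, fold=4):
    """Shift 1/fold of the channels backward in time and 1/fold
    forward, leaving the rest untouched. x has shape (T, C, H, W).
    Vacated boundary frames are zero-filled."""
    T, C, H, W = x.shape
    c = C // fold
    out = np.zeros_like(x)
    out[:-1, :c] = x[1:, :c]         # first group: pull features from t+1
    out[1:, c:2 * c] = x[:-1, c:2 * c]  # second group: pull features from t-1
    out[:, 2 * c:] = x[:, 2 * c:]    # remaining channels unchanged
    return out

# Usage: a tiny (T=2, C=4, H=1, W=1) tensor makes the shift visible.
x = np.arange(8, dtype=float).reshape(2, 4, 1, 1)
y = temporal_interlace(x, fold=4)
print(y[0, 0, 0, 0])  # 4.0: frame 0, channel 0 now holds frame 1's value
```

Because the shift is a pure indexing operation, it adds no FLOPs or parameters, which is what lets it stand in for a heavier temporal model.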
no code implementations • ICCV 2019 • Shengju Qian, Kwan-Yee Lin, Wayne Wu, Yangxiaokang Liu, Quan Wang, Fumin Shen, Chen Qian, Ran He
Recent studies have shown remarkable success in face manipulation tasks with the advance of GAN and VAE paradigms, but the outputs are sometimes limited to low resolution and lack diversity.
1 code implementation • ICCV 2019 • Shengju Qian, Keqiang Sun, Wayne Wu, Chen Qian, Jiaya Jia
Facial landmark detection, or face alignment, is a fundamental task that has been extensively studied.
Ranked #18 on Face Alignment on WFLW