no code implementations • 27 Aug 2024 • Zejia Weng, Xitong Yang, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
In this paper, we aim to investigate whether such priors derived from a generative process are suitable for video recognition, and eventually joint optimization of generation and recognition.
no code implementations • 13 Jun 2024 • Miaosen Zhang, Yixuan Wei, Zhen Xing, Yifei Ma, Zuxuan Wu, Ji Li, Zheng Zhang, Qi Dai, Chong Luo, Xin Geng, Baining Guo
In this paper, we target the realm of visual aesthetics and aim to align vision models with human aesthetic standards in a retrieval system.
no code implementations • 10 Jun 2024 • Zhen Xing, Qi Dai, Zejia Weng, Zuxuan Wu, Yu-Gang Jiang
Text-guided video prediction (TVP) involves predicting the motion of future frames from the initial frame according to an instruction, which has wide applications in virtual reality, robotics, and content creation.
no code implementations • 15 Mar 2024 • Qijun Feng, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang
Reconstructing detailed 3D objects from single-view images remains a challenging task due to the limited information available.
no code implementations • 30 Nov 2023 • Zhen Xing, Qi Dai, Zihao Zhang, HUI ZHANG, Han Hu, Zuxuan Wu, Yu-Gang Jiang
Our model can edit and translate the desired results within seconds based on user instructions.
no code implementations • 24 Nov 2023 • HUI ZHANG, Zuxuan Wu, Zhen Xing, Jie Shao, Yu-Gang Jiang
Diffusion models, a type of generative model, have achieved impressive results in generating images and videos conditioned on text.
1 code implementation • 16 Oct 2023 • Zhen Xing, Qijun Feng, Haoran Chen, Qi Dai, Han Hu, Hang Xu, Zuxuan Wu, Yu-Gang Jiang
However, existing surveys mainly focus on diffusion models in the context of image generation, with few up-to-date reviews on their application in the video domain.
1 code implementation • 15 Sep 2023 • Ruian He, Zhen Xing, Weimin Tan, Bo Yan
Second, we propose a novel representation diffusion model to disentangle 3D latent into facial identity and expression.
no code implementations • CVPR 2023 • Zhixin Ling, Zhen Xing, Xiangdong Zhou, Manliang Cao, Guichun Zhou
In panorama understanding, the widely used equirectangular projection (ERP) entails boundary discontinuity and spatial distortion.
no code implementations • CVPR 2024 • Zhen Xing, Qi Dai, Han Hu, Zuxuan Wu, Yu-Gang Jiang
In this work, we propose a Simple Diffusion Adapter (SimDA) that fine-tunes only 24M out of 1.1B parameters of a strong T2I model, adapting it to video generation in a parameter-efficient way.
no code implementations • 2 Aug 2023 • Zejun Wu, Jiechao Wang, Zunquan Chen, Qinqin Yang, Zhen Xing, Dairong Cao, Jianfeng Bao, Taishan Kang, Jianzhong Lin, Shuhui Cai, Zhong Chen, Congbo Cai
Significance: FlexDTI effectively learns diffusion gradient direction information, enabling generalized DTI reconstruction with flexible diffusion gradient schemes.
no code implementations • 26 May 2023 • Bei Li, Yi Jing, Xu Tan, Zhen Xing, Tong Xiao, Jingbo Zhu
Learning multiscale Transformer models has been evidenced as a viable approach to augmenting machine translation systems.
1 code implementation • CVPR 2023 • Zhen Xing, Qi Dai, Han Hu, Jingjing Chen, Zuxuan Wu, Yu-Gang Jiang
In this paper, we investigate the use of transformer models under the SSL setting for action recognition.
1 code implementation • 30 Sep 2022 • Zhen Xing, Hengduo Li, Zuxuan Wu, Yu-Gang Jiang
In particular, we introduce an attention-guided prototype shape prior module for guiding realistic object reconstruction.
no code implementations • 30 Jul 2022 • Zhen Xing, Yijiang Chen, Zhixin Ling, Xiangdong Zhou, Yu Xiang
In this paper, we present a Memory Prior Contrastive Network (MPCN) that can store shape prior knowledge in a few-shot learning based 3D reconstruction framework.
no code implementations • 2 Jun 2022 • Zhidan Liu, Zhen Xing, Xiangdong Zhou, Yijiang Chen, Guichun Zhou
We enhance the performance of image-based methods for category-agnostic object pose estimation by exploiting 3D knowledge learned by a multi-modal method.
no code implementations • 8 Mar 2022 • Yijiang Chen, Xiangdong Zhou, Zhen Xing, Zhidan Liu, Minyang Xu
Many previous works focus on the pretext task of self-supervised learning and usually neglect the complex problem of MTS encoding, leading to unpromising results.
1 code implementation • 8 Jul 2021 • Ruian He, Zhen Xing, Weimin Tan, Bo Yan
Affective analysis is not a single task: valence-arousal values, expression classes, and action units can be predicted at the same time.