1 code implementation • 29 May 2025 • Yufan Deng, Xun Guo, Yuanyang Yin, Jacob Zhiyuan Fang, Yiding Yang, Yizhi Wang, Shenghai Yuan, Angtian Wang, Bo Liu, Haibin Huang, Chongyang Ma
Video generation has made substantial strides with the emergence of deep generative models, especially diffusion-based approaches.
no code implementations • 13 Mar 2025 • Yufan Deng, Xun Guo, Yizhi Wang, Jacob Zhiyuan Fang, Angtian Wang, Shenghai Yuan, Yiding Yang, Bo Liu, Haibin Huang, Chongyang Ma
By leveraging an MLLM to interpret subject relationships, our method facilitates scalability, enabling training on large and diverse datasets.
no code implementations • CVPR 2025 • Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin
This work explores whether a deep generative model can learn complex knowledge solely from visual input, in contrast to the prevalent focus on text-based models like large language models (LLMs).
no code implementations • CVPR 2025 • Dongnan Gui, Xun Guo, Wengang Zhou, Yan Lu
This paper proposes a novel approach that applies imperceptible perturbations on images to degrade the quality of the generated videos, thereby protecting images from misuse in white-box image-to-video diffusion models.
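A minimal sketch of how such a protective perturbation might be computed with projected gradient ascent, assuming white-box access to a differentiable denoising loss of the image-to-video model; the `i2v_model.denoising_loss` interface and all hyperparameters are hypothetical and do not represent the paper's method.

```python
# Hypothetical sketch of an imperceptible protective perturbation (PGD-style),
# assuming white-box access to an image-to-video diffusion model `i2v_model`
# that exposes a differentiable denoising loss. Names are illustrative only.
import torch

def protect_image(image, i2v_model, eps=8 / 255, alpha=1 / 255, steps=50):
    """Return `image` plus a small perturbation that degrades generated videos."""
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        # Maximize the model's denoising error on the perturbed conditioning image.
        loss = i2v_model.denoising_loss(image + delta)
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()   # gradient ascent step
            delta.clamp_(-eps, eps)              # keep the perturbation imperceptible
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```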
1 code implementation • 28 Oct 2024 • Xun Guo, Shan Zhang, Yongxin He, Ting Zhang, Wanquan Feng, Haibin Huang, Chongyang Ma
Our method is compatible with a range of text encoders.
no code implementations • 27 Oct 2024 • Zongyi Li, Shujie Hu, Shujie Liu, Long Zhou, Jeongsoo Choi, Lingwei Meng, Xun Guo, Jinyu Li, Hefei Ling, Furu Wei
Text-to-video models have recently undergone rapid and substantial advancements.
2 code implementations • 27 Dec 2023 • Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma
I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.
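A rough sketch of the cross-frame attention idea, where each frame's tokens query the keys and values of the unnoised first frame so its identity propagates forward; module names and shapes are assumptions, not the released I2V-Adapter code.

```python
# Minimal sketch of cross-frame attention: every (noised) frame queries the
# keys/values of the unnoised first frame, so its identity propagates forward.
# Shapes and layer names are assumptions, not the official I2V-Adapter code.
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, frame_tokens, first_frame_tokens):
        # frame_tokens:       (batch * frames, tokens, dim) from noised frames
        # first_frame_tokens: (batch * frames, tokens, dim) broadcast from frame 0
        out, _ = self.attn(query=frame_tokens,
                           key=first_frame_tokens,
                           value=first_frame_tokens)
        return frame_tokens + out  # residual keeps the pretrained T2V path intact
```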
2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang
Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as the metaverse and sports analysis.
Ranked #1 on Pedestrian Image Caption on CUHK-PEDES
1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu
In this paper, we tackle this problem by introducing temporal dependency to existing text-driven diffusion models, which allows them to generate consistent appearance for the edited objects.
1 code implementation • CVPR 2024 • Enxin Song, Wenhao Chai, Guanhong Wang, Yucheng Zhang, Haoyang Zhou, Feiyang Wu, Haozhe Chi, Xun Guo, Tian Ye, Yanting Zhang, Yan Lu, Jenq-Neng Hwang, Gaoang Wang
Recently, integrating video foundation models and large language models to build video understanding systems has emerged as a way to overcome the limitations of specific pre-defined vision tasks.
Tasks: Multiple-choice, Video-based Generative Performance Benchmarking (Consistency), +11
no code implementations • 30 Sep 2022 • Yizhou Zhao, Zhenyang Li, Xun Guo, Yan Lu
Temporal modeling is crucial for various video learning tasks.
no code implementations • CVPR 2022 • Yizhou Zhao, Xun Guo, Yan Lu
One-shot object detection aims to detect novel objects given only a single instance.
1 code implementation • CVPR 2022 • Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu
It significantly improves the performance of several classic contrastive learning models in downstream tasks.
no code implementations • 29 Sep 2021 • Yuanze Lin, Xun Guo, Yan Lu
By inserting the proposed cross-stage mechanism in existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on ViT structure, in which self-attentions and features are progressively aggregated from one block to the next.
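An illustrative guess at what progressive cross-stage aggregation could look like, where each transformer block mixes its output with features carried over from the previous block; the gating scheme and layer choices are assumptions rather than the paper's design.

```python
# Illustrative sketch of progressive cross-stage aggregation: each block mixes
# its own output with features carried over from the previous block. This is a
# guess at the high-level idea, not the paper's implementation.
import torch
import torch.nn as nn

class CrossStageBlock(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.gate = nn.Parameter(torch.tensor(0.5))  # learnable mixing weight

    def forward(self, x, prev_feat=None):
        attn_out, _ = self.attn(x, x, x)
        x = self.norm(x + attn_out)
        if prev_feat is not None:
            x = self.gate * x + (1 - self.gate) * prev_feat  # aggregate prior stage
        return x, x  # (output, feature passed on to the next block)
```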
no code implementations • 29 Sep 2021 • Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu
Therefore, we assume that task-relevant information not shared between views cannot be ignored, and we theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for downstream tasks, which causes performance degradation.
no code implementations • ICCV 2021 • Yuanze Lin, Xun Guo, Yan Lu
Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch.
Ranked #28 on Self-Supervised Action Recognition on HMDB51
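A hedged sketch of one MAML-style iteration combining a contrastive inner adaptation with a first-order meta update; the `loss_fn` argument and data inputs are placeholders, and this is not the paper's released training loop.

```python
# Rough sketch of a MAML-style step with a contrastive inner loop and a
# first-order meta update. Loss function and data arguments are placeholders.
import copy
import torch

def maml_step(model, optimizer, loss_fn, support_clips, query_clips, inner_lr=0.01):
    """One meta-iteration: contrastive inner adaptation + first-order meta update."""
    # Inner loop (contrastive branch): adapt a copy of the model on the support clips.
    adapted = copy.deepcopy(model)
    inner_params = list(adapted.parameters())
    inner_loss = loss_fn(adapted(support_clips))
    grads = torch.autograd.grad(inner_loss, inner_params)
    with torch.no_grad():
        for p, g in zip(inner_params, grads):
            p -= inner_lr * g

    # Outer step (meta branch), first-order approximation: gradients of the query
    # loss w.r.t. the adapted weights are applied to the original model's weights.
    meta_loss = loss_fn(adapted(query_clips))
    meta_grads = torch.autograd.grad(meta_loss, inner_params)
    optimizer.zero_grad()
    for p, g in zip(model.parameters(), meta_grads):
        p.grad = g.detach()
    optimizer.step()
    return meta_loss.item()
```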
no code implementations • CVPR 2021 • Xudong Guo, Xun Guo, Yan Lu
However, spatial correlations and temporal correlations capture different contextual information, reflecting scene context and temporal reasoning, respectively.
1 code implementation • 16 Sep 2018 • Yao Zhai, Xun Guo, Yan Lu, Houqiang Li
Recent research on person re-identification has focused on two trends.
5 code implementations • 2 Aug 2017 • Feng Jiang, Wen Tao, Shaohui Liu, Jie Ren, Xun Guo, Debin Zhao
The second CNN, named reconstruction convolutional neural network (RecCNN), is used to reconstruct the decoded image with high quality at the decoding end.
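A toy sketch, in the spirit of such a reconstruction CNN, of a small residual network that enhances the decoded image at the decoder side; the layer sizes are arbitrary and do not reflect the paper's RecCNN architecture.

```python
# Toy sketch of a reconstruction CNN: a small residual network that enhances
# the decoded (compressed) image at the decoder side. Layer sizes are
# arbitrary; this is not the architecture from the paper.
import torch
import torch.nn as nn

class RecCNN(nn.Module):
    def __init__(self, channels=3, features=64, num_layers=8):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(num_layers - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, decoded):
        # Predict a residual correction on top of the decoded image.
        return decoded + self.body(decoded)
```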