Search Results for author: Xun Guo

Found 19 papers, 9 papers with code

MAGREF: Masked Guidance for Any-Reference Video Generation

1 code implementation • 29 May 2025 • Yufan Deng, Xun Guo, Yuanyang Yin, Jacob Zhiyuan Fang, Yiding Yang, Yizhi Wang, Shenghai Yuan, Angtian Wang, Bo Liu, Haibin Huang, Chongyang Ma

Video generation has made substantial strides with the emergence of deep generative models, especially diffusion-based approaches.

Video Generation

CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance

no code implementations • 13 Mar 2025 • Yufan Deng, Xun Guo, Yizhi Wang, Jacob Zhiyuan Fang, Angtian Wang, Shenghai Yuan, Yiding Yang, Bo Liu, Haibin Huang, Chongyang Ma

By leveraging an MLLM to interpret subject relationships, our method facilitates scalability, enabling the use of large and diverse datasets for training.

Large Language Model • Multimodal Large Language Model • +1

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

no code implementations • CVPR 2025 • Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin

This work explores whether a deep generative model can learn complex knowledge solely from visual input, in contrast to the prevalent focus on text-based models like large language models (LLMs).

Video Generation

I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models

no code implementations • CVPR 2025 • Dongnan Gui, Xun Guo, Wengang Zhou, Yan Lu

This paper proposes a novel approach that applies imperceptible perturbations to images to degrade the quality of the generated videos, thereby protecting images from misuse in white-box image-to-video diffusion models (see the sketch after this entry).

Adversarial Attack • Image to Video Generation • +1
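
The I2VGuard entry above describes adding imperceptible perturbations to an image so that white-box image-to-video diffusion models produce degraded videos from it. The paper's actual objective is not shown here; the sketch below is a generic PGD-style attack under an L-infinity budget, where `degradation_loss` is a hypothetical placeholder for a differentiable quality objective computed through the video model.

```python
import torch

def protect_image(image, degradation_loss, eps=8 / 255, alpha=2 / 255, steps=50):
    """PGD-style sketch: search an L-infinity ball of radius eps for a perturbation
    that maximizes a degradation objective on the downstream image-to-video model.
    `degradation_loss` is a placeholder, not I2VGuard's actual loss."""
    delta = torch.empty_like(image).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = degradation_loss(image + delta)                 # higher = worse generated video
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                 # gradient ascent step
            delta.clamp_(-eps, eps)                            # keep the perturbation imperceptible
            delta.copy_((image + delta).clamp(0, 1) - image)   # keep the protected image valid
        delta.grad.zero_()
    return (image + delta).detach()
```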

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

2 code implementations • 27 Dec 2023 • Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma

I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.

Video Generation
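
The I2V-Adapter snippet above attributes identity preservation to a cross-frame attention mechanism that propagates the unnoised input image to the noised frames while keeping the pretrained T2V weights frozen. A minimal sketch of that idea (illustrative names and shapes, not the paper's exact module) lets every frame's tokens attend to first-frame tokens and adds a zero-initialized result alongside the frozen branch:

```python
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    """Sketch: each noised frame queries the unnoised first frame's features.
    The output projection is zero-initialized so training starts exactly from
    the pretrained T2V model; names and shapes are illustrative."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_out = nn.Linear(dim, dim)
        nn.init.zeros_(self.to_out.weight)
        nn.init.zeros_(self.to_out.bias)

    def forward(self, frame_tokens, first_frame_tokens):
        # frame_tokens:       (batch * frames, tokens, dim) from the noised frames
        # first_frame_tokens: (batch * frames, tokens, dim) from the unnoised frame, repeated
        out, _ = self.attn(query=frame_tokens, key=first_frame_tokens, value=first_frame_tokens)
        return self.to_out(out)   # residual added next to the frozen attention branch
```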

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as the metaverse and sports analysis.

3D Human Pose Estimation • Action Recognition • +8

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

In this paper, we tackle this problem by introducing temporal dependency into existing text-driven diffusion models, which allows them to generate a consistent appearance for the edited objects.

Video Editing

Semantic-aligned Fusion Transformer for One-shot Object Detection

no code implementations • CVPR 2022 • Yizhou Zhao, Xun Guo, Yan Lu

One-shot object detection aims to detect novel objects from merely a single given instance.

Attribute • Object • +2

Cross-Stage Transformer for Video Learning

no code implementations • 29 Sep 2021 • Yuanze Lin, Xun Guo, Yan Lu

By inserting the proposed cross-stage mechanism into existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on the ViT structure, in which self-attentions and features are progressively aggregated from one block to the next (see the sketch after this entry).

Action Recognition • Temporal Action Localization
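
The Cross-Stage Transformer snippet above says that self-attentions and features are progressively aggregated from one block to the next. One generic way to sketch such progressive aggregation (the gating and averaging below are illustrative, not the paper's rule) is to carry a running summary of earlier blocks' tokens and gate it into each new block:

```python
import torch
import torch.nn as nn

class CrossStageBlock(nn.Module):
    """Sketch: a transformer block that mixes a carried summary of earlier
    blocks' features into its output and updates that summary for the next
    block. The fusion rule is illustrative, not the paper's design."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))      # learned weight for the carried features

    def forward(self, tokens, carried):
        # tokens, carried: (batch, seq, dim); `carried` summarizes earlier stages
        x = self.norm(tokens)
        attn_out, _ = self.attn(x, x, x)
        tokens = tokens + attn_out + torch.tanh(self.gate) * carried
        carried = 0.5 * (carried + tokens)            # progressively aggregate for the next block
        return tokens, carried
```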

What Makes for Good Representations for Contrastive Learning

no code implementations • 29 Sep 2021 • Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu

Therefore, we assume that the task-relevant information not shared between views cannot be ignored, and we theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for the downstream tasks, which causes performance degradation (the definitions involved are sketched after this entry).

Contrastive Learning • Diversity • +1
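
For the entry above, one common information-theoretic formalization of the terms in the snippet (standard definitions, not necessarily the paper's exact notation) reads as follows, for two views v1 and v2, a representation z1 of v1, and a downstream task T:

```latex
% Sufficiency: z_1 keeps all the information v_1 shares with v_2.
I(z_1; v_2) = I(v_1; v_2)

% Minimality: among sufficient representations, keep as little of v_1 as possible.
z_1^{\min} = \operatorname*{arg\,min}_{z_1 :\, I(z_1; v_2) = I(v_1; v_2)} I(z_1; v_1)

% Task-relevant information that is NOT shared between the views; the snippet's
% claim is that the minimal sufficient representation can discard this term.
I(v_1; T \mid v_2)
```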

Self-Supervised Video Representation Learning with Meta-Contrastive Network

no code implementations • ICCV 2021 • Yuanze Lin, Xun Guo, Yan Lu

Our method contains two training stages based on model-agnostic meta-learning (MAML), each of which consists of a contrastive branch and a meta branch (see the sketch after this entry).

Contrastive Learning • Meta-Learning • +6
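
The entry above combines model-agnostic meta-learning with a contrastive objective. As a rough illustration of how a contrastive branch (inner, task-level adaptation) and a meta branch (outer update) can share one MAML-style step, with all function names being placeholders rather than the paper's code:

```python
import torch

def maml_contrastive_step(encoder, tasks, contrastive_loss, inner_lr=0.01, meta_lr=1e-3):
    """Sketch of one meta-update. Inner loop: adapt a functional copy of the
    encoder on each task's support clips with a contrastive loss. Outer loop:
    backpropagate the adapted query loss into the original parameters.
    `tasks` yields (support_views, query_views); everything here is a placeholder."""
    meta_opt = torch.optim.Adam(encoder.parameters(), lr=meta_lr)
    meta_opt.zero_grad()
    for support_views, query_views in tasks:
        params = dict(encoder.named_parameters())
        # Contrastive branch: one inner gradient step on the support clips.
        inner_loss = contrastive_loss(torch.func.functional_call(encoder, params, (support_views,)))
        grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
        adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
        # Meta branch: evaluate the adapted parameters on the query clips.
        outer_loss = contrastive_loss(torch.func.functional_call(encoder, adapted, (query_views,)))
        outer_loss.backward()
    meta_opt.step()
```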

SSAN: Separable Self-Attention Network for Video Representation Learning

no code implementations • CVPR 2021 • Xudong Guo, Xun Guo, Yan Lu

However, spatial correlations and temporal correlations capture different contextual information: scene content and temporal reasoning, respectively (see the sketch after this entry).

Action Recognition • Representation Learning • +3
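
SSAN's title refers to separable self-attention, and the snippet above motivates treating spatial and temporal correlations differently. A generic factorized spatial-then-temporal attention block (not necessarily the paper's exact decomposition) looks like this:

```python
import torch
import torch.nn as nn

class SeparableSelfAttention(nn.Module):
    """Sketch: attend over spatial positions within each frame, then over time
    at each spatial position, instead of one joint spatio-temporal attention.
    Tensor layout and names are illustrative."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        b, t, s, d = x.shape                                # batch, frames, spatial tokens, dim
        xs = x.reshape(b * t, s, d)                         # spatial attention within each frame
        xs, _ = self.spatial(xs, xs, xs)
        x = x + xs.reshape(b, t, s, d)
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)     # temporal attention per position
        xt, _ = self.temporal(xt, xt, xt)
        x = x + xt.reshape(b, s, t, d).permute(0, 2, 1, 3)
        return x
```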

An End-to-End Compression Framework Based on Convolutional Neural Networks

5 code implementations • 2 Aug 2017 • Feng Jiang, Wen Tao, Shaohui Liu, Jie Ren, Xun Guo, Debin Zhao

The second CNN, named reconstruction convolutional neural network (RecCNN), is used to reconstruct the decoded image with high quality at the decoding end (see the sketch after this entry).

Denoising • Image Compression
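
The compression entry above names a reconstruction network (RecCNN) that restores the decoded image on the decoder side. A minimal residual restoration CNN in that spirit, with illustrative depth and widths rather than the paper's exact configuration:

```python
import torch.nn as nn

class RecCNNSketch(nn.Module):
    """Sketch of a post-decoding reconstruction network: a few conv layers
    predict a residual that is added back to the decoded (compressed) image."""

    def __init__(self, channels=3, features=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, decoded):
        return decoded + self.body(decoded)   # residual learning on the decoded image
```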
