Search Results for author: Xun Guo

Found 19 papers, 9 papers with code

MAGREF: Masked Guidance for Any-Reference Video Generation

1 code implementation • 29 May 2025 • Yufan Deng, Xun Guo, Yuanyang Yin, Jacob Zhiyuan Fang, Yiding Yang, Yizhi Wang, Shenghai Yuan, Angtian Wang, Bo Liu, Haibin Huang, Chongyang Ma

Video generation has made substantial strides with the emergence of deep generative models, especially diffusion-based approaches.

Video Generation

CINEMA: Coherent Multi-Subject Video Generation via MLLM-Based Guidance

no code implementations • 13 Mar 2025 • Yufan Deng, Xun Guo, Yizhi Wang, Jacob Zhiyuan Fang, Angtian Wang, Shenghai Yuan, Yiding Yang, Bo Liu, Haibin Huang, Chongyang Ma

By leveraging an MLLM to interpret subject relationships, our method facilitates scalability, enabling the use of large and diverse datasets for training.

Large Language Model • Multimodal Large Language Model • +1

VideoWorld: Exploring Knowledge Learning from Unlabeled Videos

no code implementations • CVPR 2025 • Zhongwei Ren, Yunchao Wei, Xun Guo, Yao Zhao, Bingyi Kang, Jiashi Feng, Xiaojie Jin

This work explores whether a deep generative model can learn complex knowledge solely from visual input, in contrast to the prevalent focus on text-based models like large language models (LLMs).

Video Generation

I2VGuard: Safeguarding Images against Misuse in Diffusion-based Image-to-Video Models

no code implementations • CVPR 2025 • Dongnan Gui, Xun Guo, Wengang Zhou, Yan Lu

This paper proposes a novel approach that applies imperceptible perturbations to images to degrade the quality of the generated videos, thereby protecting images from misuse in white-box image-to-video diffusion models (see the sketch after this entry).

Adversarial Attack • Image to Video Generation • +1
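
The I2VGuard entry above describes adding imperceptible perturbations to an image so that white-box image-to-video diffusion models produce degraded videos from it. The paper's actual objective is not shown here; the sketch below is a generic PGD-style attack under an L-infinity budget, where `degradation_loss` is a hypothetical placeholder for a differentiable quality objective computed through the video model.

```python
import torch

def protect_image(image, degradation_loss, eps=8 / 255, alpha=2 / 255, steps=50):
    """PGD-style sketch: search an L-infinity ball of radius eps for a perturbation
    that maximizes a degradation objective on the downstream image-to-video model.
    `degradation_loss` is a placeholder, not I2VGuard's actual loss."""
    delta = torch.empty_like(image).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = degradation_loss(image + delta)                 # higher = worse generated video
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()                 # gradient ascent step
            delta.clamp_(-eps, eps)                            # keep the perturbation imperceptible
            delta.copy_((image + delta).clamp(0, 1) - image)   # keep the protected image valid
        delta.grad.zero_()
    return (image + delta).detach()
```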

I2V-Adapter: A General Image-to-Video Adapter for Diffusion Models

2 code implementations • 27 Dec 2023 • Xun Guo, Mingwu Zheng, Liang Hou, Yuan Gao, Yufan Deng, Pengfei Wan, Di Zhang, Yufan Liu, Weiming Hu, ZhengJun Zha, Haibin Huang, Chongyang Ma

I2V-Adapter adeptly propagates the unnoised input image to subsequent noised frames through a cross-frame attention mechanism, maintaining the identity of the input image without any changes to the pretrained T2V model.

Video Generation
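
The I2V-Adapter snippet above attributes identity preservation to a cross-frame attention mechanism that propagates the unnoised input image to the noised frames while keeping the pretrained T2V weights frozen. A minimal sketch of that idea (illustrative names and shapes, not the paper's exact module) lets every frame's tokens attend to first-frame tokens and adds a zero-initialized result alongside the frozen branch:

```python
import torch
import torch.nn as nn

class CrossFrameAttention(nn.Module):
    """Sketch: each noised frame queries the unnoised first frame's features.
    The output projection is zero-initialized so training starts exactly from
    the pretrained T2V model; names and shapes are illustrative."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.to_out = nn.Linear(dim, dim)
        nn.init.zeros_(self.to_out.weight)
        nn.init.zeros_(self.to_out.bias)

    def forward(self, frame_tokens, first_frame_tokens):
        # frame_tokens:       (batch * frames, tokens, dim) from the noised frames
        # first_frame_tokens: (batch * frames, tokens, dim) from the unnoised frame, repeated
        out, _ = self.attn(query=frame_tokens, key=first_frame_tokens, value=first_frame_tokens)
        return self.to_out(out)   # residual added next to the frozen attention branch
```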

Hulk: A Universal Knowledge Translator for Human-Centric Tasks

2 code implementations • 4 Dec 2023 • Yizhou Wang, Yixuan Wu, Shixiang Tang, Weizhen He, Xun Guo, Feng Zhu, Lei Bai, Rui Zhao, Jian Wu, Tong He, Wanli Ouyang

Human-centric perception tasks, e.g., pedestrian detection, skeleton-based action recognition, and pose estimation, have wide industrial applications, such as the metaverse and sports analysis.

3D Human Pose Estimation • Action Recognition • +8

StableVideo: Text-driven Consistency-aware Diffusion Video Editing

1 code implementation • ICCV 2023 • Wenhao Chai, Xun Guo, Gaoang Wang, Yan Lu

In this paper, we tackle this problem by introducing temporal dependency into existing text-driven diffusion models, which allows them to generate a consistent appearance for the edited objects.

Video Editing

Semantic-aligned Fusion Transformer for One-shot Object Detection

no code implementations • CVPR 2022 • Yizhou Zhao, Xun Guo, Yan Lu

One-shot object detection aims to detect novel objects from merely a single given instance.

Attribute • Object • +2

Cross-Stage Transformer for Video Learning

no code implementations • 29 Sep 2021 • Yuanze Lin, Xun Guo, Yan Lu

By inserting the proposed cross-stage mechanism into existing spatial and temporal transformer blocks, we build a separable transformer network for video learning based on the ViT structure, in which self-attentions and features are progressively aggregated from one block to the next (see the sketch after this entry).

Action Recognition • Temporal Action Localization
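
The Cross-Stage Transformer snippet above says that self-attentions and features are progressively aggregated from one block to the next. One generic way to sketch such progressive aggregation (the gating and averaging below are illustrative, not the paper's rule) is to carry a running summary of earlier blocks' tokens and gate it into each new block:

```python
import torch
import torch.nn as nn

class CrossStageBlock(nn.Module):
    """Sketch: a transformer block that mixes a carried summary of earlier
    blocks' features into its output and updates that summary for the next
    block. The fusion rule is illustrative, not the paper's design."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))      # learned weight for the carried features

    def forward(self, tokens, carried):
        # tokens, carried: (batch, seq, dim); `carried` summarizes earlier stages
        x = self.norm(tokens)
        attn_out, _ = self.attn(x, x, x)
        tokens = tokens + attn_out + torch.tanh(self.gate) * carried
        carried = 0.5 * (carried + tokens)            # progressively aggregate for the next block
        return tokens, carried
```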

What Makes for Good Representations for Contrastive Learning

no code implementations • 29 Sep 2021 • Haoqing Wang, Xun Guo, Zhi-Hong Deng, Yan Lu

Therefore, we assume that the task-relevant information not shared between views cannot be ignored, and we theoretically prove that the minimal sufficient representation in contrastive learning is not sufficient for the downstream tasks, which causes performance degradation (the definitions involved are sketched after this entry).

Contrastive Learning • Diversity • +1
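
For the entry above, one common information-theoretic formalization of the terms in the snippet (standard definitions, not necessarily the paper's exact notation) reads as follows, for two views v1 and v2, a representation z1 of v1, and a downstream task T:

```latex
% Sufficiency: z_1 keeps all the information v_1 shares with v_2.
I(z_1; v_2) = I(v_1; v_2)

% Minimality: among sufficient representations, keep as little of v_1 as possible.
z_1^{\min} = \operatorname*{arg\,min}_{z_1 :\, I(z_1; v_2) = I(v_1; v_2)} I(z_1; v_1)

% Task-relevant information that is NOT shared between the views; the snippet's
% claim is that the minimal sufficient representation can discard this term.
I(v_1; T \mid v_2)
```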

Self-Supervised Video Representation Learning with Meta-Contrastive Network

no code implementations • ICCV 2021 • Yuanze Lin, Xun Guo, Yan Lu

Our method contains two training stages based on model-agnostic meta-learning (MAML), each of which consists of a contrastive branch and a meta branch (see the sketch after this entry).

Contrastive Learning • Meta-Learning • +6
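
The entry above combines model-agnostic meta-learning with a contrastive objective. As a rough illustration of how a contrastive branch (inner, task-level adaptation) and a meta branch (outer update) can share one MAML-style step, with all function names being placeholders rather than the paper's code:

```python
import torch

def maml_contrastive_step(encoder, tasks, contrastive_loss, inner_lr=0.01, meta_lr=1e-3):
    """Sketch of one meta-update. Inner loop: adapt a functional copy of the
    encoder on each task's support clips with a contrastive loss. Outer loop:
    backpropagate the adapted query loss into the original parameters.
    `tasks` yields (support_views, query_views); everything here is a placeholder."""
    meta_opt = torch.optim.Adam(encoder.parameters(), lr=meta_lr)
    meta_opt.zero_grad()
    for support_views, query_views in tasks:
        params = dict(encoder.named_parameters())
        # Contrastive branch: one inner gradient step on the support clips.
        inner_loss = contrastive_loss(torch.func.functional_call(encoder, params, (support_views,)))
        grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
        adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}
        # Meta branch: evaluate the adapted parameters on the query clips.
        outer_loss = contrastive_loss(torch.func.functional_call(encoder, adapted, (query_views,)))
        outer_loss.backward()
    meta_opt.step()
```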

SSAN: Separable Self-Attention Network for Video Representation Learning

no code implementations • CVPR 2021 • Xudong Guo, Xun Guo, Yan Lu

However, spatial correlations and temporal correlations capture different contextual information: scene content and temporal reasoning, respectively (see the sketch after this entry).

Action Recognition • Representation Learning • +3
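
SSAN's title refers to separable self-attention, and the snippet above motivates treating spatial and temporal correlations differently. A generic factorized spatial-then-temporal attention block (not necessarily the paper's exact decomposition) looks like this:

```python
import torch
import torch.nn as nn

class SeparableSelfAttention(nn.Module):
    """Sketch: attend over spatial positions within each frame, then over time
    at each spatial position, instead of one joint spatio-temporal attention.
    Tensor layout and names are illustrative."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        b, t, s, d = x.shape                                # batch, frames, spatial tokens, dim
        xs = x.reshape(b * t, s, d)                         # spatial attention within each frame
        xs, _ = self.spatial(xs, xs, xs)
        x = x + xs.reshape(b, t, s, d)
        xt = x.permute(0, 2, 1, 3).reshape(b * s, t, d)     # temporal attention per position
        xt, _ = self.temporal(xt, xt, xt)
        x = x + xt.reshape(b, s, t, d).permute(0, 2, 1, 3)
        return x
```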

An End-to-End Compression Framework Based on Convolutional Neural Networks

5 code implementations • 2 Aug 2017 • Feng Jiang, Wen Tao, Shaohui Liu, Jie Ren, Xun Guo, Debin Zhao

The second CNN, named reconstruction convolutional neural network (RecCNN), is used to reconstruct the decoded image with high quality at the decoding end (see the sketch after this entry).

Denoising • Image Compression
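
The compression entry above names a reconstruction network (RecCNN) that restores the decoded image on the decoder side. A minimal residual restoration CNN in that spirit, with illustrative depth and widths rather than the paper's exact configuration:

```python
import torch.nn as nn

class RecCNNSketch(nn.Module):
    """Sketch of a post-decoding reconstruction network: a few conv layers
    predict a residual that is added back to the decoded (compressed) image."""

    def __init__(self, channels=3, features=64, depth=5):
        super().__init__()
        layers = [nn.Conv2d(channels, features, 3, padding=1), nn.ReLU(inplace=True)]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(features, features, 3, padding=1), nn.ReLU(inplace=True)]
        layers += [nn.Conv2d(features, channels, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, decoded):
        return decoded + self.body(decoded)   # residual learning on the decoded image
```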
