Search Results for author: Mushui Liu

Found 14 papers, 5 papers with code

DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation

no code implementations • 21 Apr 2025 • Weijie He, Mushui Liu, Yunlong Yu, Zhao Wang, Chao Wu

Compositional text-to-video generation, which requires synthesizing dynamic scenes with multiple interacting entities and precise spatial-temporal relationships, remains a critical challenge for diffusion-based models.

Attribute • Denoising • +3

CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation

no code implementations • 7 Mar 2025 • Guanghao Zhang, Tao Zhong, Yan Xia, Zhelun Yu, Haoyuan Li, Wanggui He, Fangxun Shu, Mushui Liu, Dong She, Yi Wang, Hao Jiang

The method constructs interleaved multimodal multi-step reasoning chains that use critical visual region tokens, extracted from intermediate reasoning steps, as supervisory signals.

Image Comprehension • Memorization
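
The reasoning-chain construction described above lends itself to a simple illustration. Below is a hypothetical sketch of interleaving text reasoning steps with the visual region tokens they rely on; the `ReasoningStep` structure and `<region_k>` markers are assumptions for illustration, not the paper's actual format:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ReasoningStep:
    text: str               # one intermediate reasoning step
    region_ids: List[int]   # indices of image regions this step relies on

def build_interleaved_chain(steps: List[ReasoningStep]) -> List[str]:
    """Flatten reasoning steps into an interleaved sequence, inserting
    <region_k> markers after the step that uses them. The markers stand
    in for visual region tokens used as supervisory signals."""
    chain: List[str] = []
    for step in steps:
        chain.append(step.text)
        chain.extend(f"<region_{k}>" for k in step.region_ids)
    return chain

# Example: a two-step chain over two images
steps = [
    ReasoningStep("The object in image 1 is a red cup.", region_ids=[3]),
    ReasoningStep("Image 2 shows the same cup, now empty.", region_ids=[7, 8]),
]
print(build_interleaved_chain(steps))
```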

RectifiedHR: Enable Efficient High-Resolution Image Generation via Energy Rectification

no code implementations • 4 Mar 2025 • Zhen Yang, Guibao Shen, Liang Hou, Mushui Liu, Luozhou Wang, Xin Tao, Pengfei Wan, Di Zhang, Ying-Cong Chen

In this paper, we propose RectifiedHR, a straightforward and efficient solution for training-free high-resolution image generation.

Image Generation
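
The snippet above does not spell out what "energy rectification" involves. One common training-free trick in this family, sketched hypothetically below, is to renormalize the latent's energy (its per-sample RMS) during high-resolution sampling so it matches the scale seen at training resolution; this is an assumption about the general idea, not the paper's exact procedure:

```python
import torch

def rectify_energy(latent: torch.Tensor, target_energy: float) -> torch.Tensor:
    """Rescale a latent so its RMS energy matches a target value.
    `target_energy` would come from statistics at the training resolution;
    here it is a user-supplied number (an assumption for illustration)."""
    rms = latent.pow(2).mean(dim=(1, 2, 3), keepdim=True).sqrt()
    return latent * (target_energy / (rms + 1e-8))

# Toy usage on a high-resolution latent
latent = torch.randn(1, 4, 128, 128)
latent = rectify_energy(latent, target_energy=1.0)
print(latent.pow(2).mean().sqrt())  # ~1.0
```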

CustomVideoX: 3D Reference Attention Driven Dynamic Adaptation for Zero-Shot Customized Video Diffusion Transformers

no code implementations • 10 Feb 2025 • D. She, Mushui Liu, Jingxuan Pang, Jin Wang, Zhen Yang, Wanggui He, Guanghao Zhang, Yi Wang, Qihan Huang, Haobin Tang, Yunlong Yu, Siming Fu

Customized generation has achieved significant progress in image synthesis, yet personalized video generation remains challenging due to temporal inconsistencies and quality degradation.

Image Generation • Video Generation

RestorerID: Towards Tuning-Free Face Restoration with ID Preservation

1 code implementation • 21 Nov 2024 • Jiacheng Ying, Mushui Liu, Zhe Wu, Runming Zhang, Zhu Yu, Siming Fu, Si-Yuan Cao, Chao Wu, Yunlong Yu, Hui-Liang Shen

RestorerID is a diffusion model-based method that restores low-quality images with varying levels of degradation by using a single reference image.

Blind Face Restoration • Face Alignment
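
Tuning-free, reference-conditioned restorers of this kind typically inject an identity embedding from the reference image into the model's cross-attention. A minimal, hypothetical sketch of that conditioning pattern (the module and shapes are stand-ins, not RestorerID's released code):

```python
import torch
import torch.nn as nn

class IDCrossAttention(nn.Module):
    """Minimal cross-attention block: spatial features of the degraded image
    attend to a single identity embedding taken from the reference image, a
    stand-in for reference conditioning in tuning-free ID-preserving models."""
    def __init__(self, dim: int = 64):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)

    def forward(self, feats: torch.Tensor, id_emb: torch.Tensor) -> torch.Tensor:
        # feats: (B, N, dim) spatial tokens; id_emb: (B, dim)
        kv = id_emb.unsqueeze(1)        # (B, 1, dim)
        out, _ = self.attn(feats, kv, kv)
        return feats + out              # residual injection, no weight updates

feats = torch.randn(2, 16 * 16, 64)    # degraded-image features
id_emb = torch.randn(2, 64)            # reference-image identity embedding
print(IDCrossAttention()(feats, id_emb).shape)  # torch.Size([2, 256, 64])
```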

Hybrid Mask Generation for Infrared Small Target Detection with Single-Point Supervision

no code implementations • 6 Sep 2024 • Weijie He, Mushui Liu, Yunlong Yu, Zheming Lu, Xi Li

Single-frame infrared small target (SIRST) detection poses a significant challenge due to the requirement to discern minute targets amidst complex infrared background clutter.
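
Single-point supervision generally means expanding each annotated point into a pseudo mask before training a segmenter. Below is a hypothetical illustration of one simple expansion, an intensity-thresholded flood fill around the point; it is a stand-in for point-to-mask generation in general, not the paper's hybrid scheme:

```python
import numpy as np
from collections import deque

def point_to_pseudo_mask(img: np.ndarray, seed: tuple, tol: float = 0.2) -> np.ndarray:
    """Grow a mask from a single point label: flood-fill pixels whose
    intensity stays within `tol` of the seed pixel's intensity."""
    h, w = img.shape
    mask = np.zeros((h, w), dtype=bool)
    q, seed_val = deque([seed]), img[seed]
    while q:
        y, x = q.popleft()
        if not (0 <= y < h and 0 <= x < w) or mask[y, x]:
            continue
        if abs(float(img[y, x]) - float(seed_val)) > tol:
            continue
        mask[y, x] = True
        q.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return mask

img = np.zeros((32, 32)); img[14:18, 14:18] = 1.0  # a tiny bright target
print(point_to_pseudo_mask(img, seed=(15, 15)).sum())  # 16 pixels
```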

Frame Order Matters: A Temporal Sequence-Aware Model for Few-Shot Action Recognition

no code implementations • 22 Aug 2024 • Bozheng Li, Mushui Liu, Gaoang Wang, Yunlong Yu

In this paper, we propose a novel Temporal Sequence-Aware Model (TSAM) for few-shot action recognition (FSAR), which incorporates a sequential perceiver adapter into the pre-training framework to integrate both spatial information and sequential temporal dynamics into the feature embeddings.

Decision Making • Few-Shot Action Recognition • +1
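
The "sequential perceiver adapter" named above suggests a small set of learned latents that cross-attend to frame features in temporal order. A minimal, hypothetical PyTorch sketch of that pattern (all sizes and names are assumptions):

```python
import torch
import torch.nn as nn

class SequentialPerceiverAdapter(nn.Module):
    """Perceiver-style adapter sketch: learned latents cross-attend to
    per-frame features frame by frame, so the latents accumulate temporal
    order as well as spatial content."""
    def __init__(self, dim: int = 256, n_latents: int = 8, max_frames: int = 64):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim) * 0.02)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.frame_pos = nn.Parameter(torch.randn(max_frames, dim) * 0.02)

    def forward(self, frame_feats: torch.Tensor) -> torch.Tensor:
        # frame_feats: (B, T, N, dim), i.e. T frames of N spatial tokens each
        B, T, N, D = frame_feats.shape
        z = self.latents.expand(B, -1, -1)
        for t in range(T):  # sequential updates preserve frame order
            kv = frame_feats[:, t] + self.frame_pos[t]
            out, _ = self.cross_attn(z, kv, kv)
            z = z + out
        return z  # (B, n_latents, dim) temporally aware video summary

feats = torch.randn(2, 8, 49, 256)  # 8 frames of 7x7 patch tokens
print(SequentialPerceiverAdapter()(feats).shape)  # torch.Size([2, 8, 256])
```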

Envisioning Class Entity Reasoning by Large Language Models for Few-shot Learning

no code implementations • 22 Aug 2024 • Mushui Liu, Fangtai Wu, Bozheng Li, Ziqian Lu, Yunlong Yu, Xi Li

Few-shot learning (FSL) aims to recognize new concepts using a limited number of visual samples.

Few-Shot Learning
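
Per the title, the approach has a large language model "envision" class entities to compensate for scarce visual samples. A hypothetical sketch of the pattern this implies: generate textual class descriptions, encode them, and blend them with the few-shot visual prototype. All model calls below are stubs, and the blending rule is an assumption:

```python
import numpy as np

def llm_describe(class_name: str) -> list:
    """Stub for an LLM call that envisions attributes of a class.
    A real system would prompt a language model here."""
    return [f"a photo of a {class_name}", f"the typical shape of a {class_name}"]

def encode_text(texts: list) -> np.ndarray:
    """Stub text encoder (e.g., CLIP's); returns unit-norm embeddings."""
    rng = np.random.default_rng(abs(hash(tuple(texts))) % 2**32)
    e = rng.standard_normal((len(texts), 512))
    return e / np.linalg.norm(e, axis=1, keepdims=True)

def class_prototype(support_feats: np.ndarray, class_name: str, alpha: float = 0.5):
    """Blend the mean few-shot visual feature with LLM-derived text features.
    `alpha` weights the visual side; the blend rule is an assumption."""
    visual = support_feats.mean(axis=0)
    textual = encode_text(llm_describe(class_name)).mean(axis=0)
    proto = alpha * visual + (1 - alpha) * textual
    return proto / np.linalg.norm(proto)

support = np.random.randn(5, 512)  # 5-shot visual features for one class
print(class_prototype(support, "snow leopard").shape)  # (512,)
```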

OmniCLIP: Adapting CLIP for Video Recognition with Spatial-Temporal Omni-Scale Feature Learning

1 code implementation • 12 Aug 2024 • Mushui Liu, Bozheng Li, Yunlong Yu

In this paper, we propose OmniCLIP, a framework that adapts CLIP for video recognition by focusing on learning comprehensive features encompassing spatial, temporal, and dynamic spatial-temporal scales, which we refer to as omni-scale features.

Video Recognition • Zero-Shot Learning
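
One lightweight way to realize "omni-scale" temporal features over frozen CLIP frame embeddings is to pool them over several temporal window sizes and fuse the results. The sketch below illustrates that idea only; it is not OmniCLIP's actual module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OmniScaleTemporalPool(nn.Module):
    """Pool per-frame CLIP features at several temporal scales and fuse them,
    a stand-in for learning multi-scale spatial-temporal features."""
    def __init__(self, dim: int = 512, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.fuse = nn.Linear(dim * len(scales), dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, dim) frame features from a frozen CLIP image encoder
        x = x.transpose(1, 2)  # (B, dim, T) for 1D pooling over time
        pooled = [F.avg_pool1d(x, k, stride=1, padding=k // 2)[..., : x.shape[-1]]
                  for k in self.scales]
        fused = torch.cat(pooled, dim=1).transpose(1, 2)  # (B, T, dim * n_scales)
        return self.fuse(fused).mean(dim=1)  # (B, dim) video-level embedding

frames = torch.randn(2, 16, 512)  # 16 sampled frames per video
print(OmniScaleTemporalPool()(frames).shape)  # torch.Size([2, 512])
```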

MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

1 code implementation • 10 Jul 2024 • Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, Fangxun Shu, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, Leilei Gan, Hao Jiang

Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis.

Image Generation • Text Generation

Fully Fine-tuned CLIP Models are Efficient Few-Shot Learners

no code implementations • 4 Jul 2024 • Mushui Liu, Bozheng Li, Yunlong Yu

Prompt tuning, which involves training a small set of parameters, effectively adapts pre-trained Vision-Language Models (VLMs) to downstream tasks.

Domain Generalization • Few-Shot Learning • +1
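
For contrast with the full fine-tuning this paper advocates, the following hypothetical sketch shows what "training a small set of parameters" usually means in CLIP prompt tuning: learnable context vectors prepended to frozen class-name embeddings, CoOp-style (the text encoder is stubbed):

```python
import torch
import torch.nn as nn

class PromptTuner(nn.Module):
    """CoOp-style prompt-tuning sketch: only `ctx` is trainable; the text
    encoder (stubbed here) and the class-name embeddings stay frozen."""
    def __init__(self, n_ctx: int = 4, dim: int = 512, n_classes: int = 10):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # learned context
        self.register_buffer("cls_emb", torch.randn(n_classes, 3, dim))  # frozen

    def forward(self) -> torch.Tensor:
        n_classes = self.cls_emb.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_classes, -1, -1)
        prompts = torch.cat([ctx, self.cls_emb], dim=1)  # (C, n_ctx + 3, dim)
        # Stub for the frozen text encoder: mean-pool the token embeddings.
        return prompts.mean(dim=1)  # (C, dim) per-class text features

tuner = PromptTuner()
trainable = sum(p.numel() for p in tuner.parameters() if p.requires_grad)
print(trainable, tuner().shape)  # 2048 (the "small set") torch.Size([10, 512])
```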

CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation

1 code implementation • 17 May 2024 • Mushui Liu, Jun Dan, Ziqian Lu, Yunlong Yu, Yingming Li, Xi Li

In this paper, we propose CM-UNet, comprising a CNN-based encoder for extracting local image features and a Mamba-based decoder for aggregating and integrating global information, facilitating efficient semantic segmentation of remote sensing images.

Decoder • Mamba • +2
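
A hypothetical skeleton of the split the abstract describes: a CNN encoder extracts local features, and a globally mixing decoder aggregates them. To keep the sketch dependency-free, a simple gated token mixer stands in for a real Mamba SSM block:

```python
import torch
import torch.nn as nn

class GatedGlobalMixer(nn.Module):
    """Dependency-free stand-in for a Mamba block: mixes tokens globally
    with a learned linear map over the sequence dimension, gated per token."""
    def __init__(self, dim: int, seq_len: int):
        super().__init__()
        self.mix = nn.Linear(seq_len, seq_len)   # global token mixing
        self.gate = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, dim)
        mixed = self.mix(x.transpose(1, 2)).transpose(1, 2)
        return x + mixed * torch.sigmoid(self.gate(x))

class CMUNetSketch(nn.Module):
    """CNN encoder for local features + global-mixing decoder, echoing the
    CM-UNet split (a structural sketch, not the released implementation)."""
    def __init__(self, dim: int = 64, n_classes: int = 6, size: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
        )
        self.decoder = GatedGlobalMixer(dim, seq_len=size * size)
        self.head = nn.Conv2d(dim, n_classes, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.encoder(x)                    # (B, dim, H, W) local features
        B, C, H, W = f.shape
        tokens = f.flatten(2).transpose(1, 2)  # (B, H*W, dim)
        tokens = self.decoder(tokens)          # global aggregation
        f = tokens.transpose(1, 2).reshape(B, C, H, W)
        return self.head(f)                    # per-pixel class logits

x = torch.randn(1, 3, 32, 32)
print(CMUNetSketch()(x).shape)  # torch.Size([1, 6, 32, 32])
```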

SYNC-CLIP: Synthetic Data Make CLIP Generalize Better in Data-Limited Scenarios

1 code implementation • 6 Dec 2023 • Mushui Liu, Weijie He, Ziqian Lu, Yunlong Yu

Prompt learning is a powerful technique for transferring Vision-Language Models (VLMs) such as CLIP to downstream tasks.

Prompt Learning
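
The title points at the mechanism: pad scarce real data with synthetic samples when adapting CLIP. The hypothetical sketch below shows that recipe at the feature level; the synthetic features stand in for generated images, and the down-weighting rule is an assumption, not SYNC-CLIP's exact strategy:

```python
import numpy as np

def build_training_set(real_feats: np.ndarray, synth_feats: np.ndarray,
                       synth_weight: float = 0.5):
    """Combine a few real CLIP features with many synthetic ones,
    down-weighting the synthetic samples to reflect their domain gap
    (the weighting scheme is an assumption for illustration)."""
    feats = np.concatenate([real_feats, synth_feats], axis=0)
    weights = np.concatenate([
        np.ones(len(real_feats)),
        np.full(len(synth_feats), synth_weight),
    ])
    return feats, weights / weights.sum()

real = np.random.randn(4, 512)    # 4-shot real features for one class
synth = np.random.randn(32, 512)  # features of generated images
feats, w = build_training_set(real, synth)
print(feats.shape, round(w.sum(), 6))  # (36, 512) 1.0
```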
