Search Results for author: Weili Guan

Found 27 papers, 12 papers with code

Object-Shot Enhanced Grounding Network for Egocentric Video

no code implementations 7 May 2025 Yisen Feng, Haoyu Zhang, Meng Liu, Weili Guan, Liqiang Nie

To address these limitations, we propose OSGNet, an Object-Shot enhanced Grounding Network for egocentric video.

Video Grounding

BMRL: Bi-Modal Guided Multi-Perspective Representation Learning for Zero-Shot Deepfake Attribution

no code implementations 19 Apr 2025 Yaning Zhang, Jiahe Zhang, Chunjie Ma, Weili Guan, Tian Gan, Zan Gao

Besides, they tend to fail to assess the generalization performance of deepfake attributors to unseen generators in a fine-grained manner.

Attribute Face Parsing +2

Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation

1 code implementation 24 Mar 2025 Yanda Chen, Gongwei Chen, Miao Zhang, Weili Guan, Liqiang Nie

Recent works on dataset distillation demonstrate that combining distilled and real data can mitigate the effectiveness decay.

Dataset Distillation

Embodied Crowd Counting

no code implementations 11 Mar 2025 Runling Long, Yunlong Wang, Jia Wan, Xiang Deng, Xinting Zhu, Weili Guan, Antoni B. Chan, Liqiang Nie

However, most existing methods are designed for indoor navigation, showing unknown performance in analyzing complex object distribution in large scale scenes, such as crowds.

Crowd Counting Object

MegaSR: Mining Customized Semantics and Expressive Guidance for Image Super-Resolution

1 code implementation 11 Mar 2025 Xinrui Li, Jianlong Wu, Xinchuan Huang, Chong Chen, Weili Guan, Xian-Sheng Hua, Liqiang Nie

Pioneering text-to-image (T2I) diffusion models have ushered in a new era of real-world image super-resolution (Real-ISR), significantly enhancing the visual perception of reconstructed images.

Image Super-Resolution

PTQ1.61: Push the Real Limit of Extremely Low-Bit Post-Training Quantization Methods for Large Language Models

1 code implementation 18 Feb 2025 Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, YaoWei Wang, Min Zhang

To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61-bit for the first time.

Binarization Quantization

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

no code implementations 27 Jan 2025 Renshan Zhang, Rui Shao, Gongwei Chen, Kaiwen Zhou, Weili Guan, Liqiang Nie

To directly address the visual redundancy present in the output of vision encoder, we propose a Register-based Representation Compacting (ReCompact) mechanism.

Content-aware Balanced Spectrum Encoding in Masked Modeling for Time Series Classification

no code implementations 17 Dec 2024 Yudong Han, Haocong Wang, Yupeng Hu, Yongshun Gong, Xuemeng Song, Weili Guan

Due to the superior ability of global dependency, transformer and its variants have become the primary choice in Masked Time-series Modeling (MTM) towards time-series classification task.

Decoder Time Series +1

Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification

no code implementations 1 Nov 2024 Shengxun Wei, Zan Gao, Chunjie Ma, Yibo Zhao, Weili Guan, ShengYong Chen

Cloth-changing person re-identification is a subject closer to the real world, which focuses on solving the problem of person re-identification after pedestrians change clothes.

Cloth-Changing Person Re-Identification Prompt Learning

Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing

1 code implementation 14 Oct 2024 Kejie Wang, Xuemeng Song, Meng Liu, Jin Yuan, Weili Guan

Despite their advances, existing methods still encounter three key issues: 1) limited capacity of the text prompt in guiding target image generation, 2) insufficient mining of word-to-patch and patch-to-patch relationships for grounding editing areas, and 3) unified editing strength for all regions during each denoising step.

Denoising Image Generation +1

Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

1 code implementation 19 Jul 2024 Renshan Zhang, Yibo Lyu, Rui Shao, Gongwei Chen, Weili Guan, Liqiang Nie

Secondly, we present a token-level sampling method that efficiently captures the most informative tokens by delving into the correlation between the [CLS] token and patch tokens.

document understanding Informativeness

MoME: Mixture of Multimodal Experts for Generalist Multimodal Large Language Models

1 code implementation 17 Jul 2024 Leyang Shen, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie

In this paper, we propose a mixture of multimodal experts (MoME) to mitigate task interference and obtain a generalist MLLM.

MMGRec: Multimodal Generative Recommendation with Transformer Model

no code implementations 25 Apr 2024 Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie

Multimodal recommendation aims to recommend user-preferred candidates based on her/his historically interacted items and associated multimodal information.

model Multimodal Recommendation +2

UniAV: Unified Audio-Visual Perception for Multi-Task Video Event Localization

1 code implementation 4 Apr 2024 Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng, Ling Shao

Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL).

audio-visual event localization Event Detection +2

Prompt-based Multi-interest Learning Method for Sequential Recommendation

1 code implementation 9 Jan 2024 Xue Dong, Xuemeng Song, Tongliang Liu, Weili Guan

Multi-interest learning method for sequential recommendation aims to predict the next item according to user multi-faceted interests given the user historical interactions.

Sequential Recommendation

Uncovering Hidden Connections: Iterative Search and Reasoning for Video-grounded Dialog

2 code implementations 11 Oct 2023 Haoyu Zhang, Meng Liu, YaoWei Wang, Da Cao, Weili Guan, Liqiang Nie

In response to these challenges, we present an iterative search and reasoning framework, which consists of a textual encoder, a visual encoder, and a generator.

Question Answering Response Generation +1

Knowledge-Aware Prompt Tuning for Generalizable Vision-Language Models

no code implementations ICCV 2023 Baoshuo Kan, Teng Wang, Wenpeng Lu, XianTong Zhen, Weili Guan, Feng Zheng

Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated great capacity of transfer learning.

Few-Shot Image Classification Transfer Learning

Contrast-augmented Diffusion Model with Fine-grained Sequence Alignment for Markup-to-Image Generation

1 code implementation 2 Aug 2023 Guojin Zhong, Jin Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li

The recently rising markup-to-image generation poses greater challenges as compared to natural image generation, due to its low tolerance for errors as well as the complex sequence and context correlations between markup and rendered image.

cross-modal alignment Denoising +1

Identity-Guided Collaborative Learning for Cloth-Changing Person Reidentification

no code implementations 10 Apr 2023 Zan Gao, Shenxun Wei, Weili Guan, Lei Zhu, Meng Wang, Shenyong Chen

Moreover, human semantic information and pedestrian identity information are not fully explored.

A Semantic-aware Attention and Visual Shielding Network for Cloth-changing Person Re-identification

no code implementations 18 Jul 2022 Zan Gao, Hongwei Wei, Weili Guan, Jie Nie, Meng Wang, Shenyong Chen

In addition, a visual clothes shielding module (VCS) is also designed to extract a more robust feature representation for the cloth-changing task by covering the clothing regions and focusing the model on the visual semantic information unrelated to the clothes.

Cloth-Changing Person Re-Identification Semantic Segmentation

Disentangled Graph Neural Networks for Session-based Recommendation

1 code implementation 10 Jan 2022 Ansong Li, Zhiyong Cheng, Fan Liu, Zan Gao, Weili Guan, Yuxin Peng

The session embedding is then generated by aggregating the item embeddings with attention weights of each item's factors.

Graph Neural Network Session-Based Recommendations

A Novel Patch Convolutional Neural Network for View-based 3D Model Retrieval

no code implementations 25 Sep 2021 Zan Gao, Yuxiang Shao, Weili Guan, Meng Liu, Zhiyong Cheng, ShengYong Chen

Thus, we tackle this problem from the perspective of exploiting the relationships between patch features to capture long-range associations among multi-view images.

Retrieval

Multigranular Visual-Semantic Embedding for Cloth-Changing Person Re-identification

no code implementations 10 Aug 2021 Zan Gao, Hongwei Wei, Weili Guan, Weizhi Nie, Meng Liu, Meng Wang

To solve these issues, in this work, a novel multigranular visual-semantic embedding algorithm (MVSE) is proposed for cloth-changing person ReID, where visual semantic information and human attributes are embedded into the network, and the generalized features of human appearance can be well learned to effectively solve the problem of clothing changes.

Cloth-Changing Person Re-Identification

TBNet: Two-Stream Boundary-aware Network for Generic Image Manipulation Localization

no code implementations 10 Aug 2021 Zan Gao, Chao Sun, Zhiyong Cheng, Weili Guan, AnAn Liu, Meng Wang

In this work, a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) is proposed for generic image manipulation localization in which the RGB stream, the frequency stream, and the boundary artifact location are explored in a unified framework.

Image Manipulation Image Manipulation Localization
