no code implementations • 7 May 2025 • Yisen Feng, Haoyu Zhang, Meng Liu, Weili Guan, Liqiang Nie
To address these limitations, we propose OSGNet, an Object-Shot enhanced Grounding Network for egocentric video.
no code implementations • 4 May 2025 • Yuchen Wang, Xuefeng Bai, Xiucheng Li, Weili Guan, Liqiang Nie, Xinyang Chen
Adapting vision-language models (VLMs) to downstream tasks with pseudolabels has gained increasing attention.
no code implementations • 19 Apr 2025 • Yaning Zhang, Jiahe Zhang, Chunjie Ma, Weili Guan, Tian Gan, Zan Gao
In addition, they often fail to assess, in a fine-grained manner, how well deepfake attributors generalize to unseen generators.
1 code implementation • 24 Mar 2025 • Yanda Chen, Gongwei Chen, Miao Zhang, Weili Guan, Liqiang Nie
Recent work on dataset distillation shows that combining distilled and real data can mitigate the decay in effectiveness.
no code implementations • 11 Mar 2025 • Runling Long, Yunlong Wang, Jia Wan, Xiang Deng, Xinting Zhu, Weili Guan, Antoni B. Chan, Liqiang Nie
However, most existing methods are designed for indoor navigation, and their performance in analyzing complex object distributions in large-scale scenes, such as crowds, remains unknown.
1 code implementation • 11 Mar 2025 • Xinrui Li, Jianlong Wu, Xinchuan Huang, Chong Chen, Weili Guan, Xian-Sheng Hua, Liqiang Nie
Pioneering text-to-image (T2I) diffusion models have ushered in a new era of real-world image super-resolution (Real-ISR), significantly enhancing the visual perception of reconstructed images.
1 code implementation • 18 Feb 2025 • Jiaqi Zhao, Miao Zhang, Ming Wang, Yuzhang Shang, Kaihao Zhang, Weili Guan, YaoWei Wang, Min Zhang
To explore the real limit of PTQ, we propose an extremely low-bit PTQ method called PTQ1.61, which enables weight quantization to 1.61 bits for the first time.
no code implementations • 27 Jan 2025 • Renshan Zhang, Rui Shao, Gongwei Chen, Kaiwen Zhou, Weili Guan, Liqiang Nie
To directly address the visual redundancy present in the output of the vision encoder, we propose a Register-based Representation Compacting (ReCompact) mechanism.
no code implementations • 17 Dec 2024 • Yudong Han, Haocong Wang, Yupeng Hu, Yongshun Gong, Xuemeng Song, Weili Guan
Owing to their superior ability to model global dependencies, the transformer and its variants have become the primary choice in Masked Time-series Modeling (MTM) for time-series classification.
no code implementations • 1 Nov 2024 • Shengxun Wei, Zan Gao, Chunjie Ma, Yibo Zhao, Weili Guan, ShengYong Chen
Cloth-changing person re-identification is closer to real-world conditions: it addresses the problem of re-identifying people after they change clothes.
1 code implementation • 14 Oct 2024 • Kejie Wang, Xuemeng Song, Meng Liu, Jin Yuan, Weili Guan
Despite their advances, existing methods still encounter three key issues: 1) limited capacity of the text prompt in guiding target image generation, 2) insufficient mining of word-to-patch and patch-to-patch relationships for grounding editing areas, and 3) unified editing strength for all regions during each denoising step.
Ranked #3 on Text-based Image Editing on PIE-Bench
1 code implementation • 19 Jul 2024 • Renshan Zhang, Yibo Lyu, Rui Shao, Gongwei Chen, Weili Guan, Liqiang Nie
Secondly, we present a token-level sampling method that efficiently captures the most informative tokens by delving into the correlation between the [CLS] token and patch tokens.
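The token-level sampling idea above (keeping the patch tokens most correlated with the [CLS] token) can be illustrated with a minimal sketch. This is not the paper's method: it assumes correlation is measured as a plain dot product and that sampling means keeping the top-k patch tokens in their original order; the function names `dot` and `sample_tokens_by_cls` are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product, used here as an assumed [CLS]-to-patch correlation score."""
    return sum(x * y for x, y in zip(a, b))


def sample_tokens_by_cls(
    cls_token: Sequence[float],
    patch_tokens: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Hypothetical helper: indices of the k patch tokens most correlated
    with the [CLS] token, returned in their original (spatial) order."""
    scores = [dot(cls_token, p) for p in patch_tokens]
    # Rank patch indices by score (highest first) and keep the top k.
    top = sorted(range(len(patch_tokens)), key=lambda i: scores[i], reverse=True)[:k]
    # Restore original token order so positional structure is preserved.
    return sorted(top)


cls = [1.0, 0.0]
patches = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5], [0.9, 0.0]]
print(sample_tokens_by_cls(cls, patches, 2))  # [1, 3]
```

Keeping the selected indices in their original order is a design choice for the sketch; a real model may also need positional embeddings for the retained tokens.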
1 code implementation • 17 Jul 2024 • Leyang Shen, Gongwei Chen, Rui Shao, Weili Guan, Liqiang Nie
In this paper, we propose a mixture of multimodal experts (MoME) to mitigate task interference and obtain a generalist MLLM.
no code implementations • 25 Apr 2024 • Han Liu, Yinwei Wei, Xuemeng Song, Weili Guan, Yuan-Fang Li, Liqiang Nie
Multimodal recommendation aims to recommend user-preferred candidates based on the user's historically interacted items and their associated multimodal information.
1 code implementation • 4 Apr 2024 • Tiantian Geng, Teng Wang, Yanfu Zhang, Jinming Duan, Weili Guan, Feng Zheng, Ling Shao
Video localization tasks aim to temporally locate specific instances in videos, including temporal action localization (TAL), sound event detection (SED) and audio-visual event localization (AVEL).
1 code implementation • 9 Jan 2024 • Xue Dong, Xuemeng Song, Tongliang Liu, Weili Guan
Multi-interest learning methods for sequential recommendation aim to predict the next item from a user's multi-faceted interests, given the user's historical interactions.
2 code implementations • 11 Oct 2023 • Haoyu Zhang, Meng Liu, YaoWei Wang, Da Cao, Weili Guan, Liqiang Nie
In response to these challenges, we present an iterative search and reasoning framework, which consists of a textual encoder, a visual encoder, and a generator.
no code implementations • ICCV 2023 • Baoshuo Kan, Teng Wang, Wenpeng Lu, XianTong Zhen, Weili Guan, Feng Zheng
Pre-trained vision-language models, e.g., CLIP, working with manually designed prompts have demonstrated a strong capacity for transfer learning.
1 code implementation • 2 Aug 2023 • Guojin Zhong, Jin Yuan, Pan Wang, Kailun Yang, Weili Guan, Zhiyong Li
The recently emerging task of markup-to-image generation poses greater challenges than natural image generation, owing to its low tolerance for errors and the complex sequence and context correlations between markup and the rendered image.
1 code implementation • ICCV 2023 • Dong Lu, Zhiqiang Wang, Teng Wang, Weili Guan, Hongchang Gao, Feng Zheng
Vision-language pre-training (VLP) models have shown vulnerability to adversarial examples in multimodal tasks.
no code implementations • 10 Apr 2023 • Zan Gao, Shenxun Wei, Weili Guan, Lei Zhu, Meng Wang, Shenyong Chen
Moreover, human semantic information and pedestrian identity information are not fully explored.
no code implementations • 18 Jul 2022 • Zan Gao, Hongwei Wei, Weili Guan, Jie Nie, Meng Wang, Shenyong Chen
In addition, a visual clothes shielding module (VCS) is also designed to extract a more robust feature representation for the cloth-changing task by covering the clothing regions and focusing the model on the visual semantic information unrelated to the clothes.
Cloth-Changing Person Re-Identification • Semantic Segmentation
1 code implementation • 10 Jan 2022 • Ansong Li, Zhiyong Cheng, Fan Liu, Zan Gao, Weili Guan, Yuxin Peng
The session embedding is then generated by aggregating the item embeddings with attention weights of each item's factors.
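The aggregation step above (a session embedding built from item embeddings weighted by attention) can be illustrated with generic attention pooling. This is a simplified sketch, not the paper's factor-level scheme: it assumes a single hypothetical query vector, dot-product scores, and softmax-normalized weights over whole item embeddings rather than over each item's factors; `softmax` and `attention_pool` are illustrative names.

```python
import math
from typing import List, Sequence


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    item_embs: Sequence[Sequence[float]],
    query: Sequence[float],
) -> List[float]:
    """Hypothetical session embedding: attention-weighted sum of item embeddings.

    Each item is scored by a dot product with an assumed query vector,
    the scores are softmax-normalized, and the embeddings are summed
    dimension by dimension using those weights.
    """
    scores = [sum(q * e for q, e in zip(query, emb)) for emb in item_embs]
    weights = softmax(scores)
    dim = len(item_embs[0])
    return [sum(w * emb[d] for w, emb in zip(weights, item_embs)) for d in range(dim)]


# Equal scores give equal weights, so the result is the plain average.
print(attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]))  # [0.5, 0.5]
```

A factor-level variant would split each embedding into factor chunks and compute a separate set of weights per chunk; the sketch keeps a single weight per item for brevity.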
no code implementations • 25 Sep 2021 • Zan Gao, Yuxiang Shao, Weili Guan, Meng Liu, Zhiyong Cheng, ShengYong Chen
Thus, we tackle this problem from the perspective of exploiting the relationships between patch features to capture long-range associations among multi-view images.
no code implementations • 10 Aug 2021 • Zan Gao, Hongwei Wei, Weili Guan, Weizhi Nie, Meng Liu, Meng Wang
To solve these issues, this work proposes a novel multigranular visual-semantic embedding algorithm (MVSE) for cloth-changing person ReID, in which visual semantic information and human attributes are embedded into the network so that generalized features of human appearance can be learned, effectively addressing the problem of clothing changes.
no code implementations • 10 Aug 2021 • Zan Gao, Chao Sun, Zhiyong Cheng, Weili Guan, AnAn Liu, Meng Wang
In this work, a novel end-to-end two-stream boundary-aware network (abbreviated as TBNet) is proposed for generic image manipulation localization, in which the RGB stream, the frequency stream, and boundary artifact location are explored in a unified framework.
no code implementations • IJCNLP 2019 • Linmei Hu, Luhao Zhang, Chuan Shi, Liqiang Nie, Weili Guan, Cheng Yang
Distantly supervised relation extraction has proven effective for finding relational facts in texts.