no code implementations • 17 Mar 2025 • Ziqiang Li, Jun Li, Lizhi Xiong, Zhangjie Fu, Zechao Li
Text-to-image diffusion models have made significant advancements in generating high-quality, diverse images from text prompts.
no code implementations • 9 Mar 2025 • Yu Liu, Hao Tang, Haiqi Zhang, Jing Qin, Zechao Li
Out-of-distribution (OOD) detection is crucial for ensuring the reliability and safety of machine learning models in real-world applications.
Out-of-Distribution Detection
Out of Distribution (OOD) Detection
1 code implementation • 9 Feb 2025 • Enquan Yang, Peng Xing, Hanyang Sun, Wenbo Guo, Yuanwei Ma, Zechao Li, Dan Zeng
The key features of 3CAD are that it covers anomalous regions of different sizes and multiple anomaly types, and that a single anomalous image may contain multiple anomalous regions and multiple anomaly types.
1 code implementation • 23 Dec 2024 • Jiaqi Ma, Guo-Sen Xie, Fang Zhao, Zechao Li
Therefore, in this paper, we utilize the more challenging image-level annotations and propose an adaptive frequency-aware network (AFANet) for weakly-supervised few-shot semantic segmentation (WFSS).
1 code implementation • 18 Dec 2024 • Yanpeng Sun, Jing Hao, Ke Zhu, Jiang-Jiang Liu, Yuxiang Zhao, Xiaofan Li, Gang Zhang, Zechao Li, Jingdong Wang
We propose to leverage off-the-shelf visual specialists, which were originally trained on annotated images for tasks other than image captioning, to enhance image captions.
no code implementations • 2 Dec 2024 • Hao Tang, Zechao Li, Dong Zhang, Shengfeng He, Jinhui Tang
Furthermore, a Modality-aware Dynamic Aggregation Module in the modality-complementary flow dynamically aggregates saliency-related cues from both modality-specific flows.
no code implementations • 27 Nov 2024 • Meiqi Cao, Xiangbo Shu, Jiachao Zhang, Rui Yan, Zechao Li, Jinhui Tang
In this article, we present a synergy-aware framework, i.e., EventCrab, that adeptly integrates the "lighter" frame-specific networks for dense event frames with the "heavier" point-specific networks for sparse event points, balancing accuracy and efficiency.
1 code implementation • 12 Nov 2024 • Deng Xu, Chao Zhang, Zechao Li, Chunlin Chen, Huaxiong Li
To alleviate the negative influence of feature redundancy, inspired by robust PCA, DSTL disentangles the latent low-dimensional representation into a semantic-unrelated part and a semantic-related part for each view.
no code implementations • 17 Oct 2024 • Yanpeng Sun, Huaxin Zhang, Qiang Chen, Xinyu Zhang, Nong Sang, Gang Zhang, Jingdong Wang, Zechao Li
QLadder employs a learnable "ladder" structure to deeply aggregate the intermediate representations from the frozen pretrained visual encoder (e.g., the CLIP image encoder).
Ranked #163 on Visual Question Answering on MM-Vet
no code implementations • 11 Oct 2024 • Xin Jiang, Xu Cheng, Zechao Li
Interestingly, we discover that only a small amount of the pre-trained model's knowledge is preserved for the inference of downstream tasks.
no code implementations • 29 Aug 2024 • Peng Xing, Haofan Wang, Yanpeng Sun, Qixun Wang, Xu Bai, Hao Ai, Renyuan Huang, Zechao Li
Based on this pipeline, we construct IMAGStyle, the first large-scale style transfer dataset, which contains 210k image triplets and is available to the community for exploration and research.
1 code implementation • 17 Jul 2024 • Fei Shen, Xin Jiang, Xin He, Hu Ye, Cong Wang, Xiaoyu Du, Zechao Li, Jinhui Tang
Recent advances have achieved realistic virtual try-on (VTON) through localized garment inpainting using latent diffusion models, significantly enhancing consumers' online shopping experience.
no code implementations • 7 Jun 2024 • Peng Xing, Dong Zhang, Jinhui Tang, Zechao Li
Specifically, through Case-1, we found that the main factor detrimental to current AD methods is that the inputs to the recovery model contain a large number of detailed features to be recovered, which leads to normal areas not being recovered and abnormal areas being recovered into their original state.
no code implementations • 5 Jun 2024 • Peng Xing, Ning Wang, Jianbo Ouyang, Zechao Li
The remarkable advancement of text-to-image generation models has significantly boosted research in ID customization generation.
no code implementations • 24 Apr 2024 • Xin Jiang, Hao Tang, Rui Yan, Jinhui Tang, Zechao Li
This paper presents a meticulous analysis leading to the proposal of practical guidelines to identify subcategory-specific discrepancies and generate discriminative features to design effective FGIR models.
1 code implementation • CVPR 2024 • Yanpeng Sun, Jiahui Chen, Shan Zhang, Xinyu Zhang, Qiang Chen, Gang Zhang, Errui Ding, Jingdong Wang, Zechao Li
In this paper, we propose a novel Visual Reference Prompt (VRP) encoder that empowers the Segment Anything Model (SAM) to utilize annotated reference images as prompts for segmentation, creating the VRP-SAM model.
1 code implementation • 20 Jan 2024 • Tao Chen, Yazhou Yao, Xingguo Huang, Zechao Li, Liqiang Nie, Jinhui Tang
In this paper, we propose spatial structure constraints (SSC) for weakly supervised semantic segmentation to alleviate the unwanted object over-activation of attention expansion.
1 code implementation • 19 Dec 2023 • Wei Tang, Liang Li, Xuejing Liu, Lu Jin, Jinhui Tang, Zechao Li
In this paper, we propose a novel framework with context disentangling and prototype inheriting for robust visual grounding to handle both scenes.
no code implementations • 13 Nov 2023 • Xuejing Liu, Wei Tang, Xinzhe Ni, Jinghui Lu, Rui Zhao, Zechao Li, Fei Tan
This pipeline achieved superior performance compared to the majority of existing Multimodal Large Language Models (MLLM) on four text-rich VQA datasets.
no code implementations • 10 Nov 2023 • Ziye Fang, Xin Jiang, Hao Tang, Zechao Li
In the field of intelligent multimedia analysis, ultra-fine-grained visual categorization (Ultra-FGVC) plays a vital role in distinguishing intricate subcategories within broader categories.
no code implementations • 16 Sep 2023 • Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li
In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pre-training (CLIP) model.
Ranked #10 on Fine-Grained Image Classification on CUB-200-2011
no code implementations • 29 Aug 2023 • Henghao Zhao, Kevin Qinghong Lin, Rui Yan, Zechao Li
DiffusionVMR can use an arbitrary setting during inference, without needing to match the setting used in the training phase.
Ranked #4 on Video Grounding on QVHighlights
no code implementations • 6 Aug 2023 • Hao Tang, Jun Liu, Shuanglin Yan, Rui Yan, Zechao Li, Jinhui Tang
Due to the scarcity of manually annotated data required for fine-grained video understanding, few-shot fine-grained (FS-FG) action recognition has gained significant attention, with the aim of classifying novel fine-grained action categories with only a few labeled instances.
no code implementations • 12 May 2023 • Jian Zhao, Jianan Li, Lei Jin, Jiaming Chu, Zhihao Zhang, Jun Wang, Jiangqiang Xia, Kai Wang, Yang Liu, Sadaf Gulshad, Jiaojiao Zhao, Tianyang Xu, XueFeng Zhu, Shihan Liu, Zheng Zhu, Guibo Zhu, Zechao Li, Zheng Wang, Baigui Sun, Yandong Guo, Shin ichi Satoh, Junliang Xing, Jane Shen Shengmei
Second, we set up two tracks for the first time, i.e., Anti-UAV Tracking and Anti-UAV Detection & Tracking.
1 code implementation • 10 Apr 2023 • Yanpeng Sun, Qiang Chen, Jian Wang, Jingdong Wang, Zechao Li
By doing this, the model can leverage the diverse knowledge stored in different parts of the model to improve its performance on new tasks.
no code implementations • 19 Oct 2022 • Peng Xing, Hao Tang, Jinhui Tang, Zechao Li
However, existing KDAD methods suffer from two main limitations: 1) the student network can effortlessly replicate the teacher network's representations, and 2) the features of the teacher network serve solely as a "reference standard" and are not fully leveraged.
no code implementations • 26 Sep 2022 • Peng Xing, Yanpeng Sun, Zechao Li
In this paper, a novel Self-Supervised Guided Segmentation Framework (SGSF) is proposed for anomaly detection, which jointly explores an effective method for generating forged anomalous samples and uses normal sample features as guidance information for segmentation.
no code implementations • 26 Sep 2022 • Peng Xing, Zechao Li
Reconstruction methods based on memory modules for visual anomaly detection attempt to narrow the reconstruction error for normal samples while enlarging it for anomalous samples.
1 code implementation • 18 Jul 2022 • Xuejing Liu, Liang Li, Shuhui Wang, Zheng-Jun Zha, Zechao Li, Qi Tian, Qingming Huang
Second, most previous weakly supervised REG methods ignore the discriminative location and context of the referent, causing difficulties in distinguishing the target from other same-category objects.
1 code implementation • 13 Jun 2022 • Yanpeng Sun, Qiang Chen, Xiangyu He, Jian Wang, Haocheng Feng, Junyu Han, Errui Ding, Jian Cheng, Zechao Li, Jingdong Wang
In this paper, we rethink the paradigm and explore a new regime: fine-tuning a small part of the parameters in the backbone.
Ranked #15 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)
no code implementations • IEEE Transactions on Image Processing 2021 • Xinguang Xiang, YaJie Zhang, Lu Jin, Zechao Li, Jinhui Tang
Specifically, to localize diverse local regions, a sub-region localization module is developed to learn discriminative local features by locating the peaks of non-overlapping sub-regions in the feature map.
no code implementations • 5 Nov 2021 • Yanpeng Sun, Zechao Li
Pixel-wise dense prediction tasks based on weak supervision currently use Class Attention Maps (CAM) to generate pseudo masks as ground truth.
Weakly-Supervised Object Localization
Weakly supervised Semantic Segmentation
1 code implementation • 20 Apr 2021 • Zechao Li, Yanpeng Sun, Jinhui Tang
Specifically, the Spatial Contextual Module (SCM) is leveraged to uncover the spatial contextual dependency between pixels by exploring the correlation between pixels and categories.
Ranked #75 on Semantic Segmentation on ADE20K val
no code implementations • 7 Aug 2020 • Chongyi Li, Huazhu Fu, Runmin Cong, Zechao Li, Qianqian Xu
We further demonstrate the advantages of the proposed method for improving the accuracy of retinal vessel segmentation.
1 code implementation • 6 Aug 2020 • Chuanyi Zhang, Yazhou Yao, Xiangbo Shu, Zechao Li, Zhenmin Tang, Qi Wu
To this end, we propose a data-driven meta-set based approach to deal with noisy web images for fine-grained recognition.
1 code implementation • ECCV 2020 • Xiaobin Hu, Wenqi Ren, John LaMaster, Xiaochun Cao, Xiaoming Li, Zechao Li, Bjoern Menze, Wei Liu
State-of-the-art face super-resolution methods employ deep convolutional neural networks to learn a mapping between low- and high-resolution facial patterns by exploring local appearance knowledge.
no code implementations • 9 Jan 2019 • Lu Jin, Zechao Li, Jinhui Tang
In this article, we propose a novel deep semantic multimodal hashing network (DSMHN) for scalable image-text and video-text retrieval.
no code implementations • CVPR 2018 • Runde Li, Jinshan Pan, Zechao Li, Jinhui Tang
In contrast, we solve this problem based on a conditional generative adversarial network (cGAN), where the clear image is estimated by an end-to-end trainable neural network.
no code implementations • CVPR 2018 • Jinshan Pan, Sifei Liu, Deqing Sun, Jiawei Zhang, Yang Liu, Jimmy Ren, Zechao Li, Jinhui Tang, Huchuan Lu, Yu-Wing Tai, Ming-Hsuan Yang
These problems usually involve the estimation of two components of the target signals: structures and details.
no code implementations • 7 May 2018 • Lu Jin, Xiangbo Shu, Kai Li, Zechao Li, Guo-Jun Qi, Jinhui Tang
However, most existing deep hashing methods directly learn the hash functions by encoding the global semantic information, while ignoring the local spatial information of images.
no code implementations • 12 Apr 2018 • Jinhui Tang, Xiangbo Shu, Zechao Li, Yu-Gang Jiang, Qi Tian
Recent approaches simultaneously explore visual, user and tag information to improve the performance of image retagging by constructing and exploring an image-tag-user graph.
no code implementations • CVPR 2017 • Longquan Dai, Mengke Yuan, Zechao Li, Xiaopeng Zhang, Jinhui Tang
In this paper, we propose a hardware-efficient Guided Filter (HGF), which solves the efficiency problem of multichannel guided image filtering and yields competent results when applied to multi-label problems with synthesized polynomial multichannel guidance.
no code implementations • 14 Jun 2017 • Yu-Gang Jiang, Zuxuan Wu, Jinhui Tang, Zechao Li, Xiangyang Xue, Shih-Fu Chang
More specifically, we utilize three Convolutional Neural Networks (CNNs) operating on appearance, motion and audio signals to extract their corresponding features.
no code implementations • 4 Jun 2017 • Xiangbo Shu, Jinhui Tang, Zechao Li, Hanjiang Lai, Liyan Zhang, Shuicheng Yan
Basically, for each age group, we learn an aging dictionary to reveal its aging characteristics (e.g., wrinkles). Dictionary bases with the same index in two neighboring aging dictionaries form a particular aging pattern across these two age groups, and a linear combination of all these patterns expresses a particular personalized aging process.
no code implementations • 3 Jun 2017 • Xiangbo Shu, Jinhui Tang, Guo-Jun Qi, Yan Song, Zechao Li, Liyan Zhang
To this end, we propose a novel Concurrence-Aware Long Short-Term Sub-Memories (Co-LSTSM) to model the long-term inter-related dynamics between two interacting people on the bounding boxes covering people.
Ranked #2 on Human Interaction Recognition on BIT
no code implementations • CVPR 2013 • Yang Liu, Jing Liu, Zechao Li, Jinhui Tang, Hanqing Lu
In this paper, we propose a novel Weakly-Supervised Dual Clustering (WSDC) approach for image semantic segmentation with image-level labels, i.e., collaboratively performing image segmentation and tag alignment with those regions.