1 code implementation • 9 Aug 2024 • Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang
ProxyCLIP leverages the spatial feature correspondence from VFMs as a form of proxy attention to augment CLIP, thereby inheriting the VFMs' robust local consistency and maintaining CLIP's exceptional zero-shot transfer capacity.
Ranked #1 on Unsupervised Semantic Segmentation with Language-image Pre-training on PASCAL Context-60
Open Vocabulary Semantic Segmentation • Open-Vocabulary Semantic Segmentation • +2
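The proxy-attention idea can be sketched as follows: an affinity matrix computed from VFM patch features re-aggregates CLIP's patch features, so CLIP's predictions inherit the VFM's local spatial consistency. This is a minimal sketch under assumed inputs (pre-extracted patch features); function names and the temperature value are illustrative, and the actual method operates inside CLIP's attention layers.

```python
import numpy as np

def proxy_attention(clip_feats, vfm_feats, tau=0.25):
    """Aggregate CLIP patch features with attention weights derived from
    VFM feature correspondence. clip_feats: (N, D); vfm_feats: (N, Dv)."""
    # cosine similarity between VFM patch features = spatial correspondence
    v = vfm_feats / np.linalg.norm(vfm_feats, axis=1, keepdims=True)
    affinity = v @ v.T                      # (N, N) proxy attention logits
    # softmax over each row to form attention weights
    w = np.exp(affinity / tau)
    w /= w.sum(axis=1, keepdims=True)
    # re-aggregate CLIP features using the proxy attention
    return w @ clip_feats
```

Because the weights come from the VFM rather than CLIP itself, patches that the VFM considers similar receive similar aggregated CLIP features, while CLIP's text-alignment (and hence zero-shot transfer) is untouched.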
no code implementations • 17 Jul 2024 • Mengcheng Lan, Chaofeng Chen, Yiping Ke, Xinjiang Wang, Litong Feng, Wayne Zhang
Despite the success of large-scale pretrained Vision-Language Models (VLMs), especially CLIP, in various open-vocabulary tasks, their application to semantic segmentation remains challenging, often producing noisy segmentation maps with mis-segmented regions.
Open Vocabulary Semantic Segmentation • Open-Vocabulary Semantic Segmentation • +1
1 code implementation • 13 Jun 2024 • Yue Zhou, Litong Feng, Yiping Ke, Xue Jiang, Junchi Yan, Xue Yang, Wayne Zhang
Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual grounding.
no code implementations • 12 Jun 2024 • Lixian Zhang, Yi Zhao, Runmin Dong, Jinxiao Zhang, Shuai Yuan, Shilei Cao, Mengxuan Chen, Juepeng Zheng, Weijia Li, Wei Liu, Wayne Zhang, Litong Feng, Haohuan Fu
A$^{2}$-MAE integrates an anchor-aware masking strategy and a geographic encoding module to comprehensively exploit the properties of RS images.
1 code implementation • 29 Mar 2024 • Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He
Generic large Vision-Language Models (VLMs) are developing rapidly, but they still perform poorly in the Remote Sensing (RS) domain, owing to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs.
1 code implementation • NeurIPS 2023 • Mengcheng Lan, Xinjiang Wang, Yiping Ke, Jiaxing Xu, Litong Feng, Wayne Zhang
Unsupervised semantic segmentation is a challenging task that segments images into semantic groups without manual annotation.
1 code implementation • ICCV 2023 • Yijiang Li, Xinjiang Wang, Lihe Yang, Litong Feng, Wayne Zhang, Ying Gao
Deep co-training has been introduced to semi-supervised segmentation and achieves impressive results, yet few studies have explored the working mechanism behind it.
1 code implementation • CVPR 2023 • Xinjiang Wang, Xingyi Yang, Shilong Zhang, Yijiang Li, Litong Feng, Shijie Fang, Chengqi Lyu, Kai Chen, Wayne Zhang
In this study, we dive deep into the inconsistency of pseudo targets in semi-supervised object detection (SSOD).
1 code implementation • CVPR 2023 • Lihe Yang, Lei Qi, Litong Feng, Wayne Zhang, Yinghuan Shi
In this work, we revisit the weak-to-strong consistency framework, popularized by FixMatch from semi-supervised classification, where the prediction of a weakly perturbed image serves as supervision for its strongly perturbed version.
Semi-supervised Change Detection • Semi-supervised Medical Image Segmentation • +1
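The weak-to-strong consistency described above can be sketched in a few lines: the confident prediction on a weakly perturbed image becomes the pseudo-label supervising the strongly perturbed view. This is a minimal FixMatch-style sketch over pre-computed probability vectors; the function name and threshold default are illustrative.

```python
import numpy as np

def weak_to_strong_loss(p_weak, p_strong, threshold=0.95):
    """FixMatch-style consistency loss. p_weak / p_strong: (B, C)
    class probabilities for the weak and strong views of a batch."""
    pseudo = p_weak.argmax(axis=1)              # pseudo-label from weak view
    conf = p_weak.max(axis=1)
    mask = conf >= threshold                    # keep only confident samples
    # cross-entropy of the strong view against the pseudo-label
    ce = -np.log(p_strong[np.arange(len(pseudo)), pseudo] + 1e-12)
    return (ce * mask).sum() / max(mask.sum(), 1)
```

The confidence mask is what makes the scheme stable early in training: unconfident weak-view predictions contribute nothing until the model improves.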
2 code implementations • CVPR 2022 • Haoqi Wang, Zhizhong Li, Litong Feng, Wayne Zhang
Most existing Out-Of-Distribution (OOD) detection algorithms depend on a single input source: the feature, the logit, or the softmax probability.
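Combining sources can be sketched as below: a feature-space residual (distance from the in-distribution subspace) is fused with a logit-space energy score. This is a simplified illustration of the multi-source idea, not the paper's exact scoring function; `W_null` is an assumed, pre-computed basis of the principal subspace's complement.

```python
import numpy as np

def combined_ood_score(feature, logits, W_null):
    """Fuse feature- and logit-space evidence for OOD detection.
    feature: (D,); logits: (C,); W_null: (D, K) basis of a subspace
    where ID features have little energy (assumed given)."""
    residual = np.linalg.norm(W_null.T @ feature)   # feature-space source
    energy = np.log(np.exp(logits).sum())           # logit-space source
    return residual - energy        # higher score => more likely OOD
```

An ID sample tends to have a small residual and large energy (confident logits), so its score is low; OOD samples score high on either or both terms.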
2 code implementations • ICCV 2021 • Jingkang Yang, Haoqi Wang, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang, Ziwei Liu
The proposed UDG can not only enrich the semantic knowledge of the model by exploiting unlabeled data in an unsupervised manner, but also distinguish ID/OOD samples to enhance ID classification and OOD detection tasks simultaneously.
Out-of-Distribution Detection • Out of Distribution (OOD) Detection
no code implementations • 13 Aug 2021 • Xiaopeng Yan, Riquan Chen, Litong Feng, Jingkang Yang, Huabin Zheng, Wayne Zhang
In this paper, we propose to label only the most representative samples to expand the labeled set.
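One common way to pick representative samples is a greedy k-center (core-set style) selection over the unlabeled pool's features: repeatedly take the sample farthest from everything already selected, so the chosen points cover the pool. This is a hedged stand-in for the paper's criterion, which it does not spell out here; the function name is illustrative.

```python
import numpy as np

def select_representative(features, k):
    """Greedy k-center selection of k representative sample indices.
    features: (N, D) array of pool embeddings."""
    # seed with the sample closest to the pool mean
    mean = features.mean(axis=0)
    first = int(np.linalg.norm(features - mean, axis=1).argmin())
    selected = [first]
    min_dist = np.linalg.norm(features - features[first], axis=1)
    for _ in range(k - 1):
        nxt = int(min_dist.argmax())        # farthest from selected set
        selected.append(nxt)
        min_dist = np.minimum(
            min_dist, np.linalg.norm(features - features[nxt], axis=1))
    return selected
```

On a pool with two well-separated clusters, the first two picks land in different clusters, which is exactly the coverage behavior representative-sample labeling wants.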
1 code implementation • 12 Oct 2020 • Jingkang Yang, Weirong Chen, Litong Feng, Xiaopeng Yan, Huabin Zheng, Wayne Zhang
VSGraph-LC starts from anchor selection referring to the semantic similarity between metadata and correct label concepts, and then propagates correct labels from anchors on a visual graph using graph neural network (GNN).
Ranked #9 on Image Classification on WebVision-1000
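The anchor-then-propagate step can be sketched with plain label propagation on a visual similarity graph. This is a simplified stand-in for the paper's GNN (which learns the propagation); the adjacency is assumed pre-built from visual features, and anchors are clamped to their trusted labels each iteration.

```python
import numpy as np

def propagate_labels(adj, anchor_labels, n_classes, steps=3):
    """Propagate anchor labels over a visual graph.
    adj: (N, N) symmetric affinity with self-loops;
    anchor_labels: dict node_index -> trusted class."""
    y = np.zeros((len(adj), n_classes))
    for i, c in anchor_labels.items():
        y[i, c] = 1.0
    # row-normalize so each step averages over neighbors
    p = adj / adj.sum(axis=1, keepdims=True)
    for _ in range(steps):
        y = p @ y
        for i, c in anchor_labels.items():   # clamp anchors every step
            y[i] = 0.0
            y[i, c] = 1.0
    return y.argmax(axis=1)
```

Non-anchor nodes end up with the label of the anchor they are most strongly connected to, which is how correct labels flow from a few trusted anchors to the noisy remainder.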
4 code implementations • ECCV 2020 • Jingkang Yang, Litong Feng, Weirong Chen, Xiaopeng Yan, Huabin Zheng, Ping Luo, Wayne Zhang
Therefore, a simple yet effective WSL framework is proposed.
Ranked #7 on Image Classification on WebVision-1000
2 code implementations • CVPR 2020 • Xinjiang Wang, Shilong Zhang, Zhuoran Yu, Litong Feng, Wayne Zhang
Inspired by this, a convolution across pyramid levels is proposed in this study, termed pyramid convolution, which is a modified 3-D convolution.
Ranked #88 on Object Detection on COCO test-dev
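The "convolution across pyramid levels" can be sketched as a 3-tap kernel along the level axis: each output level mixes itself with its neighbors above and below. For clarity this sketch omits the per-level spatial convolution and assumes the levels are already spatially aligned (the actual method uses strided/upsampled 2-D convs per level); the scalar weights stand in for learned kernels.

```python
import numpy as np

def pyramid_conv(levels, w_prev, w_same, w_next):
    """3-tap convolution along the pyramid-level axis.
    levels: list of (H, W) feature maps, assumed pre-aligned."""
    out = []
    for i, x in enumerate(levels):
        y = w_same * x                       # center tap: the level itself
        if i > 0:
            y = y + w_prev * levels[i - 1]   # tap on the finer level
        if i + 1 < len(levels):
            y = y + w_next * levels[i + 1]   # tap on the coarser level
        out.append(y)
    return out
```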
no code implementations • 30 Jan 2020 • Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang
This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result.
no code implementations • 20 Sep 2019 • Zhe Huang, Weijiang Yu, Wayne Zhang, Litong Feng, Nong Xiao
Taking as input the residual (the coarse de-rained result) between the rainy image (i.e. the input data) and the output of the coarse stage (i.e. the learnt rain mask), the fine stage continues to de-rain by removing fine-grained rain streaks (e.g. light streaks and water mist), producing a rain-free, well-reconstructed output image via a unified contextual merging sub-network with dense blocks and a merging block.
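The coarse-to-fine pipeline reduces to a short data flow: predict a rain mask, subtract it to get the coarse result, then refine. The sketch below shows only that flow; `coarse_stage` and `fine_stage` are placeholders for the paper's networks, not real implementations.

```python
import numpy as np

def two_stage_derain(rainy, coarse_stage, fine_stage):
    """Coarse-to-fine de-raining data flow.
    coarse_stage: predicts the rain mask (rain layer).
    fine_stage: refines the coarse residual into the final image."""
    rain_mask = coarse_stage(rainy)      # learnt rain layer
    coarse = rainy - rain_mask           # residual = coarse de-rained result
    return fine_stage(coarse)            # remove fine-grained streaks
```

Feeding the residual (rather than the raw input) to the fine stage means the second network only has to model what the first one missed.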
1 code implementation • 2 Jan 2019 • Shitao Tang, Litong Feng, Wenqi Shao, Zhanghui Kuang, Wei Zhang, Yimin Chen
ADL enlarges the distillation loss for hard-to-learn and hard-to-mimic samples and reduces it for the dominant easy samples, enabling distillation to work on single-stage detectors for the first time, even when the student and the teacher are identical.
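The re-weighting idea can be sketched as a per-sample distillation loss modulated by its own magnitude: large teacher-student divergence (hard to mimic) gets a weight near 1, small divergence gets a weight near 0. The modulating function and coefficients here are illustrative, not the paper's exact formulation.

```python
import numpy as np

def adaptive_distillation_loss(p_teacher, p_student, beta=1.5, gamma=1.0):
    """Per-sample KL distillation, up-weighted for hard-to-mimic samples.
    p_teacher / p_student: (B, C) class probabilities."""
    eps = 1e-12
    kl = (p_teacher * (np.log(p_teacher + eps)
                       - np.log(p_student + eps))).sum(axis=1)
    weight = 1.0 - np.exp(-beta * kl)   # small KL (easy) -> small weight
    return (weight ** gamma * kl).mean()
```

Easy samples, which dominate a detector's training batches, thus contribute almost nothing, and the distillation signal concentrates on the informative hard cases.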
no code implementations • 15 Aug 2018 • Zhaoyang Zhang, Zhanghui Kuang, Ping Luo, Litong Feng, Wei Zhang
In addition, TSD significantly reduces the computation needed to run video action recognition with compressed frames on the cloud, while maintaining high recognition accuracy.
4 code implementations • 13 Aug 2018 • Shitao Tang, Litong Feng, Zhanghui Kuang, Yimin Chen, Wei Zhang
To train a high-performance shot transition detector, we contribute a new database, ClipShots, which contains 128,636 cut transitions and 38,120 gradual transitions from 4,039 online videos.
Ranked #3 on Camera shot boundary detection on ClipShots (using extra training data)