no code implementations • 12 Nov 2023 • Yilin Zhao, Xinbin Yuan, ShangHua Gao, Zhijie Lin, Qibin Hou, Jiashi Feng, Daquan Zhou
For MoV, we utilize text-to-speech (TTS) algorithms with a variety of pre-defined tones and automatically select the best-matching one based on the user-provided text description.
1 code implementation • 20 Oct 2023 • Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ping Wang, Ming-Ming Cheng
A fundamental limitation of object detectors is that they suffer from "spatial bias", and in particular perform less satisfactorily when detecting objects near image borders.
1 code implementation • 18 Sep 2023 • Bowen Yin, Xuying Zhang, Zhongyu Li, Li Liu, Ming-Ming Cheng, Qibin Hou
We present DFormer, a novel RGB-D pretraining framework to learn transferable representations for RGB-D segmentation tasks.
Ranked #1 on RGB-D Salient Object Detection on DES
no code implementations • 8 Sep 2023 • Yupeng Zhou, Daquan Zhou, Zuo-Liang Zhu, Yaxing Wang, Qibin Hou, Jiashi Feng
In this work, we identify that a crucial factor leading to the text-image mismatch issue is the inadequate cross-modality relation learning between the prompt and the output image.
1 code implementation • 10 Aug 2023 • Yuming Chen, Xinbin Yuan, Ruiqi Wu, Jiabao Wang, Qibin Hou, Ming-Ming Cheng
We aim at providing the object detection community with an efficient and performant object detector, termed YOLO-MS.
1 code implementation • 20 Jun 2023 • Jiabao Wang, Yuming Chen, Zhaohui Zheng, Xiang Li, Ming-Ming Cheng, Qibin Hou
Such a distillation manner relieves the student's head from receiving contradictory supervision signals from the ground-truth annotations and the teacher's predictions, greatly improving the student's detection performance.
1 code implementation • 13 Jun 2023 • Xuying Zhang, Bowen Yin, Zheng Lin, Qibin Hou, Deng-Ping Fan, Ming-Ming Cheng
We consider the problem of referring camouflaged object detection (Ref-COD), a new task that aims to segment specified camouflaged objects based on a small set of referring images with salient target objects.
1 code implementation • 7 Jun 2023 • Boyuan Sun, YuQi Yang, Weifeng Yuan, Le Zhang, Ming-Ming Cheng, Qibin Hou
In this paper, we present a simple but performant semi-supervised semantic segmentation approach, termed CorrMatch.
no code implementations • 24 May 2023 • Cheng-Ze Lu, Xiaojie Jin, Qibin Hou, Jun Hao Liew, Ming-Ming Cheng, Jiashi Feng
The study reveals that: 1) MIM can be viewed as an effective method to improve the model capacity when the scale of the training data is relatively small; 2) Strong reconstruction targets can endow the models with increased capacities on downstream tasks; 3) MIM pre-training is data-agnostic under most scenarios, which means that the strategy of sampling pre-training data is non-critical.
1 code implementation • CVPR 2023 • Zhen Li, Zuo-Liang Zhu, Ling-Hao Han, Qibin Hou, Chun-Le Guo, Ming-Ming Cheng
It is based on two essential designs.
1 code implementation • 28 Mar 2023 • Senmao Li, Joost Van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, Jian Yang
A significant research effort is focused on exploiting the amazing capacities of pretrained diffusion models for the editing of images.
Ranked #4 on Text-based Image Editing on PIE-Bench
1 code implementation • ICCV 2023 • Yupeng Zhou, Zhen Li, Chun-Le Guo, Song Bai, Ming-Ming Cheng, Qibin Hou
Previous works have shown that increasing the window size for Transformer-based image super-resolution models (e.g., SwinIR) can significantly improve the model performance but the computation overhead is also considerable.
1 code implementation • ICCV 2023 • YuXuan Li, Qibin Hou, Zhaohui Zheng, Ming-Ming Cheng, Jian Yang, Xiang Li
To the best of our knowledge, this is the first time that large and selective kernel mechanisms have been explored in the field of remote sensing object detection.
Ranked #1 on Semantic Segmentation on UAVid
no code implementations • 6 Mar 2023 • Peng-Tao Jiang, YuQi Yang, Yang Cao, Qibin Hou, Ming-Ming Cheng, Chunhua Shen
Traffic scene parsing is one of the most important tasks to achieve intelligent cities.
no code implementations • 15 Jan 2023 • Cheng-Ze Lu, Xiaojie Jin, Zhicheng Huang, Qibin Hou, Ming-Ming Cheng, Jiashi Feng
Contrastive Masked Autoencoder (CMAE), as a new self-supervised framework, has shown its potential of learning expressive feature representations in visual image recognition.
1 code implementation • 14 Jan 2023 • Zhaohui Zheng, Yuming Chen, Qibin Hou, Xiang Li, Ming-Ming Cheng
In this paper, we study the spatial disequilibrium problem of modern object detectors and propose to quantify this "spatial bias" by measuring the detection performance over zones.
no code implementations • 14 Dec 2022 • Le Zhang, Qibin Hou, Yun Liu, Jia-Wang Bian, Xun Xu, Joey Tianyi Zhou, Ce Zhu
Ensemble learning serves as a straightforward way to improve the performance of almost any machine learning algorithm.
1 code implementation • 10 Dec 2022 • Bowen Yin, Xuying Zhang, Qibin Hou, Bo-Yuan Sun, Deng-Ping Fan, Luc van Gool
How to identify and segment camouflaged objects from the background is challenging.
1 code implementation • 22 Nov 2022 • Qibin Hou, Cheng-Ze Lu, Ming-Ming Cheng, Jiashi Feng
This paper does not attempt to design a state-of-the-art method for visual recognition but investigates a more efficient way to make use of convolutions to encode spatial features.
3 code implementations • 18 Sep 2022 • Meng-Hao Guo, Cheng-Ze Lu, Qibin Hou, ZhengNing Liu, Ming-Ming Cheng, Shi-Min Hu
Notably, SegNeXt outperforms EfficientNet-L2 w/ NAS-FPN and achieves 90.6% mIoU on the Pascal VOC 2012 test leaderboard using only 1/10 of its parameters.
Ranked #1 on Semantic Segmentation on iSAID
1 code implementation • 27 Jul 2022 • Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng
The target encoder, fed with the full images, enhances the feature discriminability via contrastive learning with its online counterpart.
1 code implementation • 12 Apr 2022 • Zhaohui Zheng, Rongguang Ye, Qibin Hou, Dongwei Ren, Ping Wang, WangMeng Zuo, Ming-Ming Cheng
Combining these two new components, we show for the first time that logit mimicking can outperform feature imitation, and that the absence of localization distillation is a critical reason why logit mimicking has underperformed for years.
1 code implementation • CVPR 2022 • Peng-Tao Jiang, YuQi Yang, Qibin Hou, Yunchao Wei
Our framework directs the global network to learn the captured rich object detail knowledge from a global view and thereby produces high-quality attention maps that can be directly used as pseudo annotations for semantic segmentation networks.
Ranked #11 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
7 code implementations • 24 Jun 2021 • Li Yuan, Qibin Hou, Zihang Jiang, Jiashi Feng, Shuicheng Yan
Though recently the prevailing vision transformers (ViTs) have shown great potential of self-attention based models in ImageNet classification, their performance is still inferior to that of the latest SOTA CNNs if no extra data are provided.
Ranked #1 on Domain Generalization on VizWiz-Classification
4 code implementations • 23 Jun 2021 • Qibin Hou, Zihang Jiang, Li Yuan, Ming-Ming Cheng, Shuicheng Yan, Jiashi Feng
By realizing the importance of the positional information carried by 2D feature representations, unlike recent MLP-like models that encode the spatial information along the flattened spatial dimensions, Vision Permutator separately encodes the feature representations along the height and width dimensions with linear projections.
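The idea of encoding height and width separately can be illustrated with a minimal sketch. This is not the Vision Permutator implementation: it assumes a single-channel feature map stored as a list of rows, and the helper names (`matmul`, `mix_along_height`, `mix_along_width`, `permute_mix`) and the choice to sum the two encodings are hypothetical, made only for illustration.

```python
# Illustrative sketch only; names and the summation step are assumptions,
# not taken from the Vision Permutator paper or code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def mix_along_height(feat, w_h):
    """Apply an H x H linear projection that mixes values across rows."""
    return matmul(w_h, feat)

def mix_along_width(feat, w_w):
    """Apply a W x W linear projection that mixes values across columns."""
    return matmul(feat, w_w)

def permute_mix(feat, w_h, w_w):
    """Combine the height-wise and width-wise encodings by element-wise sum."""
    by_height = mix_along_height(feat, w_h)
    by_width = mix_along_width(feat, w_w)
    return [[x + y for x, y in zip(row_h, row_w)]
            for row_h, row_w in zip(by_height, by_width)]

feat = [[1, 2], [3, 4]]      # 2 x 2 feature map
w_h = [[1, 0], [0, 1]]       # identity: rows are left unchanged
w_w = [[0, 1], [1, 0]]       # swaps the two columns
mixed = permute_mix(feat, w_h, w_w)
```

Because each projection acts along one axis of the 2D map, the row and column position of every value is preserved, whereas a projection over the flattened map would treat all positions as one undifferentiated sequence.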
1 code implementation • Findings (ACL) 2021 • Weihao Yu, Zihang Jiang, Fei Chen, Qibin Hou, Jiashi Feng
In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order.
3 code implementations • IEEE 2021 • Peng-Tao Jiang, Chang-Bin Zhang, Qibin Hou, Ming-Ming Cheng, Yunchao Wei
To evaluate the quality of the class activation maps produced by LayerCAM, we apply them to weakly-supervised object localization and semantic segmentation.
1 code implementation • 7 Jun 2021 • Daquan Zhou, Yujun Shi, Bingyi Kang, Weihao Yu, Zihang Jiang, Yuan Li, Xiaojie Jin, Qibin Hou, Jiashi Feng
Vision Transformers (ViTs) have shown competitive accuracy in image classification tasks compared with CNNs.
Ranked #169 on Image Classification on ImageNet
6 code implementations • NeurIPS 2021 • Zihang Jiang, Qibin Hou, Li Yuan, Daquan Zhou, Yujun Shi, Xiaojie Jin, Anran Wang, Jiashi Feng
In this paper, we present token labeling -- a new training objective for training high-performance vision transformers (ViTs).
Ranked #81 on Semantic Segmentation on ADE20K
1 code implementation • 24 Mar 2021 • Yang Cao, Zhengqiang Zhang, Enze Xie, Qibin Hou, Kai Zhao, Xiangui Luo, Jian Tuo
However, these methods usually encounter a boundary-related imbalance problem, leading to limited generation capability.
5 code implementations • 22 Mar 2021 • Daquan Zhou, Bingyi Kang, Xiaojie Jin, Linjie Yang, Xiaochen Lian, Zihang Jiang, Qibin Hou, Jiashi Feng
In this paper, we show that, unlike convolutional neural networks (CNNs) that can be improved by stacking more convolutional layers, the performance of ViTs saturates quickly when they are scaled to be deeper.
Ranked #409 on Image Classification on ImageNet
1 code implementation • ICCV 2021 • Daquan Zhou, Xiaojie Jin, Xiaochen Lian, Linjie Yang, Yujing Xue, Qibin Hou, Jiashi Feng
Current neural architecture search (NAS) algorithms still require expert knowledge and effort to design a search space for network construction.
2 code implementations • CVPR 2021 • Qibin Hou, Daquan Zhou, Jiashi Feng
Recent studies on mobile network design have demonstrated the remarkable effectiveness of channel attention (e.g., the Squeeze-and-Excitation attention) for lifting model performance, but they generally neglect the positional information, which is important for generating spatially selective attention maps.
2 code implementations • CVPR 2022 • Zhaohui Zheng, Rongguang Ye, Ping Wang, Dongwei Ren, WangMeng Zuo, Qibin Hou, Ming-Ming Cheng
Previous KD methods for object detection mostly focus on imitating deep features within the imitation regions instead of mimicking classification logits, because logit mimicking is inefficient at distilling localization information and yields only trivial improvement.
2 code implementations • 25 Nov 2020 • Chang-Bin Zhang, Peng-Tao Jiang, Qibin Hou, Yunchao Wei, Qi Han, Zhen Li, Ming-Ming Cheng
Experiments demonstrate that based on the same classification models, the proposed approach can effectively improve the classification performance on CIFAR-100, ImageNet, and fine-grained datasets.
5 code implementations • 6 Oct 2020 • Diganta Misra, Trikay Nalamada, Ajay Uppili Arasanipalai, Qibin Hou
In this paper, we investigate light-weight but effective attention mechanisms and present triplet attention, a novel method for computing attention weights by capturing cross-dimension interaction using a three-branch structure.
4 code implementations • ECCV 2020 • Zhou Daquan, Qibin Hou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan
In this paper, we rethink the necessity of such design changes and find it may bring risks of information loss and gradient confusion.
no code implementations • 14 Jun 2020 • Kuangqi Zhou, Qibin Hou, Zun Li, Jiashi Feng
In this paper, we propose a novel multi-miner framework to perform a region mining process that adapts to diverse object sizes and is thus able to mine more integral and finer object regions.
no code implementations • 18 Apr 2020 • Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng
To evaluate the performance of our proposed network on these tasks, we conduct exhaustive experiments on multiple representative datasets.
2 code implementations • CVPR 2020 • Qibin Hou, Li Zhang, Ming-Ming Cheng, Jiashi Feng
Spatial pooling has been proven highly effective in capturing long-range contextual information for pixel-wise prediction tasks, such as scene parsing.
Ranked #28 on Semantic Segmentation on Cityscapes test
no code implementations • 30 Mar 2020 • Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Yunpeng Chen, Shuicheng Yan, Jiashi Feng
To successfully align the multi-modal data structures across domains, the following works exploit discriminative information in the adversarial training process, e.g., using multiple class-wise discriminators and introducing conditional information in input or output of the domain discriminator.
no code implementations • 25 Feb 2020 • Zun Li, Congyan Lang, Junhao Liew, Qibin Hou, Yidong Li, Jiashi Feng
Feature pyramid network (FPN) based models, which fuse the semantics and salient details in a progressive manner, have been proven highly effective in salient object detection.
no code implementations • 25 Sep 2019 • Dapeng Hu, Jian Liang, Qibin Hou, Hanshu Yan, Jiashi Feng
Previous adversarial learning methods condition domain alignment only on pseudo labels, but noisy and inaccurate pseudo labels may perturb the multi-class distribution embedded in probabilistic predictions, hence bringing insufficient alleviation to the latent mismatch problem.
2 code implementations • 15 Jul 2019 • Deng-Ping Fan, Zheng Lin, Jia-Xing Zhao, Yun Liu, Zhao Zhang, Qibin Hou, Menglong Zhu, Ming-Ming Cheng
The use of RGB-D information for salient object detection has been extensively explored in recent years.
Ranked #4 on RGB-D Salient Object Detection on RGBD135
no code implementations • ICLR 2020 • Daquan Zhou, Xiaojie Jin, Qibin Hou, Kaixin Wang, Jianchao Yang, Jiashi Feng
The recent WSNet [1] is a new model compression method that samples filter weights from a compact set and has been demonstrated to be effective for 1D convolutional neural networks (CNNs).
5 code implementations • CVPR 2019 • Jiang-Jiang Liu, Qibin Hou, Ming-Ming Cheng, Jiashi Feng, Jianmin Jiang
We further design a feature aggregation module (FAM) to make the coarse-level semantic information well fused with the fine-level features from the top-down pathway.
Ranked #1 on RGB Salient Object Detection on SOD
no code implementations • NeurIPS 2018 • Qibin Hou, Peng-Tao Jiang, Yunchao Wei, Ming-Ming Cheng
To test the quality of the generated attention maps, we employ the mined object regions as heuristic cues for learning semantic segmentation models.
no code implementations • ECCV 2018 • Ruochen Fan, Qibin Hou, Ming-Ming Cheng, Gang Yu, Ralph R. Martin, Shi-Min Hu
We also combine our method with Mask R-CNN for instance segmentation, and demonstrate for the first time the ability of weakly supervised instance segmentation using only keyword annotations.
Ranked #4 on Image-level Supervised Instance Segmentation on COCO test-dev (using extra training data)
no code implementations • 27 Mar 2018 • Qibin Hou, Jiang-Jiang Liu, Ming-Ming Cheng, Ali Borji, Philip H. S. Torr
Although these tasks are inherently very different, we show that our unified approach performs very well on all of them and works far better than current single-purpose state-of-the-art methods.
no code implementations • 27 Mar 2018 • Qibin Hou, Ming-Ming Cheng, Jiang-Jiang Liu, Philip H. S. Torr
In this paper, we improve semantic segmentation by automatically learning from Flickr images associated with a particular keyword, without relying on any explicit user annotations, thus substantially alleviating the dependence on accurate annotations when compared to previous weakly supervised methods.
no code implementations • ECCV 2018 • Deng-Ping Fan, Ming-Ming Cheng, Jiang-Jiang Liu, Shang-Hua Gao, Qibin Hou, Ali Borji
Our analysis identifies a serious design bias of existing SOD datasets which assumes that each image contains at least one clearly outstanding salient object in low clutter.
1 code implementation • CVPR 2019 • Ruochen Fan, Ming-Ming Cheng, Qibin Hou, Tai-Jiang Mu, Jingdong Wang, Shi-Min Hu
Taking into account the category-independent property of each target, we design a single stage salient instance segmentation framework, with a novel segmentation branch.
no code implementations • 6 Dec 2016 • Jia-Xing Zhao, Ren Bo, Qibin Hou, Ming-Ming Cheng, Paul L. Rosin
It also converges slowly, both because the search region is fixed and because the assignment step and the update step are performed separately.
2 code implementations • CVPR 2017 • Qibin Hou, Ming-Ming Cheng, Xiao-Wei Hu, Ali Borji, Zhuowen Tu, Philip Torr
Recent progress on saliency detection is substantial, benefiting mostly from the explosive development of Convolutional Neural Networks (CNNs).
Ranked #4 on RGB Salient Object Detection on SBU
no code implementations • 18 Nov 2014 • Ali Borji, Ming-Ming Cheng, Qibin Hou, Huaizu Jiang, Jia Li
Detecting and segmenting salient objects from natural scenes, often referred to as salient object detection, has attracted great interest in computer vision.