1 code implementation • 21 Nov 2024 • Lin Sun, Jiale Cao, Jin Xie, Xiaoheng Jiang, Yanwei Pang
The proposed CLIPer includes an early-layer fusion module and a fine-grained compensation module.
no code implementations • 7 Nov 2024 • Shehan Munasinghe, Hanan Gani, Wenqi Zhu, Jiale Cao, Eric Xing, Fahad Shahbaz Khan, Salman Khan
To enable fine-grained grounding, we curate a multimodal dataset featuring detailed visually-grounded conversations using a semiautomatic annotation pipeline, resulting in a diverse set of 38k video-QA triplets along with 83k objects and 671k masks.
1 code implementation • 5 Oct 2024 • Chao Qin, Jiale Cao, Huazhu Fu, Fahad Shahbaz Khan, Rao Muhammad Anwer
On 21 3D medical image segmentation tasks, our proposed DB-SAM achieves an absolute gain of 8.8%, compared to a recent medical SAM adapter in the literature.
1 code implementation • 5 Sep 2024 • Lin Sun, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang
Leveraging the entropy-reduced self-attention module, our iSeg stably improves the refined cross-attention map through iterative refinement.
no code implementations • 24 Jul 2024 • Jingren Liu, Zhong Ji, Yunlong Yu, Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li
This work provides a theoretical foundation for understanding and improving PEFT-CL models, offering insights into the interplay between feature representation, task orthogonality, and generalization, contributing to the development of more efficient continual learning systems.
1 code implementation • 7 Jun 2024 • Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan
To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene- and instance-level) and combines it with standard visual features to obtain discriminative representations.
no code implementations • 15 Apr 2024 • Bonan Ding, Jin Xie, Jing Nie, Jiale Cao, Xuelong Li, Yanwei Pang
Therefore, an effective solution involves transforming monocular images into LiDAR-like representations and employing a LiDAR-based 3D object detector to predict the 3D coordinates of objects.
no code implementations • 11 Apr 2024 • Hefeng Wang, Jiale Cao, Jin Xie, Aiping Yang, Yanwei Pang
The explicit branch utilizes the ground-truth labels of corresponding images as text prompts to condition feature extraction of the diffusion model.
no code implementations • 29 Mar 2024 • Zhongrui Yu, Haoran Wang, Jinze Yang, Hanzhang Wang, Zeke Xie, Yunfeng Cai, Jiale Cao, Zhong Ji, Mingming Sun
To tackle this problem, we propose a novel approach that enhances the capacity of 3DGS by leveraging a prior from a diffusion model along with complementary multi-modal data.
1 code implementation • 19 Mar 2024 • Wenqi Zhu, Jiale Cao, Jin Xie, Shuangming Yang, Yanwei Pang
The experiments are performed on various video instance segmentation datasets, which demonstrate the effectiveness of our proposed method, especially for novel categories.
1 code implementation • CVPR 2024 • Bin Xie, Jiale Cao, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang
In this paper, we propose a simple encoder-decoder, named SED, for open-vocabulary semantic segmentation, which comprises a hierarchical encoder-based cost map generation and a gradual fusion decoder with category early rejection.
no code implementations • 22 Sep 2023 • Feng Yan, Xiaoheng Jiang, Yang Lu, Lisha Cui, Shupan Li, Jiale Cao, Mingliang Xu, DaCheng Tao
To this end, we develop a Global Context Aggregation Network (GCANet) for lightweight saliency detection of surface defects on the encoder-decoder structure.
no code implementations • 22 Sep 2023 • Xiaoheng Jiang, Kaiyi Guo, Yang Lu, Feng Yan, Hao Liu, Jiale Cao, Mingliang Xu, DaCheng Tao
To address these issues, we propose a transformer network with multi-stage CNN (Convolutional Neural Network) feature injection for surface defect segmentation, which is a UNet-like structure named CINFormer.
1 code implementation • 9 Sep 2023 • Chao Qin, Jiale Cao, Huazhu Fu, Rao Muhammad Anwer, Fahad Shahbaz Khan
Existing video-based breast lesion detection approaches typically perform temporal feature aggregation of deep backbone features based on the self-attention operation.
1 code implementation • 6 Jun 2023 • Hefeng Wang, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Shahbaz Khan, Yanwei Pang
Our DFormer outperforms the recent diffusion-based panoptic segmentation method Pix2Seq-D with a gain of 3.6% on the MS COCO val2017 set.
no code implementations • 24 Apr 2023 • Hanqing Sun, Yanwei Pang, Jiale Cao, Jin Xie, Xuelong Li
In this paper, we explore the model design of Transformers in binocular 3D object detection, focusing particularly on extracting and encoding task-specific image correspondence information.
no code implementations • 21 Mar 2023 • Zhiqiang Dong, Jiale Cao, Rao Muhammad Anwer, Jin Xie, Fahad Khan, Yanwei Pang
Given a set of sparse and learnable proposals, LEAPS employs a dynamic person search head to directly perform person detection and corresponding re-id feature generation without non-maximum suppression post-processing.
1 code implementation • 9 Feb 2023 • Jiabei Wang, Yanwei Pang, Jiale Cao, Hanqing Sun, Zhuang Shao, Xuelong Li
We hope that our simple intra-image contrastive learning can provide more paradigms on weakly supervised person search.
1 code implementation • 8 Aug 2022 • Jean Lahoud, Jiale Cao, Fahad Shahbaz Khan, Hisham Cholakkal, Rao Muhammad Anwer, Salman Khan, Ming-Hsuan Yang
The success of the transformer architecture in natural language processing has recently triggered attention in the computer vision field.
1 code implementation • CVPR 2022 • Jiale Cao, Yanwei Pang, Rao Muhammad Anwer, Hisham Cholakkal, Jin Xie, Mubarak Shah, Fahad Shahbaz Khan
We propose a novel one-step transformer-based person search framework, PSTR, that jointly performs person detection and re-identification (re-id) in a single architecture.
1 code implementation • 24 Mar 2022 • Omkar Thawakar, Sanath Narayan, Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Muhammad Haris Khan, Salman Khan, Michael Felsberg, Fahad Shahbaz Khan
When using the ResNet50 backbone, our MS-STS achieves a mask AP of 50.1%, outperforming the best reported results in the literature by 2.7% and by 4.8% at the higher overlap threshold of AP_75, while being comparable in model size and speed on YouTube-VIS 2019 val.
no code implementations • 28 Nov 2021 • Aqi Gao, Yanwei Pang, Jing Nie, Jiale Cao, Yishun Guo
The key in our ESGN is an efficient geometry-aware feature generation (EGFG) module.
no code implementations • 18 Jun 2021 • Aqi Gao, Jiale Cao, Yanwei Pang
Compared with the baseline RTS3D, our proposed method achieves a 2.57% improvement in AP3d with almost no extra network parameters.
1 code implementation • CVPR 2021 • Jialian Wu, Jiale Cao, Liangchen Song, Yu Wang, Ming Yang, Junsong Yuan
Most online multi-object trackers perform object detection stand-alone in a neural net without any input from tracking.
Ranked #1 on Instance Segmentation on nuScenes
1 code implementation • 3 Dec 2020 • Tiancai Wang, Tong Yang, Jiale Cao, Xiangyu Zhang
Object detectors usually achieve promising results with the supervision of complete instance annotations.
1 code implementation • 18 Nov 2020 • Yanwei Pang, Jiale Cao, Yazhao Li, Jin Xie, Hanqing Sun, Jinfeng Gong
In addition, a new diverse pedestrian dataset is further built.
2 code implementations • 1 Oct 2020 • Jiale Cao, Yanwei Pang, Jin Xie, Fahad Shahbaz Khan, Ling Shao
In addition to single-spectral pedestrian detection, we also review multi-spectral pedestrian detection, which provides more robust features for illumination variance.
1 code implementation • ECCV 2020 • Jiale Cao, Rao Muhammad Anwer, Hisham Cholakkal, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao
In terms of real-time capabilities, SipMask outperforms YOLACT with an absolute gain of 3.0% (mask AP) under similar settings, while operating at comparable speed on a Titan Xp.
Ranked #11 on Real-time Instance Segmentation on MSCOCO
1 code implementation • CVPR 2020 • Jiale Cao, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao
For precise localization, we introduce a dense local regression that predicts multiple dense box offsets for an object proposal.
Ranked #71 on Instance Segmentation on COCO test-dev
no code implementations • CVPR 2020 • Yazhao Li, Yanwei Pang, Jianbing Shen, Jiale Cao, Ling Shao
With this observation, we propose a new Neighbor Erasing and Transferring (NET) mechanism to reconfigure the pyramid features and explore scale-aware features.
1 code implementation • ICCV 2019 • Jiale Cao, Yanwei Pang, Jungong Han, Xuelong Li
To further solve the second problem, a hierarchical shot detector (HSD) is proposed, which stacks two ROC modules and one feature enhanced module.
Ranked #4 on Object Detection on PASCAL VOC 2007
no code implementations • CVPR 2019 • Jiale Cao, Yanwei Pang, Xuelong Li
Experimental results on the VOC2007 and VOC2012 datasets demonstrate that the proposed TripleNet is able to improve both the detection and segmentation accuracies without adding extra computational costs.
Ranked #18 on Semantic Segmentation on PASCAL VOC 2012 test
no code implementations • 3 Apr 2018 • Jiale Cao, Yanwei Pang, Xuelong Li
In this paper, we propose a multi-branch and high-level semantic network by gradually splitting a base network into multiple different branches.
no code implementations • 1 Mar 2016 • Jiale Cao, Yanwei Pang, Xuelong Li
For example, CNN classifies these proposals by the fully-connected layer features, while proposal scores and the features in the inner layers of CNN are ignored.
Ranked #25 on Pedestrian Detection on Caltech
no code implementations • CVPR 2016 • Jiale Cao, Yanwei Pang, Xuelong Li
Finally, we propose to combine both non-neighboring and neighboring features for pedestrian detection.
Ranked #28 on Pedestrian Detection on Caltech
no code implementations • 23 Aug 2015 • Yanwei Pang, Jiale Cao, Xuelong Li
Multistage particle windows (MPW), proposed by Gualdi et al., is an algorithm for fast and accurate object detection.
no code implementations • 18 Aug 2015 • Yanwei Pang, Jiale Cao, Xuelong Li
iCascade searches for the optimal number r_i of weak classifiers in each stage i by directly minimizing the computation cost of the cascade.